Usually blocks end with a branch, which also consumes all flags, but in
case the block is aborted early (or any other reason to not finish the
block on a branch), it will result in only a subset of flags being
generated, which causes problems in a couple of games.
This performs an out of bounds flag read, which is incorrect
This is based on the MIPS dynarec (more or less) with some ARM
borrowings. Seems to be quite fast (under my testing fixed results:
faster than ARM on A1 but not a lot faster than the interpreter on
Android Snapdragon 845) but still some optimizations are missing at the
moment.
Seems to pass my testing suite and compatibility wise is very similar to
arm.
This adds support for x86-64 dynarec both on Windows and Linux. Since
they have different requirements there's some macro magic in the stubs
file.
This also fixes x86 support in some cases: stack alignment requirements
where violated all over. This allows the usage of clang as a compiler
(which has a tendency to use SSE instructions more often than gcc does).
To support this I also reworked the mmap/VirtualAlloc magic to make sure
JIT arena stays close to .text.
Fixed some other minor issues and removed some unnecessary JIT code here
and there. clang tends to do some (wrong?) assumptions about global
symbols alignment.
This gets rid of the bloated memmap_win32.c in favour of a much simpler
wrapper. This will be needed in the future since the wrapper does not
support MAP_FIXED maps (necessary for some platforms)
This fixes a race condition that happens whenever the ROM cache is flushed but
the RAM one is not, causing any SWI calls (implemented as direct branches) to
jump to random instructions.
The fix could be to flush both caches at the same time (~expensive on
low mem platforms), use indirect jumps (a bit expensive) or emit the SWI
handler below the watermark to ensure it is never flushed. This is cheap
and effective, requires minimal changes.
This removes ram_block_ptrs and encodes the pointer directly in the
block tag. Saves ~256KB at no performance cost.
Drawback is that it limits the ram cache size to 512KB (we were using
768KB before). Should not be a problem since most games use less than
32KB of cache anyway.
Fixed ARM routines accordingly.
This fixes issue #133
The explanation is as follows. Most blocks end on an inconditional
jump/branch, but there's two cases where this doesn't happen:
translation gates and when we hit MAX_EXITS. These are very uncommon
cases and therefore more prone to hidden bugs.
When this happens, the last instruction emits a conditional jump (via
arm_conditional_block_header macro) which is patched by a later
instruction via generate_branch_patch_conditional. Typically the last
unconditional branch will trigger the patching condition (which is
aproximately condition != last_condition), but in these two cases it
might not happen, leaving an unpatched branch. This makes x86 and ARM
dynarecs crash in interesting ways (although it might not crash
depending on $stuff and make the bug even harder to track).
Cleans up a ton of whitespace in cpu.c (like 100KB!) and improves
readability of some massive decode statements.
Added an optimization for PC-relative loads (pool load) in ROM (since
it's read only and cannot possibily change) that directly emits an
immediate load. This is way faster, specially in MIPS/x86, ARM can be
even faster if we rewrite the immediate load macros to also use a pool.
Seems that using the __atribute__ magic for sections is not the best way
of doing this, since it injects some default atributtes that collide
with the user defined ones. Using assembly is far easier in this case.
Reworked definitions a bit to make it easier to import from assembly.
Also wrapped stuff around macros for easy and less verbose
implementation of the symbol prefix issue.
This saves a few cycles in MIPS and simplifies a bit the core.
Removed the write map, only affects interpreter performance very
minimally. Rewired ARM and x86 handlers to support direct access to
I/EWRAM (and VRAM on ARM) to compensate. Overall performance is slightly
better but code is cleaner and allows for further improvements in the
dynarecs.
Added a more thorough cache cleanup for reset/mode-change too.
Fixed the mmap initialization that ends up leaking memory.
Minor x86 asm fixes for Android.
This is not really necessary since it can share area with ROM.
Performance impact should be very minimal (haven't noticed it myself)
and could be compensated (even by a positive offset) if we bump the ROM
cache area size.
Tested with several dynarecs.
This allows us to emit the handlers directly in a more efficient manner.
At the same time it allows for an easy fix to emit PIC code, which is
necessary for libretro. This also enables more platform specific
optimizations and variations, perhaps even run-time multiplatform
support.
Turns out most of that file ends up in JIT section, which is RWX and not
a very nice way to run code really (security issues aside).
This also makes possible to build that file with -ggdb otherwise it
complains about stuff.
Turns out there were a couple of very interesting and hard to track
bugs. A missing comma made the reg list too short, leaving the 31th
element at the mercy of the linker ordering algorithm, which seems to
work in some cases depending on the compiler version.
Also the cache flush code seemed not to work on my machine (OGA),
not sure why it wored in the past :/