Usually blocks end with a branch, which also consumes all flags. However,
if a block is aborted early (or does not end on a branch for any other
reason), only a subset of the flags is generated, which causes problems
in a couple of games.
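One way to picture the fix is a backward flag-liveness pass that treats every flag as live at the block exit, whether or not the block ends on a branch. This is a minimal illustration, not the dynarec's actual code: `insn_info`, `compute_flag_generation` and the flag bit values are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical NZCV bit assignments used only by this analysis. */
#define FLAG_N 0x8u
#define FLAG_Z 0x4u
#define FLAG_C 0x2u
#define FLAG_V 0x1u
#define FLAGS_ALL (FLAG_N | FLAG_Z | FLAG_C | FLAG_V)

/* Hypothetical per-instruction summary produced by the decoder. */
typedef struct {
  uint8_t uses;     /* flags read (condition codes, ADC carry-in, ...) */
  uint8_t sets;     /* flags the instruction can write */
  uint8_t generate; /* output: flags the JIT must actually compute */
} insn_info;

void compute_flag_generation(insn_info *insns, size_t count) {
  /* All flags are live when the block exits: a final branch consumes them,
     and an early-aborted block returns to code that may read any of them. */
  uint8_t live = FLAGS_ALL;
  for (size_t i = count; i-- > 0;) {
    insns[i].generate = (uint8_t)(insns[i].sets & live);
    /* Flags written here are dead before this instruction unless it reads them. */
    live = (uint8_t)((live & ~insns[i].sets) | insns[i].uses);
  }
}
```

Starting from `FLAGS_ALL` instead of relying on a trailing branch is what guarantees the last writer of each flag always materializes it.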
This performs an out-of-bounds flag read, which is incorrect.
This is based on the MIPS dynarec (more or less) with some ARM
borrowings. It seems to be quite fast, although results vary in my
testing: it is faster than the ARM dynarec on A1, but not much faster
than the interpreter on an Android Snapdragon 845. Some optimizations
are still missing at the moment.
It seems to pass my test suite, and compatibility-wise it is very
similar to the ARM dynarec.
There's a race condition on CPSR stores (only when the mode is changed):
if an IRQ is pending, the IRQ is served, but the saved LR value is
wrong, so returning from the handler skips the instruction it should
resume at.
Fixed this and improved the logic a bit to make it faster and avoid
using unnecessary save slots.
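The sketch below shows where the saved LR should point when a pending IRQ is taken right after a CPSR write. It is a simplified, hypothetical model (`cpu_state`, `check_irq_after_cpsr_write` and the field names are illustrative) that follows the usual ARM convention: LR_irq holds the address of the next unexecuted instruction plus 4, and the handler returns with `SUBS PC, LR, #4`.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPSR_MODE_MASK 0x1Fu
#define CPSR_THUMB     0x20u
#define CPSR_IRQ_OFF   0x80u
#define MODE_IRQ       0x12u
#define IRQ_VECTOR     0x18u

/* Hypothetical, heavily simplified CPU state. */
typedef struct {
  uint32_t cpsr;
  uint32_t spsr_irq;
  uint32_t lr_irq;
  uint32_t pc;
} cpu_state;

/* Called right after a CPSR store at insn_addr (insn_size is 4 in ARM
   state, 2 in Thumb) has been applied. Returns true if the IRQ is taken. */
bool check_irq_after_cpsr_write(cpu_state *cpu, uint32_t insn_addr,
                                uint32_t insn_size, bool irq_pending) {
  if (!irq_pending || (cpu->cpsr & CPSR_IRQ_OFF))
    return false;
  /* The next instruction has not executed yet, so the handler must
     return to it: LR_irq = next + 4, undone by SUBS PC, LR, #4. */
  uint32_t next = insn_addr + insn_size;
  cpu->spsr_irq = cpu->cpsr;
  cpu->lr_irq = next + 4;
  /* Enter IRQ mode in ARM state with IRQs masked. */
  cpu->cpsr = (cpu->cpsr & ~(CPSR_MODE_MASK | CPSR_THUMB)) | MODE_IRQ | CPSR_IRQ_OFF;
  cpu->pc = IRQ_VECTOR;
  return true;
}
```

If LR is computed from the wrong base (for example, one instruction too far), the `SUBS PC, LR, #4` return lands past the instruction that should run next, which matches the symptom described above.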
The converted palette was missing a byte swap after state
unserialization. This will likely fix Wii U graphics when using rewind
and similar features that rely on save states.
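A minimal sketch of the missing step, assuming the raw palette is kept in little-endian (GBA) byte order and the converted copy must be in host order; the helper names are hypothetical. On a big-endian host such as the Wii U, each 16-bit entry needs a byte swap when the converted palette is rebuilt after loading a state.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t bswap16(uint16_t v) {
  return (uint16_t)((v << 8) | (v >> 8));
}

static int host_is_big_endian(void) {
  const uint16_t probe = 1;
  return *(const uint8_t *)&probe == 0;
}

/* Convert a little-endian 16-bit value, as stored in emulated memory, to host order. */
static uint16_t le16_to_host(uint16_t raw) {
  return host_is_big_endian() ? bswap16(raw) : raw;
}

/* Hypothetical: rebuild the host-order palette copy after a state load. */
void restore_converted_palette(const uint16_t *raw_le, uint16_t *converted,
                               size_t entries) {
  for (size_t i = 0; i < entries; i++)
    converted[i] = le16_to_host(raw_le[i]);
}
```

On little-endian hosts the swap is a no-op, which is why the bug only showed up on big-endian platforms.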
This causes the MSB to be copied into the green channel LSB, causing a
very subtle (almost impossible to see) color distortion.
The dynarecs use their own code path, so they are not affected.
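For reference, here is a conversion from the GBA's 15-bit BGR555 format to RGB565 that keeps the unused top bit out of the result and leaves the green channel's extra low bit at zero. RGB565 as the output format and the function name are assumptions for illustration.

```c
#include <stdint.h>

/* BGR555 layout: bits 0-4 red, 5-9 green, 10-14 blue, bit 15 unused.
   RGB565 layout: bits 11-15 red, 5-10 green, 0-4 blue. */
uint16_t bgr555_to_rgb565(uint16_t c) {
  uint16_t r = c & 0x1F;
  uint16_t g = (c >> 5) & 0x1F;
  uint16_t b = (c >> 10) & 0x1F; /* masking drops bit 15 */
  /* Green is shifted to the top 5 of its 6 bits; the LSB stays 0. */
  return (uint16_t)((r << 11) | (g << 6) | b);
}
```

Masking each channel before shifting is what keeps stray bits from leaking into neighbouring fields.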
This adds x86-64 dynarec support on both Windows and Linux. Since the
two platforms have different ABI requirements, there's some macro magic
in the stubs file.
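As an illustration of why the stubs need per-platform macros: the two x86-64 ABIs pass integer arguments in different registers, and Windows also requires the caller to reserve 32 bytes of shadow space. The macro and function names below are hypothetical; only the register assignments and the shadow-space size come from the ABIs themselves.

```c
#include <stddef.h>

#if defined(_WIN32)
/* Microsoft x64 ABI: first four integer args in rcx, rdx, r8, r9, and the
   caller reserves 32 bytes of shadow space before every call. */
#define SHADOW_SPACE 32
static const char *const arg_regs[4] = {"rcx", "rdx", "r8", "r9"};
#else
/* System V AMD64 ABI (Linux): integer args in rdi, rsi, rdx, rcx (then
   r8, r9); no shadow space. */
#define SHADOW_SPACE 0
static const char *const arg_regs[4] = {"rdi", "rsi", "rdx", "rcx"};
#endif

/* Register name for the index-th integer argument (only 0-3 modeled). */
const char *arg_reg(int index) {
  return (index >= 0 && index < 4) ? arg_regs[index] : NULL;
}
```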
This also fixes x86 support in some cases: stack alignment requirements
were violated all over. This allows clang to be used as the compiler
(it tends to use SSE instructions, some of which require 16-byte aligned
memory operands, more often than gcc does).
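A hypothetical helper showing the alignment arithmetic for x86-64: the ABIs require `rsp` to be 16-byte aligned at each `CALL`, and since the call that entered the current function pushed an 8-byte return address, a function that pushes registers must pad its frame accordingly.

```c
#include <stddef.h>

/* Bytes to SUB from rsp after pushing pushed_regs 8-byte registers so that
   at least local_bytes are reserved and rsp is 16-byte aligned at calls. */
size_t stack_adjust(size_t pushed_regs, size_t local_bytes) {
  size_t used = 8 + 8 * pushed_regs + local_bytes; /* ret addr + pushes + locals */
  size_t aligned = (used + 15) & ~(size_t)15;       /* round up to 16 */
  return aligned - 8 - 8 * pushed_regs;
}
```

For example, a stub that pushes no registers still has to subtract 8 bytes before calling into C, or any aligned SSE spill in the callee will fault.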
To support this I also reworked the mmap/VirtualAlloc magic to make sure
the JIT arena stays close to .text.
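Keeping the arena near .text matters if generated code reaches the stubs with `rel32` calls and jumps, whose signed 32-bit displacement limits the reach to about ±2GB. That motivation is an assumption here; the check itself is a minimal sketch with a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* For x86-64 CALL/JMP rel32, the displacement is relative to the end of
   the instruction and must fit in a signed 32-bit integer. */
bool rel32_reachable(uint64_t insn_end, uint64_t target) {
  int64_t disp = (int64_t)target - (int64_t)insn_end;
  return disp >= INT32_MIN && disp <= INT32_MAX;
}
```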
Also fixed some other minor issues and removed some unnecessary JIT code
here and there. clang tends to make some (wrong?) assumptions about the
alignment of global symbols.
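If the mismatch involves symbols shared between C and the assembly stubs, one way to state the requirement explicitly on the C side is C11 `alignas` (symbols defined in the stubs would need an equivalent `.balign` directive). The array name is hypothetical, and whether this matches the actual fix is an assumption.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical global read by JIT code or asm stubs with aligned 16-byte
   SSE accesses; alignas makes the requirement explicit instead of relying
   on the compiler's default placement. */
alignas(16) uint32_t shared_regs[16];

int is_aligned(const void *p, size_t alignment) {
  return ((uintptr_t)p % alignment) == 0;
}
```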
This replaces the bloated memmap_win32.c with a much simpler wrapper.
This will be needed in the future, since the old wrapper does not
support MAP_FIXED mappings (necessary for some platforms).
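A minimal sketch of such a wrapper, assuming it only needs to allocate executable memory either at an exact address or anywhere; `jit_map` and `jit_unmap` are hypothetical names. Note that POSIX `MAP_FIXED` silently replaces any existing mapping at that address, while `VirtualAlloc` simply fails if the range is taken.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict C modes */
#endif
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

/* Allocate an executable region of size bytes. If fixed is nonzero the
   region must be placed exactly at addr; otherwise addr is only a hint.
   Returns NULL on failure. */
void *jit_map(void *addr, size_t size, int fixed) {
#ifdef _WIN32
  void *p = VirtualAlloc(addr, size, MEM_RESERVE | MEM_COMMIT,
                         PAGE_EXECUTE_READWRITE);
  /* VirtualAlloc fails instead of relocating when addr is unavailable,
     so emulate hint semantics by retrying anywhere. */
  if (!p && !fixed && addr)
    p = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT,
                     PAGE_EXECUTE_READWRITE);
  return p;
#else
  int flags = MAP_PRIVATE | MAP_ANONYMOUS | (fixed ? MAP_FIXED : 0);
  void *p = mmap(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, -1, 0);
  return p == MAP_FAILED ? NULL : p;
#endif
}

void jit_unmap(void *p, size_t size) {
#ifdef _WIN32
  (void)size;
  VirtualFree(p, 0, MEM_RELEASE);
#else
  munmap(p, size);
#endif
}
```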
This removes one branch by emitting the region selection code directly
in the JIT cache, trading memory for speed (although the improvement is
small).
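The region selection itself is a decode on the top byte of the GBA address. The sketch below writes it in C rather than as emitted machine code; the enum and function names are hypothetical, and the map is simplified (for example, all of 0x00xxxxxx is treated as BIOS).

```c
#include <stdint.h>

typedef enum {
  REGION_BIOS, REGION_EWRAM, REGION_IWRAM, REGION_IO, REGION_PALETTE,
  REGION_VRAM, REGION_OAM, REGION_ROM, REGION_SRAM, REGION_UNMAPPED
} gba_region;

/* Select the memory region from bits 24-31 of the address. */
gba_region select_region(uint32_t addr) {
  switch (addr >> 24) {
  case 0x00: return REGION_BIOS;
  case 0x02: return REGION_EWRAM;
  case 0x03: return REGION_IWRAM;
  case 0x04: return REGION_IO;
  case 0x05: return REGION_PALETTE;
  case 0x06: return REGION_VRAM;
  case 0x07: return REGION_OAM;
  case 0x08: case 0x09: case 0x0A: case 0x0B: case 0x0C: case 0x0D:
    return REGION_ROM; /* ROM and its wait-state mirrors */
  case 0x0E: case 0x0F:
    return REGION_SRAM;
  default:
    return REGION_UNMAPPED;
  }
}
```

Emitting this decision inline at each memory access site avoids a jump to a shared routine, at the cost of duplicating the code in the cache.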
This is a step towards enabling MMAP caches on ARM, which is complicated
by the 32MB offset limitation of branch instructions.
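To illustrate the limitation: an A32 `B` instruction stores a signed 24-bit word offset relative to the instruction address plus 8, so the target must lie within roughly ±32MB. The encoder below is a hypothetical sketch that returns 0 when the target cannot be encoded.

```c
#include <stdint.h>

/* Encode an unconditional A32 "B target" placed at insn_addr.
   Returns 0 if the target is misaligned or out of branch range. */
uint32_t arm_branch_encode(uint32_t insn_addr, uint32_t target) {
  /* The offset is relative to the PC, which reads as insn_addr + 8. */
  int64_t offset = (int64_t)target - ((int64_t)insn_addr + 8);
  if (offset % 4 != 0)
    return 0;
  /* imm24 is a signed word offset: -2^23 .. 2^23 - 1 words, about ±32MB. */
  if (offset < -(INT64_C(1) << 25) || offset > (INT64_C(1) << 25) - 4)
    return 0;
  uint32_t imm24 = (uint32_t)(offset / 4) & 0x00FFFFFFu;
  return 0xEA000000u | imm24; /* cond = AL (0xE), opcode = B */
}
```

A JIT cache placed farther than this from the code it branches to would need an indirect jump (load the address into a register or into PC) instead of a direct `B`.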