This accounts for DMA stealing cycles from the CPU whenever the CPU
triggers a DMA (does not affect H/V blank or sound DMAs).
Works by moving the CPU to a PAUSED state where the cycles are accounted
for, reusing a similar mechanism for HALT/STOP.
Fixes a couple of games, notably GTA that has a DMA/IRQ race condition
(likely a bug really) if cycles are grossly miscalculated.
This rewrites the way that CPU alerts work, making them a bitmap (since
multiple alerts can happen simultaneously, like SMC and IRQ). This
doesn't really fix many games but improves accuracy overall and improves
performance on some I/O writes (the ones without side effects).
The IRQ raising is now decoupled and explicitely called via a new
function (check_and_raise_interrupts) to avoid issues such as invalid
CPSR values (doesn't seem to bother most games!). There's more side
effects missing, so this just lays the ground for more fixes.
This fixes ROM swapping for x86/64, arm32 and arm64. On top of that it
improves speed by removing unnecessary slow paths on small ROMs for
arm32 and mips. If the ROM can fit in RAM, it will emit more efficient
code that assumes the ROM is fully loaded.
For low-memory Linux platforms it would be better to use some mmap'ed
ROM, that way the OS would transparently handle page swapping, which is
perhaps faster. Will investigate and follow up on this in a separate
commit.
gpsp doesn't differentiate between USER and SYSTEM mode, most likely
cause it is not that important for most games. This implements the modes
correctly and adds checks for privileged operations. Still some
bugs/hacks but it mostly fixes CPSR/SPSR reads/writes.
To implement PSR writes we are using a more refined masks and force mode
bit num. 4 to always be one. Reserved bits are forced to zero (this
needs to be validated on a real device).
There's a race condition on CPSR store (only if mode is changed) where,
if an IRQ is pending, the IRQ will be served, but the saved LR value
will be wrong (will skip the return instruction).
Fixed this and improved the logic a bit to make it faster and not use
unnecessary save slots.
This adds support for x86-64 dynarec both on Windows and Linux. Since
they have different requirements there's some macro magic in the stubs
file.
This also fixes x86 support in some cases: stack alignment requirements
where violated all over. This allows the usage of clang as a compiler
(which has a tendency to use SSE instructions more often than gcc does).
To support this I also reworked the mmap/VirtualAlloc magic to make sure
JIT arena stays close to .text.
Fixed some other minor issues and removed some unnecessary JIT code here
and there. clang tends to do some (wrong?) assumptions about global
symbols alignment.
This gets rid of the bloated memmap_win32.c in favour of a much simpler
wrapper. This will be needed in the future since the wrapper does not
support MAP_FIXED maps (necessary for some platforms)
Removed the last bits of text relocations by moving all relevant RAMs to
the stub reachable area. This should be as fast or even faster than
previous code.
Seems that using the __atribute__ magic for sections is not the best way
of doing this, since it injects some default atributtes that collide
with the user defined ones. Using assembly is far easier in this case.
Reworked definitions a bit to make it easier to import from assembly.
Also wrapped stuff around macros for easy and less verbose
implementation of the symbol prefix issue.
This saves a few cycles in MIPS and simplifies a bit the core.
Removed the write map, only affects interpreter performance very
minimally. Rewired ARM and x86 handlers to support direct access to
I/EWRAM (and VRAM on ARM) to compensate. Overall performance is slightly
better but code is cleaner and allows for further improvements in the
dynarecs.
Added a more thorough cache cleanup for reset/mode-change too.
Fixed the mmap initialization that ends up leaking memory.
Minor x86 asm fixes for Android.
This removes libco and all the usages of it (+pthreads).
Rewired all dynarecs and interpreter to return after every frame so that
libretro can process events. This required to make dynarec re-entrant.
Dynarecs were updated to check for new frame on every update (IRQ, cycle
exhaustion, I/O write, etc). The performance impact of doing so should
be minimal (and definitely outweight the libco gains). While at it,
fixed small issues to get a bit more perf: arm dynarec was not idling
correctly, mips was using stack when not needed, etc.
Tested on PSP (mips), OGA (armv7), Linux (x86 and interpreter). Not
tested on Android though.