This rewrites the way that CPU alerts work, making them a bitmap (since
multiple alerts can happen simultaneously, like SMC and IRQ). This
doesn't really fix many games but improves accuracy overall and improves
performance on some I/O writes (the ones without side effects).
The IRQ raising is now decoupled and explicitely called via a new
function (check_and_raise_interrupts) to avoid issues such as invalid
CPSR values (doesn't seem to bother most games!). There's more side
effects missing, so this just lays the ground for more fixes.
This fixes many games and brings it closer to what the other dynarecs
do. Without this patch there's register corruption if the memory write
triggers an IRQ (since raise_interrupt mangles the registers that have
not been saved).
At this point there's still an issue with CPSR saving but that affects
aso the other dynarecs.
This fixes ROM swapping for x86/64, arm32 and arm64. On top of that it
improves speed by removing unnecessary slow paths on small ROMs for
arm32 and mips. If the ROM can fit in RAM, it will emit more efficient
code that assumes the ROM is fully loaded.
For low-memory Linux platforms it would be better to use some mmap'ed
ROM, that way the OS would transparently handle page swapping, which is
perhaps faster. Will investigate and follow up on this in a separate
commit.
gpsp doesn't differentiate between USER and SYSTEM mode, most likely
cause it is not that important for most games. This implements the modes
correctly and adds checks for privileged operations. Still some
bugs/hacks but it mostly fixes CPSR/SPSR reads/writes.
To implement PSR writes we are using a more refined masks and force mode
bit num. 4 to always be one. Reserved bits are forced to zero (this
needs to be validated on a real device).
A missing usermode check (present in MIPS and x86) causes user-mode
returns to attempt returning into another CPU mode, which causes a bunch
of issues, mainly going into an invalid CPU state and corrupting some
registers.
This fixes a couple of games only (Colin McRae Rally 2, TOCA World
Touring, Starsky & Hutch ...)
This gets rid of the bloated memmap_win32.c in favour of a much simpler
wrapper. This will be needed in the future since the wrapper does not
support MAP_FIXED maps (necessary for some platforms)
This removes one branch and emits the region selection code directly in
the JIT cache. Trading memory for speed (although it's not a big
improvement).
This is a step towards enabling MMAP caches in ARM (due to the 32MB
offset limitation in branches).
This introduced a potential race condition between the start of a SWI
and the BIOS handling the exception by returning to system mode. During
this ~10 instruction window, having an IRQ that issues a SWI causes bad
behaviour that results in crashes or other weirdness.
Fixes a couple of games and potentially many weird and obscure bugs here
and there (hard to reproduce sometimes).
Removed the last bits of text relocations by moving all relevant RAMs to
the stub reachable area. This should be as fast or even faster than
previous code.
This fixes unallowed BIOS accesses (outside from BIOS) which fixes some
games like Silent Scope but also Zelda (fixes rolling as reported by
neonloop!)
This removes ram_block_ptrs and encodes the pointer directly in the
block tag. Saves ~256KB at no performance cost.
Drawback is that it limits the ram cache size to 512KB (we were using
768KB before). Should not be a problem since most games use less than
32KB of cache anyway.
Fixed ARM routines accordingly.
This might make a handful games slightly slower (but on the upper side
they work now instead of crashing or restarting).
Also while at it, fix some minor stuff in arm stubs for speed.
Using an invalid SP makes Vita crash (for an unkown reason) and makes
things like C signal handlers crash (luckily Retroarch doesn't use
them). It is also a violation of the ABI and not a great idea.
Recycled some little used registers to free SP. Perf should be roughly
the same.
Seems that using the __atribute__ magic for sections is not the best way
of doing this, since it injects some default atributtes that collide
with the user defined ones. Using assembly is far easier in this case.
Reworked definitions a bit to make it easier to import from assembly.
Also wrapped stuff around macros for easy and less verbose
implementation of the symbol prefix issue.
This saves a few cycles in MIPS and simplifies a bit the core.
Removed the write map, only affects interpreter performance very
minimally. Rewired ARM and x86 handlers to support direct access to
I/EWRAM (and VRAM on ARM) to compensate. Overall performance is slightly
better but code is cleaner and allows for further improvements in the
dynarecs.
This is not really necessary since it can share area with ROM.
Performance impact should be very minimal (haven't noticed it myself)
and could be compensated (even by a positive offset) if we bump the ROM
cache area size.
Tested with several dynarecs.
This removes libco and all the usages of it (+pthreads).
Rewired all dynarecs and interpreter to return after every frame so that
libretro can process events. This required to make dynarec re-entrant.
Dynarecs were updated to check for new frame on every update (IRQ, cycle
exhaustion, I/O write, etc). The performance impact of doing so should
be minimal (and definitely outweight the libco gains). While at it,
fixed small issues to get a bit more perf: arm dynarec was not idling
correctly, mips was using stack when not needed, etc.
Tested on PSP (mips), OGA (armv7), Linux (x86 and interpreter). Not
tested on Android though.