This fixes a race condition that happens whenever the ROM cache is flushed but
the RAM one is not, causing any SWI calls (implemented as direct branches) to
jump to random instructions.
The fix could be to flush both caches at the same time (~expensive on
low mem platforms), use indirect jumps (a bit expensive) or emit the SWI
handler below the watermark to ensure it is never flushed. This is cheap
and effective, requires minimal changes.
Removed the last bits of text relocations by moving all relevant RAMs to
the stub reachable area. This should be as fast or even faster than
previous code.
Cleans up a ton of whitespace in cpu.c (like 100KB!) and improves
readability of some massive decode statements.
Added an optimization for PC-relative loads (pool load) in ROM (since
it's read only and cannot possibily change) that directly emits an
immediate load. This is way faster, specially in MIPS/x86, ARM can be
even faster if we rewrite the immediate load macros to also use a pool.
Using an invalid SP makes Vita crash (for an unkown reason) and makes
things like C signal handlers crash (luckily Retroarch doesn't use
them). It is also a violation of the ABI and not a great idea.
Recycled some little used registers to free SP. Perf should be roughly
the same.
I think this does not make a difference at all in the code, since PC is
treated in a special way anyway (reloaded with an immediate when read
and treated as an indirect branch when written). However for the sake of
completeness I'm undoing what I did. (The comma fix stays :P)
Turns out there were a couple of very interesting and hard to track
bugs. A missing comma made the reg list too short, leaving the 31th
element at the mercy of the linker ordering algorithm, which seems to
work in some cases depending on the compiler version.
Also the cache flush code seemed not to work on my machine (OGA),
not sure why it wored in the past :/