Cleans up a ton of whitespace in cpu.c (like 100KB!) and improves
readability of some massive decode statements.
Added an optimization for PC-relative loads (pool load) in ROM (since
it's read only and cannot possibily change) that directly emits an
immediate load. This is way faster, specially in MIPS/x86, ARM can be
even faster if we rewrite the immediate load macros to also use a pool.
Using an invalid SP makes Vita crash (for an unkown reason) and makes
things like C signal handlers crash (luckily Retroarch doesn't use
them). It is also a violation of the ABI and not a great idea.
Recycled some little used registers to free SP. Perf should be roughly
the same.
I think this does not make a difference at all in the code, since PC is
treated in a special way anyway (reloaded with an immediate when read
and treated as an indirect branch when written). However for the sake of
completeness I'm undoing what I did. (The comma fix stays :P)
Turns out there were a couple of very interesting and hard to track
bugs. A missing comma made the reg list too short, leaving the 31th
element at the mercy of the linker ordering algorithm, which seems to
work in some cases depending on the compiler version.
Also the cache flush code seemed not to work on my machine (OGA),
not sure why it wored in the past :/