Cleans up a ton of whitespace in cpu.c (like 100KB!) and improves
readability of some massive decode statements.
Added an optimization for PC-relative loads (pool load) in ROM (since
it's read only and cannot possibily change) that directly emits an
immediate load. This is way faster, specially in MIPS/x86, ARM can be
even faster if we rewrite the immediate load macros to also use a pool.
Seems that using the __atribute__ magic for sections is not the best way
of doing this, since it injects some default atributtes that collide
with the user defined ones. Using assembly is far easier in this case.
Reworked definitions a bit to make it easier to import from assembly.
Also wrapped stuff around macros for easy and less verbose
implementation of the symbol prefix issue.
This saves a few cycles in MIPS and simplifies a bit the core.
Removed the write map, only affects interpreter performance very
minimally. Rewired ARM and x86 handlers to support direct access to
I/EWRAM (and VRAM on ARM) to compensate. Overall performance is slightly
better but code is cleaner and allows for further improvements in the
dynarecs.
Added a more thorough cache cleanup for reset/mode-change too.
Fixed the mmap initialization that ends up leaking memory.
Minor x86 asm fixes for Android.
This removes libco and all the usages of it (+pthreads).
Rewired all dynarecs and interpreter to return after every frame so that
libretro can process events. This required to make dynarec re-entrant.
Dynarecs were updated to check for new frame on every update (IRQ, cycle
exhaustion, I/O write, etc). The performance impact of doing so should
be minimal (and definitely outweight the libco gains). While at it,
fixed small issues to get a bit more perf: arm dynarec was not idling
correctly, mips was using stack when not needed, etc.
Tested on PSP (mips), OGA (armv7), Linux (x86 and interpreter). Not
tested on Android though.