This uses BSON as savestate format, to allow external tools to parse it
(so that we can add proper test of the states). The BSON is not 100%
correct according to spec (no ordered keys) but can be parsed by most
libraries.
This fixes also a bug in the savestate palette color recalculation that
was wrongly overwritting the original palette (which could cause some
problems on some games).
Also fixes some potential issues by serializing some more stuff and
cleans up unused stuff.
Testing shows that states look good and there's only minor differences
in audio ticks, related to buffer sizes (since buffer flushes are
de-synced from video frames due to different frequency).
Seems that, on IRQ, it is assumed that the PC will change (which happens
most of the time). However if the IRQ is masked it resumes execution.
On masked DMA IRQ the interpreter jumps to alert without incrementing
the instruction pointer (right after).
Removed the last bits of text relocations by moving all relevant RAMs to
the stub reachable area. This should be as fast or even faster than
previous code.
This improves the existing on-demand ROM paging and also breaks down ROM
buffers into 1MB blocks for platforms with memory fragmentation issues.
Fixed some potential RTC reg issue in said platforms too.
Fixed page pinning on interpreter (would crash due to LRU evictions).
This patch adds big-endian compatibility in gpsp (in general but only
for the interpreter). There's no performance hit for little-endian
platforms (should be a no-op) and only add a small overhead in memory
accesses for big-endian platforms.
Most memory accesses are wrapped with a byteswap instruction and I/O reg
accesses are also rewired for proper access (using macros). Video
rendering has been fixed to also do byteswaps but there's a couple of
games and rendering modes that still seem broken (but they amount to
less than 20 games in my tests with 1K ROMs).
This also adds build rules and CI for NGC/WII/WIIU (untested)
Cleans up a ton of whitespace in cpu.c (like 100KB!) and improves
readability of some massive decode statements.
Added an optimization for PC-relative loads (pool load) in ROM (since
it's read only and cannot possibily change) that directly emits an
immediate load. This is way faster, specially in MIPS/x86, ARM can be
even faster if we rewrite the immediate load macros to also use a pool.
An address check was missing to read aligned 32 (stm/ldm) data from
high mem areas (0xX0000000). This fixes SM4 EU that for some reason has
some weird memory access (perhaps a bug?)
This saves a few cycles in MIPS and simplifies a bit the core.
Removed the write map, only affects interpreter performance very
minimally. Rewired ARM and x86 handlers to support direct access to
I/EWRAM (and VRAM on ARM) to compensate. Overall performance is slightly
better but code is cleaner and allows for further improvements in the
dynarecs.
Add options to select whether to boot from BIOS (default is no, as it is
now) and whether to use the original bios or the builtin one (default is
auto, which tries to use the official but falls back to the builtin if
not found).
This is not really necessary since it can share area with ROM.
Performance impact should be very minimal (haven't noticed it myself)
and could be compensated (even by a positive offset) if we bump the ROM
cache area size.
Tested with several dynarecs.
This removes libco and all the usages of it (+pthreads).
Rewired all dynarecs and interpreter to return after every frame so that
libretro can process events. This required to make dynarec re-entrant.
Dynarecs were updated to check for new frame on every update (IRQ, cycle
exhaustion, I/O write, etc). The performance impact of doing so should
be minimal (and definitely outweight the libco gains). While at it,
fixed small issues to get a bit more perf: arm dynarec was not idling
correctly, mips was using stack when not needed, etc.
Tested on PSP (mips), OGA (armv7), Linux (x86 and interpreter). Not
tested on Android though.