Turns out there's a couple of inaccuracies that do affect a couple of
games. Most of them are buggy games but emulating these accesses
correctly helps jumping over some bugs.
This rewrites the way that CPU alerts work, making them a bitmap (since
multiple alerts can happen simultaneously, like SMC and IRQ). This
doesn't really fix many games but improves accuracy overall and improves
performance on some I/O writes (the ones without side effects).
The IRQ raising is now decoupled and explicitely called via a new
function (check_and_raise_interrupts) to avoid issues such as invalid
CPSR values (doesn't seem to bother most games!). There's more side
effects missing, so this just lays the ground for more fixes.
Whenever an interrupt is pending and interrupts are disabled (via
IME/IE), an IE/IME write that re-enables IRQs will fail to raise an IRQ.
This makes some games hang. Most games seem to use CPSR.IRQ to enable
and disable interruts, so they are not affected. However some others use
IME/IE (or all of them), causing these deadlocks and some race
conditions.
This fixes a bunch of games that did not crash but would "hang" in some
interesting ways.
In thumb mode, a store that triggers a DMA and its correspondoing DMA
IRQ (since emulate DMAs being instantanious, which is another can of
worms tho...) overrwrites the PC with the PC-next value (disregarding
the IRQ handler address). This causes quite a few bugs. I'd expect it
causing way more bugs, but it doesnt...?
Anyway this fixes a few games like Buffy, Woody, Medabots, Super
Duper Sumos and Power Rangers
gpsp doesn't differentiate between USER and SYSTEM mode, most likely
cause it is not that important for most games. This implements the modes
correctly and adds checks for privileged operations. Still some
bugs/hacks but it mostly fixes CPSR/SPSR reads/writes.
To implement PSR writes we are using a more refined masks and force mode
bit num. 4 to always be one. Reserved bits are forced to zero (this
needs to be validated on a real device).
This introduced a potential race condition between the start of a SWI
and the BIOS handling the exception by returning to system mode. During
this ~10 instruction window, having an IRQ that issues a SWI causes bad
behaviour that results in crashes or other weirdness.
Fixes a couple of games and potentially many weird and obscure bugs here
and there (hard to reproduce sometimes).
This uses BSON as savestate format, to allow external tools to parse it
(so that we can add proper test of the states). The BSON is not 100%
correct according to spec (no ordered keys) but can be parsed by most
libraries.
This fixes also a bug in the savestate palette color recalculation that
was wrongly overwritting the original palette (which could cause some
problems on some games).
Also fixes some potential issues by serializing some more stuff and
cleans up unused stuff.
Testing shows that states look good and there's only minor differences
in audio ticks, related to buffer sizes (since buffer flushes are
de-synced from video frames due to different frequency).
Seems that, on IRQ, it is assumed that the PC will change (which happens
most of the time). However if the IRQ is masked it resumes execution.
On masked DMA IRQ the interpreter jumps to alert without incrementing
the instruction pointer (right after).
Removed the last bits of text relocations by moving all relevant RAMs to
the stub reachable area. This should be as fast or even faster than
previous code.
This improves the existing on-demand ROM paging and also breaks down ROM
buffers into 1MB blocks for platforms with memory fragmentation issues.
Fixed some potential RTC reg issue in said platforms too.
Fixed page pinning on interpreter (would crash due to LRU evictions).
This patch adds big-endian compatibility in gpsp (in general but only
for the interpreter). There's no performance hit for little-endian
platforms (should be a no-op) and only add a small overhead in memory
accesses for big-endian platforms.
Most memory accesses are wrapped with a byteswap instruction and I/O reg
accesses are also rewired for proper access (using macros). Video
rendering has been fixed to also do byteswaps but there's a couple of
games and rendering modes that still seem broken (but they amount to
less than 20 games in my tests with 1K ROMs).
This also adds build rules and CI for NGC/WII/WIIU (untested)
Cleans up a ton of whitespace in cpu.c (like 100KB!) and improves
readability of some massive decode statements.
Added an optimization for PC-relative loads (pool load) in ROM (since
it's read only and cannot possibily change) that directly emits an
immediate load. This is way faster, specially in MIPS/x86, ARM can be
even faster if we rewrite the immediate load macros to also use a pool.
An address check was missing to read aligned 32 (stm/ldm) data from
high mem areas (0xX0000000). This fixes SM4 EU that for some reason has
some weird memory access (perhaps a bug?)
This saves a few cycles in MIPS and simplifies a bit the core.
Removed the write map, only affects interpreter performance very
minimally. Rewired ARM and x86 handlers to support direct access to
I/EWRAM (and VRAM on ARM) to compensate. Overall performance is slightly
better but code is cleaner and allows for further improvements in the
dynarecs.
Add options to select whether to boot from BIOS (default is no, as it is
now) and whether to use the original bios or the builtin one (default is
auto, which tries to use the official but falls back to the builtin if
not found).
This is not really necessary since it can share area with ROM.
Performance impact should be very minimal (haven't noticed it myself)
and could be compensated (even by a positive offset) if we bump the ROM
cache area size.
Tested with several dynarecs.
This removes libco and all the usages of it (+pthreads).
Rewired all dynarecs and interpreter to return after every frame so that
libretro can process events. This required to make dynarec re-entrant.
Dynarecs were updated to check for new frame on every update (IRQ, cycle
exhaustion, I/O write, etc). The performance impact of doing so should
be minimal (and definitely outweight the libco gains). While at it,
fixed small issues to get a bit more perf: arm dynarec was not idling
correctly, mips was using stack when not needed, etc.
Tested on PSP (mips), OGA (armv7), Linux (x86 and interpreter). Not
tested on Android though.