This removes ram_block_ptrs and encodes the pointer directly in the
block tag. Saves ~256KB at no performance cost.
Drawback is that it limits the ram cache size to 512KB (we were using
768KB before). Should not be a problem since most games use less than
32KB of cache anyway.
Fixed ARM routines accordingly.
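As a rough illustration of encoding the pointer in the block tag, here
is a minimal sketch. The 16-bit tag width, the 8-byte granularity and
the names ptr_to_tag/tag_to_ptr are assumptions made for this example,
not taken from the actual code. Under those assumptions the 512KB limit
falls out naturally (65536 tags * 8 bytes), and no lookup table is
needed.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical layout: a 16-bit tag holds the block offset in 8-byte units. */
#define RAM_CACHE_SIZE  (512 * 1024)
#define TAG_GRANULARITY 8u  /* assumed alignment of translated blocks */

static uint8_t ram_translation_cache[RAM_CACHE_SIZE];

/* Encode a pointer into the cache as a tag. Tag 0 is reserved for
 * "no block", so the first slot of the cache must stay unused. */
static uint16_t ptr_to_tag(const uint8_t *block)
{
  size_t offset = (size_t)(block - ram_translation_cache);
  return (uint16_t)(offset / TAG_GRANULARITY);
}

/* Decode a tag back into a pointer without any side table. */
static uint8_t *tag_to_ptr(uint16_t tag)
{
  if (tag == 0)
    return NULL;
  return &ram_translation_cache[(size_t)tag * TAG_GRANULARITY];
}
```

The largest tag (0xFFFF) maps to offset 524280, which is still inside
the 512KB buffer, so every tag value decodes to a valid address.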
This fixes issue #133
The explanation is as follows. Most blocks end on an unconditional
jump/branch, but there are two cases where this doesn't happen:
translation gates and when we hit MAX_EXITS. These are very uncommon
cases and therefore more prone to hidden bugs.
When this happens, the last instruction emits a conditional jump (via
arm_conditional_block_header macro) which is patched by a later
instruction via generate_branch_patch_conditional. Typically the last
unconditional branch will trigger the patching condition (which is
approximately condition != last_condition), but in these two cases it
might not happen, leaving an unpatched branch. This makes x86 and ARM
dynarecs crash in interesting ways (although it might not crash
depending on $stuff, making the bug even harder to track).
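The failure mode can be reproduced with a toy model. This is only a
sketch: emitter_t, emit_instruction and end_block are hypothetical
names, the "instructions" are plain integers, and the explicit flush in
end_block is just one way to guarantee that no branch stays unpatched,
not necessarily what the real emitters do.

```c
#include <stdint.h>
#include <stddef.h>

#define COND_AL     0xEu        /* ARM "always" condition */
#define COND_NONE   0xFFu       /* no conditional block currently open */
#define PLACEHOLDER 0xDEADBEEFu /* branch whose target is not yet known */

typedef struct {
  uint32_t code[64];  /* toy instruction stream, no bounds checking */
  size_t   len;       /* next free slot */
  size_t   pending;   /* index of the branch waiting for a target */
  uint8_t  last_cond; /* condition of the open conditional block */
} emitter_t;

static void emitter_init(emitter_t *e)
{
  e->len = 0;
  e->pending = 0;
  e->last_cond = COND_NONE;
}

/* Point the pending conditional branch (if any) at the current position. */
static void patch_pending(emitter_t *e)
{
  if (e->last_cond != COND_NONE) {
    e->code[e->pending] = (uint32_t)e->len;
    e->last_cond = COND_NONE;
  }
}

/* Emit one guest instruction guarded by condition `cond`. */
static void emit_instruction(emitter_t *e, uint8_t cond)
{
  if (cond != e->last_cond) {          /* roughly condition != last_condition */
    patch_pending(e);
    if (cond != COND_AL) {
      e->pending = e->len;
      e->code[e->len++] = PLACEHOLDER; /* conditional skip, target unknown */
      e->last_cond = cond;
    }
  }
  e->code[e->len++] = 0;               /* stand-in for the instruction body */
}

/* End the block without relying on a final unconditional branch. */
static void end_block(emitter_t *e)
{
  patch_pending(e);
}

/* Returns 1 if any branch in the stream was never patched. */
static int has_unpatched(const emitter_t *e)
{
  for (size_t i = 0; i < e->len; i++)
    if (e->code[i] == PLACEHOLDER)
      return 1;
  return 0;
}
```

A block whose last instruction is conditional (as with MAX_EXITS or a
translation gate) leaves the placeholder in the stream until end_block
runs; a block ending in an unconditional instruction gets patched as a
side effect of the condition change.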
Cleans up a ton of whitespace in cpu.c (like 100KB!) and improves
readability of some massive decode statements.
Added an optimization for PC-relative loads (pool load) in ROM (since
it's read only and cannot possibly change) that directly emits an
immediate load. This is way faster, especially in MIPS/x86. ARM can be
even faster if we rewrite the immediate load macros to also use a pool.
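The idea behind the pool load optimization can be sketched like this.
The host_op type, translate_pool_load and the ROM_BASE value are
assumptions for the example (the base address follows the usual GBA
memory map), and the effective address is assumed to be word aligned
and already computed by the caller.

```c
#include <stdint.h>
#include <stddef.h>

#define ROM_BASE 0x08000000u /* assumed start of the read-only ROM region */

typedef enum { OP_LOAD_IMM, OP_LOAD_MEM } op_kind;

/* Hypothetical description of the host operation to emit. */
typedef struct {
  op_kind  kind;
  uint8_t  reg;   /* destination register */
  uint32_t value; /* immediate for OP_LOAD_IMM, address for OP_LOAD_MEM */
} host_op;

/* Decide how to translate a PC-relative load from guest address `addr`.
 * If the word lives in ROM it can never change, so it is read once at
 * translation time and turned into an immediate load. */
static host_op translate_pool_load(const uint8_t *rom, size_t rom_size,
                                   uint32_t addr, uint8_t reg)
{
  host_op op = { OP_LOAD_MEM, reg, addr }; /* default: regular memory load */

  if (addr >= ROM_BASE && (size_t)(addr - ROM_BASE) + 4 <= rom_size) {
    size_t off = (size_t)(addr - ROM_BASE);
    /* Assemble the little-endian word byte by byte (host independent). */
    uint32_t word = (uint32_t)rom[off]
                  | ((uint32_t)rom[off + 1] << 8)
                  | ((uint32_t)rom[off + 2] << 16)
                  | ((uint32_t)rom[off + 3] << 24);
    op.kind = OP_LOAD_IMM;
    op.value = word;
  }
  return op;
}
```

Loads from RAM keep going through the normal memory path, since the
pool contents there may be overwritten after translation.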
Seems that using the __attribute__ magic for sections is not the best way
of doing this, since it injects some default attributes that collide
with the user defined ones. Using assembly is far easier in this case.
Reworked definitions a bit to make it easier to import from assembly.
Also wrapped stuff in macros for an easier and less verbose way of
handling the symbol prefix issue.
This saves a few cycles in MIPS and simplifies the core a bit.
Removed the write map; this only affects interpreter performance very
minimally. Rewired ARM and x86 handlers to support direct access to
I/EWRAM (and VRAM on ARM) to compensate. Overall performance is slightly
better, and the code is cleaner and allows for further improvements in
the dynarecs.
Added a more thorough cache cleanup for reset/mode-change too.
Fixed the mmap initialization, which ended up leaking memory.
Minor x86 asm fixes for Android.
This is not really necessary since it can share area with ROM.
Performance impact should be very minimal (haven't noticed it myself)
and could be compensated (even by a positive offset) if we bump the ROM
cache area size.
Tested with several dynarecs.
This allows us to emit the handlers directly in a more efficient manner.
At the same time it allows for an easy fix to emit PIC code, which is
necessary for libretro. This also enables more platform specific
optimizations and variations, perhaps even run-time multiplatform
support.
Turns out most of that file ends up in JIT section, which is RWX and not
a very nice way to run code really (security issues aside).
This also makes it possible to build that file with -ggdb; otherwise it
complains about stuff.
Turns out there were a couple of very interesting and hard to track
bugs. A missing comma made the reg list too short, leaving the 31st
element at the mercy of the linker ordering algorithm, which seems to
work in some cases depending on the compiler version.
Also the cache flush code seemed not to work on my machine (OGA),
not sure why it worked in the past :/
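For reference, a missing comma in a list can compile silently and
shrink the array, and a compile-time length check catches it. This is
only a sketch: the actual reg list is not shown here, so the string
entries, NUM_REGS and ARRAY_LEN are assumptions chosen to demonstrate
the effect in plain C.

```c
#include <stddef.h>

#define NUM_REGS 4 /* hypothetical expected number of entries */

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Adjacent string literals are concatenated, so the missing comma after
 * "r1" compiles cleanly and yields 3 entries instead of 4. Reading index
 * 3 would then touch whatever the linker placed after the array. */
static const char *const reg_names_buggy[] = { "r0", "r1" "r2", "r3" };

/* Correct list, guarded by a compile-time length check (C11). */
static const char *const reg_names[] = { "r0", "r1", "r2", "r3" };

_Static_assert(ARRAY_LEN(reg_names) == NUM_REGS,
               "register list has the wrong length");
```

Applying the same _Static_assert to reg_names_buggy makes the build
fail instead of leaving the last element to chance.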