Just a quick update on the dynarec status, as I know a lot of people are more interested in this than in the grisly details of branch delay instructions.
Last weekend (13/14 May) I managed to assemble the fragment buffers into native x86 code, and execute this dynamically. I spent some time debating whether to target MIPS or Intel initially, but I decided that it would be a lot easier for me to debug the code generation on the PC than it would be to debug code gen on the PSP.
In the end, I'm glad I started with the PC, as it allowed me to fix a number of hairy problems without going down the torturous path of debugging self-modifying code on the PSP with just a few printf() statements to help track down any problems.
With this version of the dynarec, 200KB of N64 code is now generating just 2MB of x86 assembly (i.e. the generated code is around 10 times the size of the original, an expansion ratio of around 1000%). The PC version is running around 60% faster with dynarec enabled than with it disabled, which is a pretty significant speedup (although this is still very early in the process).
What's also important is that this is before I've done any real optimisation of the generated code. For each instruction I'm still calling the generic instruction handler, which has the overhead of figuring out which source registers to use, which register is the destination, and so on. The *real* speedup comes from generating code to handle opcodes explicitly, as you remove all this decoding overhead along with the overhead of jumping to another function. Once you've removed most of the generic instruction handling, you can start looking at caching register values to minimise the amount of memory that's being moved around.
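To make the decoding overhead a bit more concrete, here is a minimal Python sketch of the idea. It is only an analogy: a real dynarec emits native machine code, whereas this just decodes each instruction once and bakes the register indices into a specialised closure. The field layout is the standard MIPS R-type encoding, but every function name here (`decode_rtype`, `generic_handler`, `compile_fragment`, `run_fragment`) is hypothetical and not taken from the emulator's source. It also simplifies things by treating registers as 32-bit and ignoring the hardwired zero register.

```python
# Illustrative sketch only: names and structure are assumptions, not emulator code.
# Simplifications: 32-bit registers, no special handling of register 0.

MASK32 = 0xFFFFFFFF
FUNCT_ADDU = 0x21  # MIPS ADDU rd, rs, rt
FUNCT_OR = 0x25    # MIPS OR   rd, rs, rt


def encode_rtype(rs, rt, rd, funct):
    """Build a 32-bit R-type instruction word (opcode field 0, shamt 0)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | funct


def decode_rtype(word):
    """Extract (rs, rt, rd, funct) from an R-type instruction word."""
    return (word >> 21) & 0x1F, (word >> 16) & 0x1F, (word >> 11) & 0x1F, word & 0x3F


def generic_handler(regs, word):
    """Interpreter-style path: the word is decoded every time it executes."""
    rs, rt, rd, funct = decode_rtype(word)
    if funct == FUNCT_ADDU:
        regs[rd] = (regs[rs] + regs[rt]) & MASK32
    elif funct == FUNCT_OR:
        regs[rd] = regs[rs] | regs[rt]
    else:
        raise NotImplementedError(hex(funct))


def make_addu(rs, rt, rd):
    """Return an operation with the register indices already fixed."""
    def op(regs):
        regs[rd] = (regs[rs] + regs[rt]) & MASK32
    return op


def make_or(rs, rt, rd):
    """Return an operation with the register indices already fixed."""
    def op(regs):
        regs[rd] = regs[rs] | regs[rt]
    return op


def compile_fragment(words):
    """Decode each word once, up front, and emit a specialised operation for it."""
    ops = []
    for word in words:
        rs, rt, rd, funct = decode_rtype(word)
        if funct == FUNCT_ADDU:
            ops.append(make_addu(rs, rt, rd))
        elif funct == FUNCT_OR:
            ops.append(make_or(rs, rt, rd))
        else:
            raise NotImplementedError(hex(funct))
    return ops


def run_fragment(ops, regs):
    """Execute a compiled fragment: no decoding happens on this path."""
    for op in ops:
        op(regs)


# Usage: r3 = r1 + r2, then r4 = r3 | r1
fragment = [encode_rtype(1, 2, 3, FUNCT_ADDU), encode_rtype(3, 1, 4, FUNCT_OR)]
regs = [0] * 32
regs[1], regs[2] = 5, 7
run_fragment(compile_fragment(fragment), regs)
```

The point of the sketch is that `compile_fragment` pays the decoding cost once per fragment, while `generic_handler` pays it on every execution. In a hot loop that difference adds up, and it is the same trade a dynarec makes when it emits opcode-specific code instead of calling a generic handler.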
With the PC version up and running fairly successfully, I've spent this weekend getting the PSP code generation working. I don't want to go into too many details (as I want to go into more depth in future posts), but I know people are keen to hear some news about how this is going.
I got the basic code generation working on Saturday morning (thankfully I'd already resolved most of the tricky issues in developing the x86 version the previous weekend). I spent most of Saturday afternoon fixing some really horrible instruction cache related bugs. I'm still not 100% sure I've fixed them, but it seems very stable at the moment. I'm now at the same stage with the PSP version of the dynarec that I was with the PC version last weekend: the code generation is running fine (and, more importantly, executing on the PSP without crashing), but I've only just started looking at optimising things. It's still too early to speculate on numbers for the performance improvement it will give. Currently it's running around 10% faster with dynarec enabled, but it's still very early days.
More soon.
-StrmnNrmn