Before I went away on holiday I asked you what you thought I should look at working on for the next release of Daedalus. Over 200 of you replied, and I've greatly enjoyed reading what you've had to say. There were some brilliant suggestions, so many thanks for your contributions.
It seems pretty clear to me that speed is the single biggest issue that most people want to see addressed. Many people also mentioned compatibility and savestate or save game support, but in nowhere near the same kind of numbers as those wanting speed improvements.
Based on your feedback my current plan is to release Daedalus R10 at the end of March, focusing mostly on speed improvements. If I can fit in any easy compatibility fixes, I'll do this too*.
Several people have asked what possibilities remain for optimisation. Here's a short list of things I know need more work:
In many games, a lot of the time spent executing dynamically recompiled code is doing things which can potentially be emulated at a high level. For instance, over 5% of the time spent executing dynarec code in Mario64 is just converting matrices from floating point to fixed point format. Another 4-5% of the time is spent in a loop invalidating areas of the data cache (which is irrelevent in an emulator.)
Some of the most expensive fragments are those which branch to themselves (i.e. those doing many loops). I can optimise for this to avoid loading and flushing cached registers on each iteration through the loop.
I can implement a frameskip option (I had intended to implement this for R9, but forgot!)
I can make use of the Media Engine (as Exophase suggested in conversation, as the ME can't access VRAM, it might make more sense to execute Audio and Display Lists on the main CPU, and run the N64 CPU emulation on the PSP ME)
There are certain situations where I fail to create fragments in the dynamic recompiler - for instance if the code being recompiled writes to a hardware register, this triggers an interrupt and causes fragment generation to be aborted. I should be able to deal with situations such as this more gracefully.
The fragment generator can do a lot more to improve register caching, and eliminating redundant 64-bit operations.
There are many situations where N64 roms busy wait. I detect very simple occurances of this, but not all of them. If I manually identify more complex examples I can have the fragment generator optimise them away.
Some roms are causing the dynarec fragment cache to be repeatedly dumped and recreated (I think Banjo Kazooie is one example of this). Fixing this may just involve tweaking a couple of magic numbers.
I currently optimise memory accesses under the assumption that most accesses are in the range 0x80000000 - 0x80800000, which is incorrect in the case of roms that make heavy use of virtual memory, or access RAM through the mirrored range at 0xa0000000. I can improve the trace recorder to collect information on which range a memory access fell in, and generate code to speculatively optimise for this.
Now that the dynarec engine is producing much better code, the cost of display list processing is becoming more significant, and may finally be worth profiling and optimising.
That's quite a big list, so I doubt I'll be able to work on these things before the end of March, but I think it shows there's still a lot of scope for further optimisation.
-StrmnNrmn
*Just this morning, I figured out why the Expansion Pak support was broken, so Majora's Mask and a couple of other games relying on this are booting correctly now