New 3DS CHD Performance Degraded #575

Closed
MarioKartFan opened this issue Nov 7, 2021 · 38 comments
Labels
Performance regression Performance becoming worse over time

Comments

@MarioKartFan

Returned to this port on my New 3DS after a few months of absence. Last version I regularly used was probably somewhere around 1.9.9 or 1.9.10.

I could not believe how poor performance had become in Final Fantasy IX. Began seeing frame tearing running through towns, which never happened previously; far more stutters in fights, sound crackling in movies (after fast forwarding briefly) etc. There has always been a slight sound stutter at the start of fights, going back to when Justin Weiss first added async rendering support. But the issues that exist now are far worse.

I checked all settings, deleted configs and set everything up properly. Issues remained.

Finally tried using PBP instead of CHD. Fixed all problems immediately.

This surprised me as I would have expected CHD to provide similar performance. I then did a very silly performance test. I started a new game using a CHD and a PBP. Using fast forward, the introduction ended with Zidane control in 1:10:15 on CHD and 1:05:01 on PBP. Not saying this is scientific at all but just another data point.

@gameblabla
Collaborator

gameblabla commented Nov 10, 2021

It would be helpful to know which commit exactly caused the performance issue.
However, I must add that CHD support from late 2020 up until September 2021 was not even working properly in some games on the 3DS due to a threading issue with the CDDA code.

CHD does have a performance cost on the CPU, mostly related to decompressing LZMA2 and FLAC.
What's going on here is that, since the emulator may have gotten slower over time, it has a domino effect on everything else.

Possible suspects are

CDROM timing changes : 068a613
That commit fixed a few games that were not booting or playing at some point but it sort of made the ADPCM sectors in some games choppier. There's a separate issue about it but a partial fix has been done for now.

Icache interpreter merge : 7a81171
This mostly affects the interpreter but given that it's now at least compiled, this could increase memory usage on lower end platforms.

ARM : Always look up verify_dirty literals from offsets by neonloop 12aa995

This is the commit that's probably affecting you performance-wise.
It fixes a crash that happened on ARMv5 (and supposedly ARMv6 as well) and that could occur when playing Final Fantasy 7.
However, because it has to check twice, that function is now twice as slow.

It's possible however, that the performance being degraded could be caused by a compiler upgrade, if they did any.
I don't think that's the case but a regression could happen.
It would be helpful if you could try the nightly build for each commit I submitted and see if one in particular affects it.

EDIT: You also mentioned that you have a New 3DS and not an old 3DS. It's possible that the CDDA change made the New 3DS models slower, as that code used to be threaded. However, because of #521 and Kyuukentai's CDROM delay, the threaded CDDA code was eventually removed, as it caused crashes and also caused performance issues on platforms with just one CPU/core.

@gameblabla added the "Performance regression" label on Nov 10, 2021
@notaz
Collaborator

notaz commented Oct 24, 2023

Is this still an issue nowadays?

@rtissera

rtissera commented Sep 5, 2024

Older CHDv5 files use LZMA + FLAC.
Recent MAME and the latest libchdr support Zstd, which should be at least 10x faster to decompress than LZMA (FLAC's CPU cost is very low) for roughly the same file size (a bit bigger, let's be honest).

@notaz
Collaborator

notaz commented Sep 5, 2024

libchdr was updated already (notaz#339) but I have no way to test on N3DS.

@rtissera

rtissera commented Sep 5, 2024

Well, I can only recommend that @MarioKartFan keep using PBP or use a zstd CHD.

@InquisitiveCoder

MiSTer FPGA can also have stuttering issues with CHDs when using CD speed hacks, so I did some digging on this topic recently. If you just want the solution, try recreating the CHD with chdman createcd -hs 9792 -c cdzs,cdfl and if that's still too slow, reduce the hunk size even further to 7344 or 4896. I made a little Bash script to convert my PSX collection to these settings, it can be run from Git Bash: https://github.com/InquisitiveCoder/mkchd
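
For readers who would rather see the suggested command spelled out, here is a minimal Python sketch that only builds the argument list. The flags `createcd`, `-i`, `-o`, `-hs` and `-c` are chdman's own; the helper name `chdman_createcd_args` and the example file name are made up for illustration, and this is not the linked mkchd script.

```python
import shlex
from pathlib import Path

SECTOR_BYTES = 2448  # CHD CD sector: 2352 bytes of data + 96 bytes of subcode


def chdman_createcd_args(cue_path, sectors_per_hunk=4, codecs=("cdzs", "cdfl")):
    """Build the chdman argument list for re-creating a CD image as a CHD."""
    cue = Path(cue_path)
    return [
        "chdman", "createcd",
        "-i", str(cue),
        "-o", str(cue.with_suffix(".chd")),
        "-hs", str(sectors_per_hunk * SECTOR_BYTES),  # 4 sectors -> 9792 bytes
        "-c", ",".join(codecs),                       # Zstd for data, FLAC for audio
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(chdman_createcd_args("game.cue")))
```

Passing `sectors_per_hunk=3` or `2` gives the 7344 and 4896 byte hunk sizes mentioned as fallbacks.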

My understanding from snooping around in the MiSTer and libchdr source code is that before the emulator can read a sector off the CHD file, it has to read the compressed hunk off of the file, then decompress the entire thing. After that, it can reuse the same decompressed hunk to read the next couple of sectors "for free", before this process repeats itself. Hopefully by then the OS will have pre-fetched the next couple of pages from the disk/SD card.
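
The read pattern described above can be sketched with a toy model. This is not libchdr or MiSTer code: the class `HunkedImage` is hypothetical, zlib stands in for whichever codec the real file uses, and only a single decompressed hunk is cached.

```python
import zlib

SECTOR_BYTES = 2448


class HunkedImage:
    """Toy stand-in for a CHD: sectors grouped into independently compressed hunks."""

    def __init__(self, raw, sectors_per_hunk):
        self.sectors_per_hunk = sectors_per_hunk
        hunk_bytes = sectors_per_hunk * SECTOR_BYTES
        self.hunks = [
            zlib.compress(raw[i:i + hunk_bytes])
            for i in range(0, len(raw), hunk_bytes)
        ]
        self.cached_index = None
        self.cached_data = b""
        self.decompressions = 0

    def read_sector(self, lba):
        index, offset = divmod(lba, self.sectors_per_hunk)
        if index != self.cached_index:
            # Cache miss: the whole hunk is decompressed to serve one sector.
            self.cached_data = zlib.decompress(self.hunks[index])
            self.cached_index = index
            self.decompressions += 1
        start = offset * SECTOR_BYTES
        return self.cached_data[start:start + SECTOR_BYTES]


if __name__ == "__main__":
    raw = bytes(range(256)) * (SECTOR_BYTES * 16 // 256)  # 16 sectors of dummy data
    image = HunkedImage(raw, sectors_per_hunk=4)
    for lba in range(16):
        image.read_sector(lba)
    print(image.decompressions, "decompressions for 16 sequential sector reads")
```

With sequential reads, every hunk is decompressed once and the remaining sectors in it are served from the cached copy, which is the "for free" part.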

The hunk size is important because it determines how many 4 KB pages the compressed hunk will span on the filesystem (and the OS's page cache), and because bigger hunks mean more data to decompress. chdman defaults to 8 sectors or 19,584 bytes; this is too large to get full speed on MiSTer FPGA even with Zstd compression. Final Fantasy IX discs compress down to about 60% of their original size, so a hunk of that size would span three 4 KB pages.
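
The page arithmetic can be checked with a few lines of Python. This is a best-case sketch: it assumes each compressed hunk starts on a page boundary (an unaligned hunk can straddle one more page) and applies the roughly 60% ratio quoted for Final Fantasy IX uniformly to every hunk. The function name `pages_spanned` is made up for illustration.

```python
import math

SECTOR_BYTES = 2448
PAGE_BYTES = 4096


def pages_spanned(sectors_per_hunk, compression_ratio):
    """Best-case number of 4 KB pages one compressed hunk occupies."""
    compressed = sectors_per_hunk * SECTOR_BYTES * compression_ratio
    return math.ceil(compressed / PAGE_BYTES)


if __name__ == "__main__":
    for sectors in (8, 4, 3, 2):
        hunk = sectors * SECTOR_BYTES
        print(hunk, "byte hunk ->", pages_spanned(sectors, 0.60), "page(s)")
```

Under these assumptions the default 8 sector hunk needs three pages, a 4 sector hunk needs two, and a 2 sector hunk fits in one.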

The other issue is the compression used. chdman will throw every algorithm at the CD trying to get the file size down as much as possible, but anything besides FLAC for audio plus either Zstd (newer and faster but less widely supported) or Zlib/Deflate (old reliable) is going to slow decompression down a lot for very little size gain.
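
To get a feel for the decompression cost on your own machine, a rough timing sketch using only Python's standard library is shown below. It is a proxy, not a CHD benchmark: Python's `lzma` module writes the .xz container by default rather than the raw stream a CHD stores, Zstd is left out because it is not assumed to be available, and the sample data is synthetic. The helper names are hypothetical, and no result is quoted here because timings vary by machine.

```python
import lzma
import os
import time
import zlib

HUNK_BYTES = 8 * 2448  # chdman's default CD hunk size


def time_decompress(decompress, blob, repeats=200):
    """Return the average seconds taken to decompress one hunk-sized blob."""
    start = time.perf_counter()
    for _ in range(repeats):
        decompress(blob)
    return (time.perf_counter() - start) / repeats


def compare_codecs(hunk):
    """Compress one hunk with zlib and LZMA, then time decompression of each."""
    blobs = {"zlib": zlib.compress(hunk, 9), "lzma": lzma.compress(hunk)}
    decoders = {"zlib": zlib.decompress, "lzma": lzma.decompress}
    return {
        name: {"bytes": len(blob), "seconds": time_decompress(decoders[name], blob)}
        for name, blob in blobs.items()
    }


if __name__ == "__main__":
    # Half repetitive, half random data as a crude stand-in for game data.
    repetitive = os.urandom(64) * 153
    hunk = repetitive + os.urandom(HUNK_BYTES - len(repetitive))
    for name, stats in compare_codecs(hunk).items():
        print(f"{name}: {stats['bytes']} bytes, {stats['seconds'] * 1e6:.1f} us per hunk")
```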

A hunk size of 4 sectors/9792 bytes was fast enough to get the full x8 CD speed on MiSTer reading the file off of an SD card, even with Zlib compression instead of the faster Zstd. That's small enough to go down from 3 pages to 2, and some games can squeeze down to one page. It's also ~10 KB less that needs to be decompressed. Additionally, the size of some games can blow up at lower hunk sizes; when I tested different hunk sizes on my collection of PSX games, 9792 bytes was the start of the diminishing returns on compression and had the least amount of size edge cases:

[Chart: Sum of CHD sizes (GB) vs hunk size]

[Chart: CHD size (MB) vs hunk size]

@InquisitiveCoder

Dug out my New 3DS XL and tried running FFIX disc 1 compressed with Zstd + FLAC at various hunk sizes, but none would play the intro FMV 100% free of stutter and audio pops. (Don't do a hunk size of 2448 though, it really started stuttering since every single sector read had to go to the SD card.)

Then again, even with an uncompressed CHD I still get a small amount of stutter. Maybe the 3DS's SD card slot just isn't fast enough.

@InquisitiveCoder

Grabbed some footage of the attract mode FMV with my phone. I can't tell a difference between a CHD compressed with zstd + FLAC (9792 byte hunks) and an uncompressed CHD (hunk size reduced slightly to 7344 so it reads a similar amount of bytes.)

https://youtu.be/GahwXmDPt2E

https://youtu.be/5wnPzok5ifQ

Unfortunately when I tried to load a CHD made with the standard chdman settings (lzma + zlib + FLAC, 19584 byte hunks), the core crashed, so I don't have the worst case scenario for the comparison. I've attached the dump file.

PXL_20240912_122016392

Tested on a New 3DS XL and core version r24l 237887e.

@InquisitiveCoder

For the sake of completeness, I tried loading it up as a bin/cue; it stuttered as badly as setting the CHD hunk size to 2448. I'm guessing the I/O is completely unbuffered and going to the SD card for every sector read.

https://youtu.be/pDj8-mOgAEs

@rtissera

Hard to blame the CHD reading code then.
Zstd and FLAC should actually reduce the number of I/O accesses needed, in exchange for a little more CPU load, even on modest hardware like the 3DS.
LZMA decoding is a totally different subject.

@InquisitiveCoder

I agree, whatever performance issue was reported originally probably isn't a problem any more if your CHD uses the right settings. But I do find it troubling that running a CHD made with chdman's defaults crashed the core. That's arguably a worse problem, since the documentation for the various CHD-compatible cores doesn't instruct users to change the hunk size or compression algorithm.

@rtissera

Probably worth compiling 3DS without LZMA support in libchdr to reject such CHD files.

@notaz
Collaborator

notaz commented Sep 17, 2024

I/O stalling issues could probably be relieved by reviving threaded cdrom code, although I'm not sure there are many people still using 3DS for PSX...

As for the crash, I need to have at least the exact binary to find the faulty code, otherwise the crash dump isn't useful. The nightly builds are different every time because they are statically linked to RetroArch itself which is changing a lot. Or ideally it would be useful to get the dump file (no screenshot needed) for the following build for which I have the debug symbols:
pcsx_rearmed_libretro_v1.2-39855-gd08b867e7d_237887e8_2.zip

Another useful thing would be a .chd that is known to crash the 3DS which could be investigated for out-of-bounds reads or whatever. Perhaps some homebrew could be repacked like from here.

@JORGETECH

> I/O stalling issues could probably be relieved by reviving threaded cdrom code, although I'm not sure there are many people still using 3DS for PSX...

For what it's worth, I'm one of those people that is still using (New) 3DS for PSX since it's still a very nice and compact handheld that has all the controls needed in order to play PSX games. I assume there are many more out there that don't watch GitHub issues and won't answer here.

Anyways, I was interested in testing the recently merged zstd compression in libchdr, I'm actually using a really old version of RetroArch in the New 3DS and I think part of the reason for that was the performance regression mentioned in the first post. I should try the games I found to be most problematic, PaRappa The Rapper and Vib-Ribbon, with zstd and report back.

It would also be interesting to test that threaded cdrom code but I understand it's low priority right now.

@notaz
Collaborator

notaz commented Sep 24, 2024

For a start fixing the crash would be great, for which I need a crash .dmp created on a build I posted above (assuming it works at all).

@InquisitiveCoder

Didn't mean to leave you all hanging, I was just very busy last weekend. I can get those crash dumps tonight, but I'm not sure what to do with the retroarch_3ds.elf file. As for the core, I assume I have to install the CIA? Sorry, I've never had to deal with debug builds before.

@notaz
Collaborator

notaz commented Sep 26, 2024

Just ignore any files that you don't need. I zipped all the files that came out of the build together so I don't have to track which files correspond to which build.

@InquisitiveCoder

I'm hoping the debug core was installed correctly; I have no idea how to verify.

This crash dump is for the PS1 Graphics Demo homebrew from the link you shared.

This crash dump is for Final Fantasy IX (USA) (Disc1) (Rev1). Both CHD files were made with chdman v0.268 using the default settings (equivalent to chdman createcd -c cdzl,cdlz,cdfl -hs 19584).

@notaz
Collaborator

notaz commented Sep 27, 2024

The dump matches the binary but it shows the crash happens in ctr_frame() which is part of RetroArch's video driver, so I don't know what to make of it. Maybe it's possible to enable RetroArch's logging to see if it's not running out of memory or something.

@InquisitiveCoder

That's strange. All right then, I turned up both the frontend and core log verbosity to Debug, enabled logging to file with timestamps and tried loading up the graphics demo CHD again. Not sure how much this'll help, but here's the log file and crash dump.

retroarch__2024_09_27__11_39_06.log

crash_dump_00000002.dmp

@notaz
Collaborator

notaz commented Sep 27, 2024

I've stared at the code some more and it looks like it is indeed running out of memory. When heap memory usage grows (for larger CHD hunks, I guess), RetroArch has special code to shrink the other "linear heap", which seems to be used for texture memory. With that, the video driver fails to allocate texture memory, but error handling is missing and it carries on, only to crash later in ctr_frame().

Unclear what can be done about this though.

@InquisitiveCoder

I'm skeptical it's an issue with the hunk size. It didn't crash when the CHDs used zlib, zstd, or no compression. When I was messing around with different hunk sizes to see if any of them could run with 0 stutters, I even tested 1 MB hunks (the maximum chdman supports) and it loaded up fine. The bug seems to be related to using lzma. At any rate, sounds like it's definitely a frontend bug. Sorry for sending you on a wild goose chase!

@notaz
Collaborator

notaz commented Sep 29, 2024

Well, even if the frontend handled OOM gracefully, the emulator would still not work, with an error message at best.

Could you try this build? Please post a log regardless if it crashes or not, it should print some mem usage info I'm curious about.
pcsx_rearmed_libretro_v1.2-39855-gd08b867e7d_237887e8_patch3.zip

@InquisitiveCoder

Here are the logs and crash dump after installing the CIA file for that build. Still trying to run the PS1 Graphics Demo CHD, same as last time.

crash_dump_00000003.dmp

retroarch__2024_09_29__19_37_42.log

retroarch__2024_09_29__19_38_44.log

@notaz
Collaborator

notaz commented Sep 30, 2024

Thanks, here is another one. This might require quite a few tries; it's part of the normal debugging process, sadly...

pcsx_rearmed_libretro_v1.2-39855-gd08b867e7d_237887e8_patch4.zip

@InquisitiveCoder

Hey, that one booted up the demo!

retroarch__2024_09_29__20_12_44.log

retroarch__2024_09_29__20_12_59.log

@notaz
Collaborator

notaz commented Oct 5, 2024

As guessed, it shows all 128MB used (which is an apparent limit?) and it is starting to eat into some reserved linear memory. It's kind of weird, as the standalone version of pcsxr on r-pi4 shows ~68MB usage, and that's on 64-bit.

Here is a test build with threaded cdrom code:
v1.2-39899-gfbf2c70e0d_r24l-73-gb3a51433_1.zip
There should be a new "System->CD read-ahead" option which allows you to tune how many sectors to read. Naturally it'll need more memory for the new thread and cache, so it'd be interesting to see the logs about memory (assuming it doesn't just crash right away again).
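
For anyone unfamiliar with the idea, a read-ahead thread can be sketched roughly as follows. This is a generic illustration and not the code in the test build: the class `ReadAheadReader` and its methods are hypothetical, the cache never evicts anything, and a real implementation would also have to serialize access to the underlying file. The default of 12 sectors mirrors the default value reported later in this thread.

```python
import queue
import threading


class ReadAheadReader:
    """Serve sector reads from a cache that a worker thread fills ahead of time."""

    def __init__(self, read_sector, read_ahead=12):
        self._read_sector = read_sector  # slow function: lba -> bytes
        self._read_ahead = read_ahead
        self._cache = {}                 # toy cache, no eviction
        self._lock = threading.Lock()
        self._requests = queue.Queue()
        self.hits = 0
        self.misses = 0
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            lba = self._requests.get()
            with self._lock:
                wanted = lba not in self._cache
            if wanted:
                data = self._read_sector(lba)  # slow I/O happens off the main thread
                with self._lock:
                    self._cache[lba] = data
            self._requests.task_done()

    def read(self, lba):
        with self._lock:
            data = self._cache.get(lba)
        if data is None:
            self.misses += 1
            data = self._read_sector(lba)      # nothing prefetched: stall and read now
        else:
            self.hits += 1
        # Ask the worker to fetch the sectors that will probably be needed next.
        for nxt in range(lba + 1, lba + 1 + self._read_ahead):
            self._requests.put(nxt)
        return data

    def wait_idle(self):
        """Block until every queued prefetch has finished (handy for demos)."""
        self._requests.join()
```

The memory cost mentioned above shows up here as the extra thread plus the cache of prefetched sectors, which grows with the read-ahead setting.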

@InquisitiveCoder

InquisitiveCoder commented Oct 5, 2024

Here's the RetroArch logs from the PS1 Graphics Demo CHD that used to crash the core. Still boots fine on this build.

retroarch__2024_10_04__20_18_21.log
retroarch__2024_10_04__20_18_40.log
retroarch__2024_10_04__20_19_44.log

The threaded CD access definitely makes a difference. FFIX's attract mode FMV seems to run full speed now. (I didn't do a side by side with real hardware or anything, but the stutters were quite noticeable before, and I didn't notice any this time.) Here's a recording of it; I left the CD read-ahead at its default value of 12.
https://youtu.be/MJgIZTXJu5w

When I first hacked my 3DS I remember Brave Fencer Musashi stuttering occasionally when voiceovers were playing, and those seem to be gone too. More importantly, Battle Arena Toshinden couldn't run at full speed at all (presumably because it constantly streams music tracks.) With the new build, I still get some slowdowns, but it holds 60 FPS most of the time.

I know the 3DS isn't the best system to be running RetroArch, but it's still the best solution for Virtual Boy, DS and 3DS games in my opinion, so I still use it from time to time. It's really cool to see PSX games running this well on it. Thanks for all your hard work getting this working.

@notaz
Collaborator

notaz commented Oct 6, 2024

The OP report says there were problems after fast forwarding, does my implementation handle it ok? What about streaming things on lzma compressed CHDs and raw cue/bin?

In either case thanks for all the testing.

@InquisitiveCoder

Good news! I was about to test a default settings FFIX CHD (lzma + zlib + flac, 19584 byte hunk size) but after looking at my RetroArch history, I noticed I already accidentally used it in that last video I recorded. So yeah, it runs fine; the threaded I/O is enough to make that particular game run fine even without optimizing the CHD settings. (I definitely used a zstd CHD when I tested Battle Arena Toshinden though, which just goes to show that some games can still push the SD card to the limit.)

Tried doing some Fast Forward with that last build and didn't notice any issues, but for whatever it's worth I didn't have any FF issues when I was testing the CHD stutters in the previous builds either.

@notaz
Collaborator

notaz commented Oct 6, 2024

What about single-sector formats (cue/bin or CHD with hunk size 2448)? Your previous reports showed those were the worst. From those reports it would seem it takes a long time to start an SD card transfer on the 3DS, so small transfers are problematic. Or maybe the mismatch of SD card and CD sector sizes causes some SD card controller or driver inefficiency or something.
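
The size mismatch guess can be made concrete with some arithmetic. This sketch only illustrates the speculation above and proves nothing about the 3DS driver: it assumes 512-byte SD blocks, assumes sector 0 of the image starts exactly on a block boundary, and ignores file headers and filesystem layout. The function name `blocks_touched` is made up.

```python
SD_BLOCK_BYTES = 512    # standard SD card block size
CD_SECTOR_BYTES = 2448  # CHD sector with subcode; raw .bin sectors are 2352 bytes


def blocks_touched(lba, sector_bytes=CD_SECTOR_BYTES, block_bytes=SD_BLOCK_BYTES):
    """Return (first_block, block_count, aligned) for reading one CD sector."""
    start = lba * sector_bytes
    end = start + sector_bytes - 1
    first = start // block_bytes
    count = end // block_bytes - first + 1
    return first, count, start % block_bytes == 0


if __name__ == "__main__":
    for lba in range(4):
        first, count, aligned = blocks_touched(lba)
        print(f"sector {lba}: blocks {first}..{first + count - 1} "
              f"({count} blocks, aligned={aligned})")
```

Under these assumptions most single-sector reads start mid-block and touch 6 blocks (3072 bytes) to deliver 2448 bytes, so each small transfer carries some overhead on top of whatever fixed cost starting a transfer has.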

@InquisitiveCoder

Just tested out the FFIX attract mode with single sector formats and it worked pretty well. I did notice a stutter or two, but you'd have to be paying attention to notice. This used to stutter so much the intro felt like it was running at half speed.

I recorded a video of the CHD with no compression and 2448 byte hunk size. (Sorry that it came out sideways, not sure why my phone did that.) I did test bin/cue as well but the result was more or less the same, wasn't worth capturing a second video. https://youtube.com/shorts/I3MP3AaPAiY?feature=share

So yeah, the threaded CD build makes the two most common and slowest use cases (bin/cue and lzma-compressed 8 sector CHDs) run smoother than the Zstd + 4 sector hunk CHD and the uncompressed 3 sector hunk CHD I tested out in this comment. Seems like a slam dunk to me.

@InquisitiveCoder

Just wanted to say I grabbed the latest build through RetroArch (as opposed to the test build you attached here) and didn't notice any differences.

@notaz
Collaborator

notaz commented Oct 25, 2024

@InquisitiveCoder could you try another build? This one has multithreaded dynarec, it does seem to help on the switch but no idea if it's any good for 3ds, if it works there at all. It can be disabled in "System->DynaRec threading" where "auto" is the same as "on" on multicore systems (the 3ds is one of them). The desired effect is to reduce stutters when things happen for the first time and MIPS code is recompiled to ARM.

v1.2-42455-g75c647d3ca-g907c42ea.zip

As for this issue I'm closing it as I think I did what I could for CD image reading slowdowns.

@notaz notaz closed this as completed Oct 25, 2024
@InquisitiveCoder

Since you said it seems to help on Switch, do you know of any games that had issues? Or should I just load up random games and see if there's any difference between threaded dynarec on and off?

@notaz
Collaborator

notaz commented Oct 27, 2024

"Brave Fencer Musashi" had stutters at the start of the level, despite a fast CPU the Switch has. 3DS ARM11 design could be considered ancient in comparison so I'd guess most games would be affected. Not to mention that out of 4 cores 2 seem to be reserved for the system on 3DS...

@InquisitiveCoder

Sorry, it was a very busy week for me so it took me a while to test this thoroughly. I started a new game in Brave Fencer Musashi and played up to the Steam Knight. I used a zstd+flac CHD with 4 sector hunk size as usual. Most of the emulator settings were left at their defaults except I turned on synchronous threaded rendering; I wanted to lighten the load on the main thread and also see if having threaded rendering, CD access and dynarec simultaneously would cause any contention issues with the 3DS's 2 user space cores.

The good news is that enabling threaded dynarec doesn't seem to cause any problems. However, it also didn't seem to change any of the slowdowns. To be clear, the game runs at a steady 30 FPS most of the time, with the only notable slowdown happening whenever the transparent artwork for an Assimilate ability appears (probably related to issue #607 ), and when the wall explodes at the start of the cutscene with Rootrick.

In short I don't see any harm in providing this option but I also wasn't able to find a scenario where it helps. If you have other games or more specific test setups you'd like me to try, I'd be happy to test further.

@notaz
Collaborator

notaz commented Nov 4, 2024

Thanks again for the testing. I got hold of an old 3DS. Even though it's possible to run code on the syscore, running the recompiler there makes things a lot worse on the old 3DS. I'll just default it to off for the 3DS.


6 participants