Finally, it’s time for an update!
Since the last update I've mainly been busy with the FPGA and firmware, so here are a couple of points:
Menu
Last time I complained about the text size of the menu. To fix it, I added a global integer scaling factor to the sprite module. The module scales all sprites on the screen by that factor, and I made sure to put the factor in a register so the syscon can access it via SPI.
I’m not intending for this to be modifiable; it will be hardcoded. In the video below you can see a scaling factor of 2, which makes each sprite 16x16 pixels. This is my preferred text size and I will most likely keep it.
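As a rough illustration of what an integer scaling factor does, here is a minimal Python sketch. It assumes nearest-neighbour pixel replication, which is the usual way to scale by an integer factor; the function name and the list-of-rows sprite format are made up for the example and are not taken from the FPGA design.

```python
def scale_sprite(sprite, factor):
    """Scale a sprite (list of pixel rows) by an integer factor.

    Each output pixel (x, y) reads the source pixel (x // factor, y // factor),
    so every source pixel becomes a factor x factor block.
    """
    height = len(sprite)
    width = len(sprite[0])
    return [
        [sprite[y // factor][x // factor] for x in range(width * factor)]
        for y in range(height * factor)
    ]


# An 8x8 sprite scaled by 2 ends up 16x16 pixels.
sprite_8x8 = [[(x + y) % 2 for x in range(8)] for y in range(8)]
scaled = scale_sprite(sprite_8x8, 2)
print(len(scaled), len(scaled[0]))
```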
Configuration of the Video Processor
Implementing the sprite module and SPI controller in the FPGA was the testing ground for a feature I’ve been working towards a lot in recent months: controlling various aspects of the video processor in real time via the syscon’s configuration menu. My goal was not only to add debug-related settings, but also to allow changing the input and output resolution, the horizontal phase, the motion threshold and more.
Although it could be implemented in the syscon, right now my plan is not to have any configuration profiles – each resolution (512i, 480i and 480p) has its own independent set of configuration registers. The video processor then applies the correct settings when the respective resolution is detected. There are 3 additional registers:
- Resolution: read only; reports the currently detected video resolution
- Video Enable: write only; the whole video pipeline is held in reset until this is enabled
- Lock Config: write only; when enabled, the video processor will store, but not apply, any new configs transferred via SPI. This is a safety feature to prevent possible race conditions when transmitting multi-byte settings.
At startup the syscon must first load all settings and then write to the video enable register to start the video processor. Changing the settings is also handled via the syscon and can be done at any time. The procedure is as follows:
- Read the current input resolution from the video processor to select the correct set of settings.
- Lock the configuration registers.
- Update the desired settings for the respective resolution.
- Unlock the configuration registers to apply the new settings.
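The steps above can be sketched in Python against a toy model of the register file. Everything here is hypothetical: the register addresses, the class and the function names are invented for illustration, and the model applies settings immediately on unlock, whereas the real design picks them up at the start of a frame.

```python
class FakeVideoProcessor:
    """Toy stand-in for the FPGA register file (all addresses are made up)."""

    REG_RESOLUTION = 0x00   # read only in this sketch
    REG_LOCK_CONFIG = 0x02  # write only in this sketch

    def __init__(self, resolution):
        self.resolution = resolution
        self.locked = False
        self.staged = {}    # values received over SPI
        self.applied = {}   # values the pipeline would actually use

    def read(self, addr):
        if addr == self.REG_RESOLUTION:
            return self.resolution
        raise ValueError("register is not readable in this sketch")

    def write(self, addr, value):
        if addr == self.REG_LOCK_CONFIG:
            self.locked = bool(value)
        else:
            self.staged[addr] = value
        # While locked, new values are stored but not applied.
        if not self.locked:
            self.applied.update(self.staged)


def update_settings(vp, settings_by_resolution):
    """Read resolution, lock, write the matching settings, unlock."""
    resolution = vp.read(vp.REG_RESOLUTION)
    vp.write(vp.REG_LOCK_CONFIG, 1)
    for addr, value in settings_by_resolution[resolution].items():
        vp.write(addr, value)
    vp.write(vp.REG_LOCK_CONFIG, 0)
```

The point of the lock is visible in `write`: a multi-byte setting spread over several registers never reaches `applied` half-written, because nothing is applied until the unlock.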
Luckily, I defined the architecture with this in mind from the beginning, so for the most part all I needed to do was update the video input module and add a module containing all configuration registers accessible via SPI. The video input now has an additional state at the start of each frame to load the respective settings from the configuration registers instead of hardcoded constants. Those are pushed through the video input CDC buffer and distributed to all modules needing them. All modules sample the video config at the start of each frame as the first step in their state machine, which made the implementation much easier. This makes the design quite robust with regard to modifying these settings in real time.
LCD Backlight PWM
This module was also finally implemented and is now accessible by the syscon. Its functionality is actually quite similar to the PWM peripherals in the RP2040: it contains 2 registers, max_count and trigger_count. It's just a counter fed by the 150 MHz global clock. It counts up to max_count and then resets both the counter and the PWM signal. When it reaches the value of trigger_count, it toggles the PWM output signal. That's it. max_count is hardcoded in the syscon to achieve a PWM frequency of about 46 kHz, and trigger_count is used to control the brightness.
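To make the relationship between the two registers concrete, here is a small Python model of such a counter. It is only a sketch under assumptions: the counter is taken to run from 0 to max_count inclusive (so one period is max_count + 1 clock ticks), and the output is assumed to start high and go low at trigger_count. The actual polarity and off-by-one behaviour of the FPGA module are not stated above.

```python
CLOCK_HZ = 150_000_000  # global clock from the post


def max_count_for(target_hz, clock_hz=CLOCK_HZ):
    """max_count giving roughly target_hz, assuming a period of max_count + 1 ticks."""
    return round(clock_hz / target_hz) - 1


def simulate_period(max_count, trigger_count):
    """Return the output level for every clock tick of one PWM period."""
    levels = []
    output = 1  # assumed level after the counter reset
    for counter in range(max_count + 1):
        if counter == trigger_count:
            output ^= 1  # toggle when the trigger value is reached
        levels.append(output)
    return levels


def duty_cycle(max_count, trigger_count):
    levels = simulate_period(max_count, trigger_count)
    return sum(levels) / len(levels)


max_count = max_count_for(46_000)
print(max_count, CLOCK_HZ / (max_count + 1))
print(duty_cycle(max_count, (max_count + 1) // 2))
```

In this model the brightness is simply proportional to trigger_count, and a trigger_count above max_count never toggles, which leaves the output at its reset level for the whole period.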
Bilinear Scaler
This is the last “big” module needed to fulfill my initial requirements. Even though a bilinear scaler is quite a simple scaling algorithm on its own, I was expecting this one to be challenging – and it was challenging!
All modules so far were straightforward to implement and test, because they were quite independent of each other and had a defined interface. The scaler, on the other hand, is now the heart of the video output data path and needs to work in coordination with the deinterlacer, line FIFO and video output module to produce a frame. You can imagine how many possibilities for errors this introduces: if one gear is blocked for whatever reason, everything stops working. An additional difficulty in my architecture was that the scaler doesn’t have direct read access to the framebuffer. Everything needs to be done via the deinterlacer & line FIFO, and the scaler needs to read 4 pixels from 4 different addresses in parallel.
Horizontally this was made possible by implementing single-port write / dual-port read line buffers, so 2 pixels could be read from different addresses within a line.
Vertically it was more challenging, because the scaler produces an absolute Y read address, but the line FIFO can only provide the next 5 lines to read from, relative to the current line in the output. I solved this by only calculating the relative Y increment from line to line instead of an absolute address; the tail pointer of the FIFO is then advanced by that value. This is generally OK, as the scaler will either not increment or increment by 1 when upscaling. But when downscaling, the line address can increment by 2, so the FIFO needs to handle throwing out a whole line. This happens all the time when scaling PAL down from 512 to 480 lines. In this use case it works OK, but the implementation limits the amount of downscaling the scaler can perform before everything breaks. Not that you would actually want to do downscaling, apart from 512->480.
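The relative increment idea can be illustrated with a few lines of Python. This is a simplified sketch: it assumes the source line for an output line is just floor(y_out * in_lines / out_lines), ignoring any sub-pixel offset a real scaler might use, and the function name is made up.

```python
def line_increments(in_lines, out_lines):
    """Per-output-line increment of the source line address.

    Instead of an absolute source line, only the step from the previous
    output line is produced, which is what a FIFO tail pointer can consume.
    """
    increments = []
    previous = 0
    for y_out in range(1, out_lines):
        source = (y_out * in_lines) // out_lines
        increments.append(source - previous)
        previous = source
    return increments


upscale = line_increments(240, 480)    # increments are only 0 or 1
downscale = line_increments(512, 480)  # increments are 1 or 2
print(sorted(set(upscale)), sorted(set(downscale)))
print(downscale.count(2))  # lines the FIFO has to throw away
```

With these assumptions, an increment of 2 marks a source line that is skipped entirely, so stronger downscaling would produce larger increments and need more lines to be dropped per step.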
Another difficulty was the calculation pipeline. Floating-point arithmetic is not easily feasible in FPGAs, so I had to rely on fixed-point numbers. Sadly, I could not find a widely supported standard fixed-point library for VHDL, so my two options were to either use an open-source library or to do my own thing. Consistent with my design approach so far, of course I implemented my own freestyle fixed-point calculations in the whole pipeline. I ended up going for 18-bit values with 15 fractional bits, to fit the 18x18 multipliers present in the Trion T20.
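For readers unfamiliar with fixed-point math, the following Python sketch shows bilinear interpolation with 15 fractional bits. It is not the pipeline described above, just one common formulation: the weights are assumed to be in the range 0..1 with 1.0 represented as 1 << 15, pixels are plain 8-bit integers, and all names are invented for the example.

```python
FRAC_BITS = 15
ONE = 1 << FRAC_BITS  # 1.0 in fixed point


def to_fixed(x):
    """Convert a float weight to fixed point with 15 fractional bits."""
    return int(round(x * ONE))


def lerp(a, b, w):
    """Interpolate between integers a and b with fixed-point weight w.

    The product (b - a) * w is what a hardware multiplier would compute;
    the right shift drops the fractional bits again.
    """
    return a + (((b - a) * w) >> FRAC_BITS)


def bilinear(p00, p01, p10, p11, wx, wy):
    """Blend 4 neighbouring pixels: two horizontal lerps, one vertical."""
    top = lerp(p00, p01, wx)
    bottom = lerp(p10, p11, wx)
    return lerp(top, bottom, wy)


print(bilinear(0, 255, 0, 255, to_fixed(0.5), to_fixed(0.25)))
```

Written this way, each colour channel needs three multiplications per output pixel, and every operand fits into 18 bits: the pixel difference needs 9 signed bits and the weight at most 17.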
Now I also finally feel vindicated for reserving all hardware multipliers for the scaler: the T20 only has 36 in total and 19 are now used by the bilinear scaling pipeline.
The whole implementation is already working for the most part – there are still some combinations of settings that can cause artifacts, but I’m debugging those right now. It turns out that the possibility to set any combination of configs in real time will uncover edge cases you never even thought about!
Current top level block diagram:
Current T20 resource utilization:
Testing and debugging:
Finding all settings with issues is actually quite easy now, as all settings are configurable from the menu:
Cycle through the settings to find problematic combinations, write down the settings to debug, enter the affected settings in the test bench and debug from there!
Before, I had to generate a new bitstream each time I wanted to test a setting on hardware, which was absolute misery. Probably the smartest approach would be to ditch the VHDL test benches and go for something like cocotb, but I’m not quite there yet. That would enable me to test every setting in simulation, without writing thousands of lines of VHDL test benches.
After testing the fixes in simulation it’s really easy to deploy a new bitstream to the hardware. I generate the bitstream, recompile the firmware, connect the portable and drag the new image onto it. I would consider being able to reconfigure the FPGA via firmware one of the smartest decisions I made when defining the hardware architecture. Otherwise, I would need to take apart the portable for every little fix and upload the new bitstream using the debugger. That was only necessary a handful of times, for live-debugging the odd nasty bug in the FPGA.
In the video below you get a short overview of the current state of the menu, as well as the newly added video configurations.
(sorry for the bad video quality again, it's really difficult to film displays!)
Overall thoughts:
Enabling scaling takes a hit on image sharpness; that’s clearly evident, even on such a tiny display. I guess one factor is that the resolution of 480x800 is not much higher than the PS2’s output resolution, so the impact would probably be less noticeable at higher output resolutions. It's also possible that there is still a bug in the pipeline...
If image sharpness is more important than scaling, the scaler can also be turned off by setting the output resolution to the input resolution. In that case the data is still pushed through the bilinear scaler pipeline, but the scaling factors are 1.
For PAL there is probably no way around scaling, so image quality will always be worse. Previously I was just discarding every 16th line, which led to an overall sharper image, but the missing lines were very apparent in the output image. I guess in the end it’s a compromise. Maybe I will consider re-enabling the previous scaling approach for PAL to have both options.
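The arithmetic behind the line-dropping approach is simple: 512 lines minus one line out of every 16 leaves 512 - 32 = 480. A tiny Python sketch, assuming the last line of each group of 16 is the one dropped (which line is actually discarded is not stated above):

```python
def drop_every_16th(lines):
    """Keep 15 out of every 16 lines; the 16th of each group is discarded."""
    return [line for y, line in enumerate(lines) if y % 16 != 15]


kept = drop_every_16th(list(range(512)))
print(len(kept))
```

The remaining lines are passed through untouched, which is why the result looks sharper than an interpolated image but shows visible gaps.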
Next Hardware Revision:
With the major video processor modules out of the way I’m pretty confident that the current mainboard architecture is sufficient to support all features I want to have. That’s why I started to work on the next and hopefully final mainboard revision in parallel. As mentioned already, the architecture will probably not change much, but I do have some important points to address:
- Consolidate all power planes into one power layer instead of two. This is to reduce potential EMI issues when having adjacent power planes of different voltage. It was fine for the first revision, as the spacing was much larger, but with the JLC stackup it needs to be addressed. Rev. 0.2 featured a compromise to speed up development, but now I want to tackle this.
- Further BOM consolidation. Rev. 0.2 improved this a lot already, but I still want to get rid of a lot of components. Especially resistor and capacitor values where only 1 or 2 components are needed for the whole board.
- Bigger flash for the syscon. The 16Mbit flash is quite full already, as it also contains the FPGA bitstream. I’d like to reuse the 128Mbit flash of the SD2PSX in the future.
- Simplifying routing. With one of the power planes gone, I have room to move the low-speed signals to an inner layer. Depending on the space maybe even high-speed stuff, but there I’m worried about power plane splits.
- Adding more test points. During testing I found some signals where I would like to have a test point. I also know the function of a lot more EE, GS, DSP and MC pins now, some might be interesting to have on test points. (PPC UART for example)
- Adding buffers on the HP_SENSE and BOOTSEL signals. The long traces for HP_SENSE (high impedance source) and BOOTSEL (possibility to introduce SI issues) make me feel uncomfortable, even though they work just fine in rev. 0.2.
- I’m looking into replacing the mid mount USB A socket of the stock PS2 with something you can actually buy today, to reduce the number of components to be salvaged from a PS2.
- The memory card flex connector needs bulk decoupling on VCC
- I’m considering the option of making the USB C socket replaceable. That would add another PCB to the BOM, so I’m not sure yet.
- I’m looking into better connectors for connecting the batteries. Soldering the wires to the mainboard is also an open option (reduce BOM count).
- Some footprint adjustments for easier soldering
An improved SD2PSX was designed, including some of the points I mentioned previously. That will be ordered soonish.
The next point would be working on the Rev. 0.2 controller flex PCBs. The mechanical design was already done last year, but I haven’t started the layout yet.
Regarding the mechanical design, one of the upcoming tasks would be to adapt the heat spreader to the new mainboard design. I will reduce the depth of the holes that need M1.6 threads, to make tapping easier and to reduce the chances of breaking taps again. Another change would be related to thermals: I would like to increase the surface area of the fins and the thickness of the material touching the EE, to make cooling a bit more efficient. Currently the EE settles at about 42°C with the Switch Lite fan at 80% speed. Note that this is not comparable to a stock mainboard, as I decided to move the temperature sensor directly below the EE for more accurate temperature measurements. I think the temperature is OK, but I would like to reduce the fan speed a little (if possible). Luckily the fan curve is adjustable in the menu now, which makes testing easier!
Lowest priority is a new revision of the memory card flex; the only envisioned change there would be a footprint change of the VCC decoupling capacitor from 0402 to 0603.
Boot Rom Flex & PS2 Setup
Maybe a bit off-topic, but I managed to install a Noctua fan and the latest 64Mbit boot rom flex in my childhood 39k PS2, to finally start setting up my PS2 gaming setup for playing at home. The idea is to just remove the SD2PSX from the portable and to pop it into my fat PS2 to continue playing, which works really well!
For the boot rom I chose to do some light modifications: I added PS2BBL to the empty space at the end of the binary and added it to ROMDIR. Then I added a modified EELOAD module to launch into PS2BBL by default, eliminating the need for FMCB and making it compatible with the boot memory card I set up in the SD2PSX for my portable. The new boot rom flex keeps the DVD player fully functional, in case I ever watch DVDs on the thing (sometimes I do).
It’s now connected to a Retrotink 5X, which I often use as a reference when debugging the video processor (not all image artifacts come from bugs, some show up in the RT5X too).