In the previous entries of this series we already covered the register-based implementation of the timers block.
In this third part of the series (as promised), we will show how to implement the timers block using memory blocks instead of registers.
Memory blocks are an often underused capability of modern FPGAs and can in many cases (as in this one) be a nice alternative to save scarce resources like registers and LUTs. As we commented in the previous entry, implementing a block of 32 x 16-bit timers took about 7% of the LUTs of a Cyclone, and we wanted to see whether we could reduce that resource usage.
The idea behind this memory-based implementation is to scan each timer in the block sequentially, as follows:
- Retrieve the current value of timer ‘n’ from memory
- Update its value [ timer(n) <= timer(n) -1 ]
- Write back to memory
These three basic steps have to be accompanied by additional logic to:
- Check whether the last timer was accessed and, if so, restart the scan from the first timer
- Check if the timer is enabled
- Check if the timer reached zero and activate its done output
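The steps above can be sketched as a quick Python behavioral model. This is only an illustration of the scan logic, not the HDL itself; the names `timers`, `enabled`, `done` and `scan_once` are my own, chosen for clarity:

```python
# Behavioral model (illustrative, not HDL) of one sequential pass
# over all timers: read, decrement, write back.
NUM_TIMERS = 32

timers = [0] * NUM_TIMERS    # memory block contents (16-bit values)
enabled = [False] * NUM_TIMERS
done = [False] * NUM_TIMERS  # 'done' outputs, asserted on expiry

def scan_once():
    """One full sequential pass over the timer memory."""
    for n in range(NUM_TIMERS):       # after the last timer, restart from the first
        if not enabled[n]:
            continue                  # skip disabled timers
        value = timers[n]             # 1. read current value from memory
        if value == 0:
            done[n] = True            # timer reached zero: activate its done output
        else:
            timers[n] = value - 1     # 2. decrement, 3. write back
```

For example, loading timer 3 with the value 2 and enabling it, its `done` flag goes up on the third scan (the pass in which it is read as zero).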
Additionally, the logic must take read latency into account. When reading from the memory, one clock cycle elapses between the moment the address changes and the moment the data is available at the memory output.
To avoid blocking situations, the design uses dual-port RAMs. One port is always available for the host to initialize any timer whenever it wants to. The other port is used by the FPGA logic to read each timer and write back its updated value.
All these operations are commanded by a state machine. The state machine has four cycles.
The waveforms show the access to the dual-port memory from port b, which is under control of the FPGA logic. Port a is under control of the host.
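As a rough behavioral sketch, the four-cycle sequence on port b might look like the Python state machine below. The state names and the `TimerFSM` class are my own invention (the post does not name the states); the dedicated `READ` state models the one-cycle read latency noted above:

```python
from enum import Enum, auto

class State(Enum):
    SET_ADDR = auto()  # present the timer address on port b
    READ = auto()      # wait one cycle for the RAM output to become valid
    MODIFY = auto()    # decrement the value (if enabled), or flag expiry
    WRITE = auto()     # write the updated value back, advance the address

class TimerFSM:
    """Illustrative model of the port-b controller (not the actual HDL)."""
    def __init__(self, mem, enabled, done):
        self.mem, self.enabled, self.done = mem, enabled, done
        self.state = State.SET_ADDR
        self.addr = 0
        self.data = 0

    def clock(self):
        """Advance by one clock cycle: four cycles process one timer."""
        if self.state is State.SET_ADDR:
            self.state = State.READ
        elif self.state is State.READ:
            self.data = self.mem[self.addr]   # RAM output valid one cycle after address
            self.state = State.MODIFY
        elif self.state is State.MODIFY:
            if self.enabled[self.addr]:
                if self.data == 0:
                    self.done[self.addr] = True   # timer expired
                else:
                    self.data -= 1
            self.state = State.WRITE
        elif self.state is State.WRITE:
            self.mem[self.addr] = self.data       # write back via port b
            self.addr = (self.addr + 1) % len(self.mem)  # wrap after the last timer
            self.state = State.SET_ADDR
```

A full scan of 32 timers therefore takes 32 x 4 = 128 clock cycles, which is exactly the factor that limits the resolution discussed below.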
As engineers, we know that you can almost never win at one table without losing at another. If we are winning lower LE usage, chances are we are losing something else.
What we are losing is parallelism. Our original timers block works in parallel: each timer register is decremented simultaneously, with no connection to or dependency on the operations performed on its fellow timer registers.
With this new memory-based implementation, we have serialized the operations: we have to read from memory, decrement, and write back. And since the memory is accessed sequentially, we must repeat this read-modify-write operation for every timer.
This fact imposes a limit on the time base of the counters. Let's say our system clock is 50 MHz. With the parallel timers, each one could achieve a resolution as fine as 20 ns (the inverse of 50 MHz).
That is not the case for the memory-based timer block. The finest possible resolution of our timers is now limited by two factors:
- The number of clock cycles it takes to read-modify-write one timer (4 clock cycles)
- The number of timers we want to implement (notice that the width of the timers has no impact, only their quantity)
In our case we wanted to implement 32 timers. The finest achievable resolution for the memory-based timer block is then:
20 ns x 4 x 32 = 2,560 ns ≈ 2.6 µs
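The same arithmetic as a tiny helper function (the function name is mine, for illustration):

```python
def min_resolution_ns(clk_mhz, cycles_per_timer, num_timers):
    """Smallest achievable tick, in ns, for the memory-based timer block."""
    clock_period_ns = 1000 / clk_mhz          # 50 MHz -> 20 ns period
    return clock_period_ns * cycles_per_timer * num_timers

print(min_resolution_ns(50, 4, 32))           # -> 2560.0 (ns), i.e. ~2.6 us
```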
For most applications this won't be a problem. Most applications involving a CPU won't be able to react to changes on that time scale anyway. Many applications will get along happily with timers of 1 ms resolution, or 0.1 ms = 100 µs, well above the limit we have calculated.
In the next entry of this series we will compare the FPGA resources taken by each solution, and we will also comment on the code for this new, memory-based solution. See you soon!