Math : be fruitful and multiply

Get the VHDL sources from \ac_inout_psu\source\math_library\multiplier in the repository:

https://github.com/johonkanen/ac_inout_psu/tree/bandpass_filter

The power supply is an embedded system that uses measurements of different voltages and currents to control its operation. Control algorithms as well as measurements use signal processing to filter out unwanted noise from the analog-to-digital converter results and control outputs. Function approximations as well as filter algorithms use multiplications extensively, so a multiplier is a basic building block for all signal processing systems.

When signal processing or math operations are implemented in an FPGA, the cheapest and easiest way to do them is in fixed point arithmetic. This simply means that the calculations are done with integers.

Fixed point number representation

The signed number format used natively in VHDL is two’s complement fixed point. This means that, for example, in a 13 bit integer all bit patterns above 2^12-1 (4095) are regarded as negative numbers, and the numbers wrap around from the positive extreme to the negative extreme. Thus 4095+1 equals -4096 in 13 bit number representation. This is the most common signed integer representation in digital arithmetic units.
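The wrap-around can be checked in simulation. The following is a minimal self-checking testbench sketch; the entity name twos_complement_wrap_tb is made up for this illustration and is not part of the project sources.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity twos_complement_wrap_tb is
end entity;

architecture sim of twos_complement_wrap_tb is
begin
    process
        -- 13 bit two's complement covers the range -4096 to 4095
        variable number : signed(12 downto 0);
    begin
        number := to_signed(4095, 13);  -- largest positive value
        number := number + 1;           -- numeric_std addition wraps silently
        assert to_integer(number) = -4096
            report "unexpected wrap-around value" severity failure;
        report "4095 + 1 wraps to " & integer'image(to_integer(number));
        wait;
    end process;
end architecture;
```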

To use fixed point numbers for non-integer values, a slight trick is needed. Let’s first define that we use 2^16, or 65536, to represent the number 1.0. Thus the number range from 0 to 65536 equals the range between 0 and 1.0. Then if we would like to represent the number 0.5, it would be half of the number range, or 32768. In similar fashion 0.3 would be 2^16*0.3 or 19661 and 2^16*0.11583567888 is 7591. Numbers larger than 1 behave the same way, so 1.37 * 2^16 is 89784.
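The conversion can be written as a small helper function. This is only a sketch: the function name to_fixed and the testbench around it are assumptions made for this example and are not part of the multiplier package. It uses ieee.math_real, so it is intended for simulation or for calculating constants.

```vhdl
library ieee;
use ieee.math_real.all;

entity fixed_point_conversion_tb is
end entity;

architecture sim of fixed_point_conversion_tb is

    -- hypothetical helper: scales a real number by 2^radix and rounds it
    function to_fixed (number : real; radix : natural) return integer is
    begin
        return integer(round(number * 2.0**radix));
    end function;

begin
    process
    begin
        assert to_fixed(0.5, 16) = 32768
            report "0.5 converted incorrectly" severity failure;
        assert to_fixed(0.3, 16) = 19661
            report "0.3 converted incorrectly" severity failure;
        assert to_fixed(0.11583567888, 16) = 7591
            report "0.1158 converted incorrectly" severity failure;
        assert to_fixed(1.37, 16) = 89784
            report "1.37 converted incorrectly" severity failure;
        wait;
    end process;
end architecture;
```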

What we are doing when converting the numbers is multiplying by 2^16, which places the “decimal point”, or radix, at the 16th bit position. Thus a multiplication of 0.5 * 0.5 in radix 16 is equal to (0.5*2^16) * (0.5*2^16). Reordering yields (0.5*0.5)*2^16*2^16. The product therefore carries the scaling factor 2^16 twice, so the radix 16 representation of 0.5*0.5 is obtained by dividing the fixed point multiplication result by 2^16. Due to binary representation, dividing by 2^16 equals shifting the result 16 bits to the right. When using fixed point numbers, a multiplication thus requires shifting the result.
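As a worked example, the 0.5 * 0.5 case can be sketched with 18 bit signed operands and a 36 bit product. The testbench name is made up for this illustration. Selecting 18 bits starting from the radix bit is used here as the equivalent of the right shift.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity radix_multiply_tb is
end entity;

architecture sim of radix_multiply_tb is
    constant radix : natural := 16;
begin
    process
        variable a, b    : signed(17 downto 0);
        variable product : signed(35 downto 0);
        variable result  : integer;
    begin
        a := to_signed(32768, 18);  -- 0.5 in radix 16
        b := to_signed(32768, 18);  -- 0.5 in radix 16
        product := a * b;           -- 0.25 * 2^16 * 2^16 = 2^30
        -- dropping the lowest 16 bits removes the extra 2^16 scaling
        result := to_integer(product(radix + 17 downto radix));
        assert result = 16384       -- 0.25 * 2^16
            report "unexpected product" severity failure;
        report "0.5 * 0.5 in radix 16 is " & integer'image(result);
        wait;
    end process;
end architecture;
```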

Multiplier implementation

A multiplier can be coded in VHDL just as in any other language by using the operator *. However, in order to provide high level access to a multiplication, a module is needed. The multiplier module provides interface functions that create the multiplier, start the multiplication and fetch the multiplication result with correct shift and rounding. The get_multiplier_result function takes the radix as an argument and implements the shifting and rounding, so we don’t need to think about bit level operations when doing multiplications.
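The body of get_multiplier_result is not listed in this post. One possible way to implement a shift with rounding is sketched below. The function name shift_and_round and the round-half-up choice are assumptions made for this example, and the actual package may do this differently.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity shift_and_round_tb is
end entity;

architecture sim of shift_and_round_tb is

    -- hypothetical helper, not the package's get_multiplier_result
    function shift_and_round (
        product : signed(35 downto 0);
        radix   : natural range 0 to 18)
    return integer is
        variable rounded : signed(35 downto 0);
    begin
        rounded := product;
        if radix > 0 then
            -- adding half of an output LSB before the shift rounds to nearest
            rounded := product + shift_left(to_signed(1, 36), radix - 1);
        end if;
        -- keeping 18 bits starting from the radix bit performs the right shift
        return to_integer(rounded(radix + 17 downto radix));
    end function;

begin
    process
        variable product : signed(35 downto 0);
    begin
        -- 0.1158 * 1.0 in radix 16
        product := to_signed(7591, 18) * to_signed(65536, 18);
        assert shift_and_round(product, 16) = 7591
            report "unexpected result for 7591 * 65536" severity failure;

        -- 3 * 0.5 = 1.5 output LSBs, which rounds up to 2
        product := to_signed(3, 18) * to_signed(32768, 18);
        assert shift_and_round(product, 16) = 2
            report "unexpected rounding result" severity failure;
        wait;
    end process;
end architecture;
```

Without the rounding term the second case would truncate to 1, so the rounding halves the worst case error of a single multiplication.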

The multiplier has input and output pipeline stages declared in the multiplier record. In Cyclone 10 LP this does not matter much, but for example in Spartan 6 and 7, the multiplier would be implemented with a DSP slice and the input and output buffers would significantly impact the maximum achievable performance.

The multiplier is instantiated by the application code by declaring a signal of multiplier_record type, and the multiplier is created in a process with a create_multiplier procedure call.

				
library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

package multiplier_pkg is
 
    subtype signed_36_bit is signed(35 downto 0);
    subtype int18 is integer range -2**17 to 2**17-1;
 
    type multiplier_record is record
        signed_data_a        : signed(17 downto 0);
        signed_data_b        : signed(17 downto 0);
        data_a_buffer        : signed(17 downto 0);
        data_b_buffer        : signed(17 downto 0);
        signed_36_bit_result : signed(35 downto 0);
        signed_36_bit_buffer : signed(35 downto 0);
        shift_register       : std_logic_vector(1 downto 0);
        multiplier_is_busy   : boolean;
        multiplier_is_requested_with_1 : std_logic;
    end record;
 
    constant multiplier_init_values : multiplier_record := ( (others => '0'),(others => '0'),(others => '0'), (others => '0'), (others => '0'), (others => '0'), (others => '0'), false, '0');
------------------------------------------------------------------------
    procedure create_multiplier (
        signal multiplier : inout multiplier_record);
------------------------------------------------------------------------

The multiplier is accessed with the following list of interface functions:

				
------------------------------------------------------------------------
    procedure multiply (
        signal multiplier : inout multiplier_record;
        data_a : in int18;
        data_b : in int18);
------------------------------------------------------------------------
    function get_multiplier_result (
        multiplier : multiplier_record;
        radix : natural range 0 to 18) 
    return integer ;
------------------------------------------------------------------------
    function multiplier_is_ready (
        multiplier : multiplier_record)
    return boolean;
------------------------------------------------------------------------
    function multiplier_is_not_busy (
        multiplier : multiplier_record)
    return boolean;
------------------------------------------------------------------------
    procedure sequential_multiply (
        signal multiplier : inout multiplier_record;
        data_a : in int18;
        data_b : in int18);
 
end package multiplier_pkg;

The difference between sequential_multiply and multiply is that the sequential version waits for the pipeline to be empty before starting a new multiplication.

With the multiplier module, the multiplication 7591*65536 is done by calling the procedure multiply with arguments (multiplier, 7591, 65536). When multiplier_is_ready(multiplier) returns true, the result is fetched with result <= get_multiplier_result(multiplier, 16). In radix 16 this corresponds to 0.11583567888 * 1.0, so the fetched result is 7591.

Multiplier implementation

The multiplier implementation casts the integers used for the multiplication to the 18 bit signed data type, multiplies them together and then sets the multiplication result to the output. The multiplier_is_requested_with_1 signal is simply pushed through a shift register that has the same number of stages as the multiplier pipeline. The signed data type is used instead of integer or its subtypes because in most VHDL tools integer is 32 bits wide, which is less than the 36 bits required by an 18×18 bit multiplication.

				
------------------------------------------------------------------------
    procedure create_multiplier
    (
        signal multiplier : inout multiplier_record
    ) is
 
        alias signed_36_bit_result is multiplier.signed_36_bit_result;
        alias shift_register is multiplier.shift_register;
        alias multiplier_is_busy is multiplier.multiplier_is_busy;
        alias multiplier_is_requested_with_1 is multiplier.multiplier_is_requested_with_1;
        alias signed_data_a is multiplier.signed_data_a;
        alias signed_data_b is multiplier.signed_data_b;
    begin
         
        multiplier.data_a_buffer <= signed_data_a;
        multiplier.data_b_buffer <= signed_data_b;
 
        multiplier.signed_36_bit_buffer <= multiplier.data_a_buffer * multiplier.data_b_buffer; 
        signed_36_bit_result <= multiplier.signed_36_bit_buffer;
 
        multiplier_is_requested_with_1 <= '0';
 
        shift_register <= shift_register(shift_register'left-1 downto 0) & multiplier_is_requested_with_1;
 
        multiplier_is_busy <= false;
        if shift_register /= "00" then
            multiplier_is_busy <= true;
        end if;
    end create_multiplier;

It takes 3 clock cycles for a multiplication operation to propagate through the multiplier. Consecutive calls to the multiplier are possible, so the module can process one multiplication per clock cycle with a latency of 3 pipeline stages.
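The pipelining can be illustrated with a standalone simulation sketch. The register chain below is assumed to mirror the stages of create_multiplier (operand registers, operand buffers, product buffer, result register), but it does not use the multiplier package, and all names in it are made up for this example. Three operand pairs are written on consecutive clock cycles and the three products come out on consecutive clock cycles.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity multiplier_pipeline_tb is
end entity;

architecture sim of multiplier_pipeline_tb is
    signal clock    : std_logic := '0';
    signal finished : boolean   := false;
    signal cycle    : natural   := 0;
    -- register chain assumed to mirror the stages in create_multiplier
    signal data_a, data_b     : signed(17 downto 0) := (others => '0');
    signal a_buffer, b_buffer : signed(17 downto 0) := (others => '0');
    signal product_buffer     : signed(35 downto 0) := (others => '0');
    signal result             : signed(35 downto 0) := (others => '0');
begin
    -- free running clock that stops when the test is finished
    clock <= not clock after 5 ns when not finished else '0';

    process(clock)
    begin
        if rising_edge(clock) then
            cycle <= cycle + 1;

            -- pipeline registers, all updated on every clock edge
            a_buffer       <= data_a;
            b_buffer       <= data_b;
            product_buffer <= a_buffer * b_buffer;
            result         <= product_buffer;

            -- a new operand pair is written on three consecutive cycles
            case cycle is
                when 0 => data_a <= to_signed(2, 18); data_b <= to_signed(3, 18);
                when 1 => data_a <= to_signed(4, 18); data_b <= to_signed(5, 18);
                when 2 => data_a <= to_signed(6, 18); data_b <= to_signed(7, 18);
                when others => null;
            end case;

            -- the products are readable on three consecutive cycles
            case cycle is
                when 4 =>
                    assert to_integer(result) = 6
                        report "first product is wrong" severity failure;
                when 5 =>
                    assert to_integer(result) = 20
                        report "second product is wrong" severity failure;
                when 6 =>
                    assert to_integer(result) = 42
                        report "third product is wrong" severity failure;
                    finished <= true;
                when others => null;
            end case;
        end if;
    end process;
end architecture;
```

In this sketch the operands written at cycle 0 are readable as a product at cycle 4: one cycle to load the operand registers plus three register stages behind them.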

Signal processing in fixed point arithmetic

For a multiplier test application, a first order filter is designed. The filter is calculated using the transposed form shown below.

The first order filter algorithm with the transposed filter form is:

				
output = b0*input + filter_memory;
filter_memory = b1*input + a1*output;

The gains are b0, which is the portion of the input signal that is fed directly through to the output, b1, which is roughly the inverse of the number of cycles it takes for the filter to respond to a unit step, and a1, which is simply 1-b1-b0. Also, a high pass filter can be obtained by subtracting the low pass filtered version of a signal from the unfiltered version. Any DSP book will show that there is a little bit more to digital filtering than this, but for basic filtering needs the humble 1st order filter is very powerful for its simplicity and thus finds the most use.
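The choice a1 = 1-b1-b0 is what gives the filter a gain of 1 for a constant input. A short derivation, assuming a constant input x and a steady state in which the output y and the filter memory m no longer change:

```latex
\begin{aligned}
y &= b_0 x + m \\
m &= b_1 x + a_1 y \\
\Rightarrow\quad y &= (b_0 + b_1)\,x + a_1 y \\
\Rightarrow\quad \frac{y}{x} &= \frac{b_0 + b_1}{1 - a_1} = 1
\quad\text{when } a_1 = 1 - b_1 - b_0
\end{aligned}
```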

Since we are using just one multiplier, we pipeline the multiplications. To make filtering a sequential process, a process counter is used, which is incremented at every clock cycle. The counter runs up to 9 and stays there until the filter is started by setting the counter to zero.

At process_counter 0, the multiply procedure is called with the filter input data and gain b0.

At process_counter 1, the multiply procedure is called with the filter input data and gain b1. This is also the first pipeline stage for input * b0.

Values 2 and 3 are delays for the remaining pipeline stages of the input * b0 operation.

At process_counter 4, the first multiplication is ready. The output value is updated with the sum of the multiplier result and the filter memory. The multiplier is also called with the filter output and the a1 gain.

At process_counter 5, filter_input*b1 is ready and the filter memory is updated with the result in preparation to calculate b1*input + a1*output.

Values 6 and 7 are pipeline stages for the a1*output multiplication.

At process_counter 8, the filter memory is updated with the sum of the multiplier result and the filter memory value, which was set at cycle 5 to b1*filter_input.

				
CASE process_counter is
    WHEN 0 =>
        -- start input * b0
        multiply(multiplier, filter_input, b0);
        process_counter <= process_counter + 1;
    WHEN 1 =>
        -- start input * b1
        multiply(multiplier, filter_input, b1);
        process_counter <= process_counter + 1;
    WHEN 2 =>
        process_counter <= process_counter + 1;
    WHEN 3 =>
        process_counter <= process_counter + 1;
    WHEN 4 =>
        -- input * b0 is ready
        filter_output <= filter_memory + get_multiplier_result(multiplier, 17);
        -- filter_output is a signal and still holds its old value here,
        -- so the new output is recomputed for the a1 multiplication
        multiply(multiplier, filter_memory + get_multiplier_result(multiplier, 17), a1);
        process_counter <= process_counter + 1;
    WHEN 5 =>
        -- input * b1 is ready
        filter_memory <= get_multiplier_result(multiplier, 17);
        process_counter <= process_counter + 1;
    WHEN 6 =>
        process_counter <= process_counter + 1;
    WHEN 7 =>
        process_counter <= process_counter + 1;
    WHEN 8 =>
        -- a1 * output is ready
        filter_memory <= filter_memory + get_multiplier_result(multiplier, 17);
        process_counter <= process_counter + 1;
        filter_is_ready <= true;
    WHEN others => -- do nothing
        filter_is_busy <= false;
end CASE;

This implementation uses process_counter to count the clock cycles, which is somewhat dangerous since changing the number of pipeline stages will make the code malfunction. To avoid this, the filter is refactored to check that the multiplier is ready instead of just counting the pipeline delay cycles. Using the multiplier_is_ready(multiplier) function makes the pipelined implementation of the filter insensitive to the latency of the multiplication, as long as there is at least 1 pipeline delay in the multiplier. By using the interface functions, the first order filter is written as

				
CASE process_counter is
    WHEN 0 =>
        multiply(multiplier, filter_input, b0);
        process_counter <= process_counter + 1;
 
    WHEN 1 =>
        multiply(multiplier, filter_input, b1);
        process_counter <= process_counter + 1;
 
    WHEN 2 =>
        if multiplier_is_ready(multiplier) then
            filter_output <= filter_memory + get_multiplier_result(multiplier, 17);
            multiply(multiplier, filter_memory + get_multiplier_result(multiplier, 17), a1);
            process_counter <= process_counter + 1;
        end if;
 
    WHEN 3 =>
        -- input * b1 was started one cycle after input * b0, so its result is available here
        filter_memory <= get_multiplier_result(multiplier, 17);
        process_counter <= process_counter + 1;
 
    WHEN 4 => 
 
        if multiplier_is_ready(multiplier) then
            filter_memory <= filter_memory + get_multiplier_result(multiplier, 17);
            process_counter <= process_counter + 1;
            filter_is_ready <= true;
        end if;
 
    WHEN others => -- do nothing
        filter_is_busy <= false;
 
end CASE;

To test the multiplier with the designed filter, test code is compiled to the FPGA and the filter data is transmitted out of the FPGA over UART. The input data to the filter is a square wave which stays at value 15000 for 32768 calculation cycles and then switches to value 55000 for another 32768 calculation cycles.

The data is transmitted out of the UART at the calculation rate, which is set at 100 kHz. The gains are chosen more or less arbitrarily to be b0 = 50 and b1 = 300. The filter is calculated using 18 bit integers with radix 17, thus the real valued gains are b1 = 300/2^17 = 0.0023, b0 = 50/2^17 = 0.00038 and a1 = (2^17-300-50)/2^17 = 130722/2^17 = 0.9973.
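The gain arithmetic can be written out as constants. This is a small sketch with made-up names, not code from the project. It also shows that 1.0 itself (2^17 = 131072) is one step outside the 18 bit signed range, while a1 still fits.

```vhdl
entity filter_gains_tb is
end entity;

architecture sim of filter_gains_tb is
    constant radix : natural := 17;
    constant one   : integer := 2**radix;      -- 131072 represents 1.0
    constant b0    : integer := 50;
    constant b1    : integer := 300;
    constant a1    : integer := one - b1 - b0;  -- chosen for unity DC gain
begin
    process
    begin
        assert a1 = 130722
            report "unexpected a1" severity failure;
        -- a1 must fit in the 18 bit signed range of the multiplier inputs
        assert a1 <= 2**17 - 1
            report "a1 does not fit in 18 bit signed" severity failure;
        -- print the real valued gains
        report "b0 = " & real'image(real(b0) / real(one));
        report "b1 = " & real'image(real(b1) / real(one));
        report "a1 = " & real'image(real(a1) / real(one));
        wait;
    end process;
end architecture;
```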

Below is shown the rising edge of the filtered signal. As can be seen, the filter has a gain of 1, and with the 2000 cycle raster it can be observed that the filter reaches steady state, with a rise time of roughly 500 cycles from 0 to 63% of the steady state value.

Although the given filter code is relatively compact, the application of such a filter can be made significantly simpler by using a similar module construction for the first order filter as is used for the multiplier. VHDL allows nesting function and procedure calls, as well as records as parts of records, and this has significant implications for the achievable usability of the code. This topic will be explored next time.
