See the sources and testbench can be found in the fixed point hVHDL math library.
Sine of an angle returns a number which represents the factor that scales vector to y-axis. Consequently cosine can be used to return the x axis component of a vector. This property of switching between a vector and its components is among the most fundamental operations that are used when vectors are manipulated.
Trigonometric functions are almost always used when position, direction or rotation is considered. Many problems related to orientation or position are solved using angles and many problems can be simplified through the use of directions and orientation.
Trigonometry is also used for mapping between coordinates and complex numbers. This property also results in trigonometric functions appearing as solutions to differential equations as well as popping up in all sorts of places where natural logarithms and exponential functions are found.
So after multiplication and division the next most common arithmetic operations are the two trigonometric functions. This time we take a step beyond the elementary as we transcend our ability to calculate with vectors on FPGA using the two fundamental trigonometric functions, sine and cosine.
Series expansion for Sine
Sine and cosine are straightforward to calculate using polynomials obtained from series expansion. The nature of the approximation is that more accuracy is needed the higher is the order of the required polynomial. The series expansion for sine is
From the series expansion we can see that the x^n is divided by nth factorial n!. The factorial gets extremely large extremely fast thus ever higher orders of the polynomial have less and less of an effect to the result. This is even more prominent as the order of factors increase by two in the sine expansion and only odd orders are present. Due to this the next term in succession will have significantly less of an effect on the result than the previous one. Put in another way, the approximation gets very accurate with only a few terms. Thus this humble polynomial approximation is our target for implementing the sine.
Evaluating polynomials
We start developing the algorithm by optimizing the calculation by reordering the terms. In order to efficiently calculate the successively increasing exponents we use Horners method as shown in (2)
With Horners scheme, we evaluate the algorithm from inside out with the innermost term calculated first. When evaluated this way we avoid calculating terms x^n and instead we successively multiply by x^2. Precomputing this and substituting z = x^2 gives the following
This algorithm takes in angle { x\in[0, 2\pi] } and returns the sine of that angle. The input of the sine function is any real number and the output repeats with 2\pi. In order to simplify the input values we next normalize the the angles.
Normalizing input angle
Sine is calculated for the range of {x\in[0, 2\pi] }, which we can represent as { x\in[0, 1]*2\pi }. By moving the 2\pi from the input angle to the sine polynomial factors we get
Substituting the constants with \frac{2\pi}{n!} = s_n the algorithm can be written as
The great benefit of moving the 2\pi into the polynomial terms is that now the sine calculation is easy as the cycle repeats for every integer multiple instead with period of 2\pi . We no longer need to consider modulo operations for determining the sine value for any arbitrary number, just need to remove the integer part. In fixed point implementation this gets even better since we can get the correct sine result for any arbitrary angle by just letting the angle to overflow.
Sincos calculation
This calculation still has the same issue as with the division algorithm. Since we are evaluating successive multiplications of higher order exponents there is no way to avoid waiting for the previous multiplication to be ready before the next one is requested. However, we can minimize the need for calculating the terms thus minimize the number of sequential multiplications by limiting the range at which the polynomial needs to be accurate. Interestingly this is accomplished by evaluating cosine in the pipeline stages of the sine multiplication.
Cosine series
The series expansion to cosine is almost equal to sine with the difference being that only the even orders of the exponent are used instead of odd orders. The series expansion of cosine is
Which can be developed following similar steps to sine to
Cosine produces the exact same waveform that sine does, but in just 90 degree phase shift, we can utilize this to limit the input range for the polynomial from { x\in[0, 1] } to { x\in[0, \frac{1}{4} ] }. Calculating both allows us to have accurate slope from the sine calculation and accurate top part of the waveform from the cosine as seen Figure 1. The resulting sine and cosine waves are then pieced together from the tops and slopes.
By calculating both we are decreasing the number of non pipelineable terms that need to be evaluated when either function is needed. Calculating cosine is doubly useful since in many cases both algorithms are needed.
Sincos algorithm
The sincos calculation can be divided into 3 parts. First is the input angle reduction, followed by the polynomial evaluation and lastly the outputs are updated based on the input angles. With the introduction of cosine in the sine calculation we now have our implementation.
1 : quarter_angle = reduce_angle(x)
2 : z = quarter_angle * quarter_angle
3.1 : sin = (s5) * z
3.2 : sin = (s3 - sin) * z
3.3 : sin = (s1 + sin) * x
3.4 : cos = (c4) * z;
3.5 : cos = (c2 - cos) * z
3.6 : cos = (c1 - cos)
4 : sine_out = get_sine(x, sin, cos)
4 : cosine_out = get_cosine(x, sin, cos)
The constants s_n and c_n can be calculated directly from the the Taylor polynomials, but these are not optimal as the error is not balanced.
Chebyshev approximation
To balance the error as a final touch the Chebyshev polynomials are used for optimizing the sine and cosine terms instead of Taylor directly. Chebyshev function expansion has the property of equalizing the error in the range of evaluation. This means that they limit the maximum error by balancing it over the range of input. The algorithm remains the same, but just the constants are fiddled with. With Chebyshev factors, a 3rd order approximation is sufficient for 16 bits of accuracy for both sine and cosine. A great article on chebyshev polynomials has been published on embedded related which also has a python script which can be used for obtaining the factors.
With the chebyshev polynomials the sincos algorithm utilizes all of the common tricks, we limit the range in which we operate, we utilize the pipeline stages of the multiplication and finally we minimize the calculation by optimizing the constants of the multiplications.
Next the sincos object is designed in VHDL using high abstraction level design.
VHDL implementation of Sincos object
The sincos object is built from a sincos_record. The record shown below has members for the quarter wave sine and cosine calculations, the process counter needed for the sequential calculations, angle and angle squared that are needed for the sincos calculation. The constant init_sincos is used for initializing the signal of sincos_record datatype.
Using this record abstraction, the sincos object can be created in the application code with just 2 lines of code; the signal instantiation of sincos_record type and then the create_sincos procedure call at the start of the process. The int18 data types are defined in the multiplier package and are a subtype of an integer.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
library math_library;
use math_library.multiplier_pkg.all;
package sincos_pkg is
------------------------------------------------------------------------
type sincos_record is record
sincos_process_counter : natural range 0 to 15;
angle_rad16 : unsigned(15 downto 0);
angle_squared : int18;
sin16 : int18;
cos16 : int18;
sin : int18;
cos : int18;
sincos_has_finished : boolean;
end record;
constant init_sincos : sincos_record := (15, (others => '0'), 0, 0, 0, 0, 0, false);
------------------------------------------------------------------------
Sincos object interface
The sincos object interface is also described in the sincos_pkg. All of the functions and procedures are used with the signal of sincos_record type which hides the internals of the record from the application code that uses the sincos module. The interface describes how the module is used and abstracts the internal workings.
-- sincos package continues
------------------------------------------------------------------------
procedure create_sincos (
signal hw_multiplier : inout multiplier_record;
signal sincos_object : inout sincos_record);
------------------------------------------------------------------------
procedure request_sincos (
signal sincos_object : inout sincos_record;
angle_rad16 : in int18);
procedure request_sincos (
signal sincos_object : inout sincos_record;
angle_rad16 : in unsigned);
------------------------------------------------------------------------
function sincos_is_ready ( sincos_object : sincos_record)
return boolean;
------------------------------------------------------------------------
function get_sine ( sincos_object : sincos_record)
return int18;
------------------------------------------------------------------------
function get_cosine ( sincos_object : sincos_record)
return int18;
------------------------------------------------------------------------
end package sincos_pkg;
The sincos implementation logic is created from a signal of sincos_record type with the create_sincos procedure. This procedure has the implementation details of the sincos algorithm inside it, thus it allows the sincos to be made to an object from which the sincos calculation can be requested. The hardware multiplier is given to the create_sincos procedure as argument as this allows us to share the multiplier with other functions in the same process in which the sincos in used.
The request_sincos procedure is used for starting the sincos calculation. Sincos is requested with the input angle and the sincos signal object. The procedure sets the sincos_process counter to zero, which initializes the calculation sequence and sets the input angle to the sincos objects angle member.
------------------------------------------------------------------------
procedure request_sincos
(
signal sincos_object : inout sincos_record;
angle_rad16 : in unsigned
) is
begin
if sincos_object.sincos_process_counter >= 8 then
sincos_object.angle_rad16 <= angle_rad16;
sincos_object.sincos_process_counter <= 0;
end if;
end request_sincos;
------------------------------------------------------------------------
The completion of sincos calculation is detectable by checking if sincos_is_ready(sincos) which returns a boolean ‘true’ at the instant when the calculation complets. The final results are fetched from the sincos object by the get_sine and get_cosine functions.
------------------------------------------------------------------------
function sincos_is_ready
(
sincos_object : sincos_record
)
return boolean
is
begin
return sincos_object.sincos_has_finished;
end sincos_is_ready;
------------------------------------------------------------------------
function get_sine
(
sincos_object : sincos_record
)
return int18
is
begin
return sincos_object.sin;
end get_sine;
------------------------------------------------------------------------
function get_cosine
(
sincos_object : sincos_record
)
return int18
is
begin
return sincos_object.cos;
end get_cosine;
------------------------------------------------------------------------
Sincos algorithm implementation
step 1 : Angle reduction
As the 0 to 1 range can be used for the input of the sine, the angle calculation very simple. For radix16 the full 2\pi angle is just all of the values in 16 bit range [0 to 65535]. The angle reduction algorithm takes in the angle and returns the lowest 14 bits, which represents one the quarter of the 16 bit angle. The bits are not shifted, but instead used as is and the algorithm is left to overflow. As seen in Figure 1 the calculation runs through 4 times in one sine cycle.
Using the full range of the numbers makes all angle calculations simple since all sums and multiplications of the angle can simply left to overflow without any glitches in the waveforms. This makes calculating harmonics and phase shifts extremely simple as there is no need to take the number range into account. Just multiply and sum whatever and the result even with overflows will produce the correct angle and waveform.
package body sincos_pkg is
------------------------------------------------------------------------
function angle_reduction
(
angle_in_rad16 : int18
)
return int18
is
variable sign16_angle : signed(17 downto 0);
begin
sign16_angle := to_signed(angle_in_rad16,18);
return to_integer((sign16_angle(13 downto 0)));
end angle_reduction;
------------------------------------------------------------------------
step 2 : polynomial evaluation
The sincos calculation is embedded in the create_sincos procedure. The procedure has the definitions for the sine and cosine polynomial factors as well as aliases for the members of the sincos object record type.
The state machine of the sincos calculation is written in two case structures. The calculation alternates between sine and cosine terms. We only have the sine calculation waiting for the multiplication to be ready thus every time a multiplication is ready, two multiplications are pipelined.
The calculation is made invariant to pipeline delays by using the multiplier_is_ready function that is provided by the multiplier module. If our multiplier had only 1 pipeline delay, then the whole calculation would run through in 9 consecutive clock cycles and if more than pipeline stage is is required for the multiplier the the calculation waits for the multiplier to signal when it is ready. As the cosine multiplication is always ready on the following clock cycle, this works for any number of pipeline cycles.
------------------------------------------------------------------------
procedure create_sincos
(
signal hw_multiplier : inout multiplier_record;
signal sincos_object : inout sincos_record
) is
alias sincos_process_counter is sincos_object.sincos_process_counter ;
alias angle_rad16 is sincos_object.angle_rad16 ;
alias angle_squared is sincos_object.angle_squared ;
alias sin16 is sincos_object.sin16 ;
alias cos16 is sincos_object.cos16 ;
alias sin is sincos_object.sin ;
alias cos is sincos_object.cos ;
alias sincos_has_finished is sincos_object.sincos_has_finished ;
type int18_array is array (integer range <>) of int18;
constant sinegains : int18_array(0 to 2) := (12868 , 21159 , 10180);
constant cosgains : int18_array(0 to 2) := (32768 , 80805 , 64473);
constant one_quarter : integer := 8192 ;
constant three_fourths : integer := 24576 ;
constant five_fourths : integer := 40960 ;
constant seven_fourths : integer := 57344 ;
------------------------------------------------------------------------
begin
sincos_has_finished <= false;
CASE sincos_process_counter is
WHEN 0 =>
multiply(hw_multiplier, angle_reduction(to_integer(angle_rad16)), angle_reduction(to_integer(angle_rad16)));
sincos_process_counter <= sincos_process_counter + 1;
WHEN 1 =>
if multiplier_is_ready(hw_multiplier) then
angle_squared <= get_multiplier_result(hw_multiplier, 15);
multiply(hw_multiplier, sinegains(2), get_multiplier_result(hw_multiplier, 15));
end if;
increment_counter_when_ready(hw_multiplier,sincos_process_counter);
WHEN 3 =>
if multiplier_is_ready(hw_multiplier) then
multiply(hw_multiplier, angle_squared, sinegains(1) - get_multiplier_result(hw_multiplier, 15));
end if;
increment_counter_when_ready(hw_multiplier,sincos_process_counter);
WHEN 5 =>
if multiplier_is_ready(hw_multiplier) then
multiply(hw_multiplier, angle_reduction((to_integer(angle_rad16))), sinegains(0) - get_multiplier_result(hw_multiplier, 15));
end if;
increment_counter_when_ready(hw_multiplier,sincos_process_counter);
WHEN 7 =>
if multiplier_is_ready(hw_multiplier) then
sin16 <= get_multiplier_result(hw_multiplier,12);
end if;
increment_counter_when_ready(hw_multiplier,sincos_process_counter);
WHEN others => -- do nothing
end CASE;
CASE sincos_process_counter is
WHEN 2 =>
multiply(hw_multiplier, angle_squared, cosgains(2));
sincos_process_counter <= sincos_process_counter + 1;
WHEN 4 =>
multiply(hw_multiplier, angle_squared, cosgains(1) - get_multiplier_result(hw_multiplier, 15));
sincos_process_counter <= sincos_process_counter + 1;
WHEN 6 =>
cos16 <= cosgains(0) - get_multiplier_result(hw_multiplier, 14);
sincos_process_counter <= sincos_process_counter + 1;
WHEN 8 =>
sincos_process_counter <= sincos_process_counter + 1;
sincos_has_finished <= true;
if (to_integer(angle_rad16)) < one_quarter then
sin <= sin16;
cos <= cos16;
elsif (to_integer(angle_rad16)) < three_fourths then
sin <= cos16;
cos <= -sin16;
elsif (to_integer(angle_rad16)) < five_fourths then
sin <= -sin16;
cos <= -cos16;
elsif (to_integer(angle_rad16)) < seven_fourths then
sin <= -cos16;
cos <= sin16;
else
sin <= sin16;
cos <= cos16;
end if;
when others => -- hange here and wait for triggering
end CASE;
end create_sincos;
step 3 : Final Sine and Cosine
The last part of the calculation uses symmetry to calculate the sine and cosine waveforms from the results of the polynomial evaluation. This is done at the when the process_counter is at 8. Since only one quarter of the waveform is calculated, the sine and cosine parts of the waveform are alternated between providing the sine and the cosine wave.
With the sincos module, the minimal amount of code required to implement a sincos is as shown in the following snippet.
-- in architecture part
signal sincos : sincos_record := init_sincos;
signal multiplier : multiplier_record := init_multiplier;
begin
process(clock)
begin
if rising_edge(clock) then
create_multiplier(multiplier);
create_sincos(multiplier, sincos);
if sincos_is_needed then
request_sincos(sincos, angle);
end if;
if sincos_is_ready(sincos) then
sine := get_sine(sincos);
end if;
Test with FPGA
The FPGA test code is written in arch_hack_test_system_components.vhd in the ac_inout_psu repository. The sine calculation is triggered at 100kHz and the sines are transmitted to PC with uart. The target platform for the test code is the intel cyclone 10 lp evaluation kit . See tutorial section on top of the page for instructions how to build the uart pc software.
The FPGA test code instantiates four sincos modules and the multipliers for them. A sine wave with harmonics is calculated using one of the modules and the two others are used to calculate sines of differing frequencies and the fourth sincos module simply requests sincos calculation with all of the 16 bit angles. The resulting waveforms are transmitted with uart and captured with PC.
create_multiplier(sincos_multiplier);
create_sincos(sincos_multiplier, sincos);
create_multiplier(sincos_multiplier2);
create_sincos(sincos_multiplier2, sincos2);
create_multiplier(sincos_multiplier3);
create_sincos(sincos_multiplier3, sincos3);
create_multiplier(sincos_multiplier4);
create_sincos(sincos_multiplier4, sincos4);
uart_transmit_counter <= uart_transmit_counter - 1;
if uart_transmit_counter = 0 then
uart_transmit_counter <= counter_at_100khz;
start_ad_conversion(spi_sar_adc_data_in);
end if;
if ad_conversion_is_ready(spi_sar_adc_data_out) then
request_power_supply_calculation(power_supply_simulation, -grid_duty_ratio, output_duty_ratio);
angle_rad16 <= angle_rad16 + 328;
harmonic_process_counter <= 0;
request_sincos(sincos2, to_integer(angle_rad16)*4+angle_rad16);
request_sincos(sincos3, to_integer(angle_rad16)*8-angle_rad16);
sine_counter <= sine_counter + 1;
request_sincos(sincos4, sine_counter);
CASE uart_rx_data is
WHEN 28 => transmit_16_bit_word_with_uart(uart_data_in, sine_w_harmonics);
WHEN 29 => transmit_16_bit_word_with_uart(uart_data_in, get_cosine(sincos)/4+32768 + get_cosine(sincos2)/16 + get_cosine(sincos3)/32);
WHEN 30 => transmit_16_bit_word_with_uart(uart_data_in, saturate_to(get_cosine(sincos4)+32768, 65535));
WHEN others => -- get data from MDIO
Sine harmonics are calculated with a state machine, which state is incremented every time the calculation is ready and the resulting sine is added to the signal. The harmonic sine calculation is requested by setting state machine is set to 0. This result can be seen in Figure 4.
CASE harmonic_process_counter is
WHEN 0 =>
request_sincos(sincos, angle_rad16);
harmonic_process_counter <= harmonic_process_counter + 1;
WHEN 1 =>
if sincos_is_ready(sincos) then
sine_w_harmonics <= get_sine(sincos) / 2 + 32768;
request_sincos(sincos, to_integer(angle_rad16)*2 + angle_rad16);
harmonic_process_counter <= harmonic_process_counter + 1;
end if;
WHEN 2 =>
if sincos_is_ready(sincos) then
sine_w_harmonics <= sine_w_harmonics + get_sine(sincos) / 16;
request_sincos(sincos, to_integer(angle_rad16)*4 + angle_rad16);
harmonic_process_counter <= harmonic_process_counter + 1;
end if;
WHEN 3 =>
if sincos_is_ready(sincos) then
sine_w_harmonics <= sine_w_harmonics + get_sine(sincos) / 64;
request_sincos(sincos, to_integer(angle_rad16)*8 - angle_rad16);
harmonic_process_counter <= harmonic_process_counter + 1;
end if;
WHEN 4 =>
if sincos_is_ready(sincos) then
sine_w_harmonics <= sine_w_harmonics + get_sine(sincos) / 32;
request_sincos(sincos, to_integer(angle_rad16)*8 + angle_rad16);
harmonic_process_counter <= harmonic_process_counter + 1;
end if;
WHEN 5 =>
if sincos_is_ready(sincos) then
sine_w_harmonics <= sine_w_harmonics + get_sine(sincos) / 32;
request_sincos(sincos, to_integer(angle_rad16)*8 + to_integer(angle_rad16)*2 + angle_rad16);
harmonic_process_counter <= harmonic_process_counter + 1;
end if;
WHEN 6 =>
if sincos_is_ready(sincos) then
sine_w_harmonics <= sine_w_harmonics + get_sine(sincos) / 32;
request_sincos(sincos, to_integer(angle_rad16)*16 + angle_rad16);
harmonic_process_counter <= harmonic_process_counter + 1;
end if;
WHEN 7 =>
if sincos_is_ready(sincos) then
sine_w_harmonics <= sine_w_harmonics + get_sine(sincos) / 8;
harmonic_process_counter <= harmonic_process_counter + 1;
end if;
WHEN others =>
end CASE;
The sine calculated with FPGA with every angle from 0 to 65535 is shown in Figure 2 and the error of the calculation is in Figure 3. Reference sine is calculated with matlab and the difference is shown. As can be seen the maximum error between matlab sine and the one calculated here is 5 in the range of +/-32768.
Resource use
The test code includes all of the parts that have been tested previously. The total logic elements consumed by the entire design is less than 8800 logic units. This is approximately 2500 logic units more than what was used in previously during the division test design. Since 4 sincos modules are used, one takes approximately 750 logic units in an application as well as a multiplier.
Next in line
The PI control will be covered in the future and that will be used to develop the grid inverter and output inverter controls. As the sincos modules is ready and tested with hardware it could be used for developing motor model and field oriented control for FPGA. These will be covered in future posts.
Could you provide the coefficients in float?