
Gigabit ethernet vol 3: processing protocols from Reasonably Accessible Memory

The most current sources are in the GitHub repository at ac_inout_psu/source/system_control/system_components/ethernet/

Memory sources are in -/ethernet/common/dual_port_ethernet_ram/

With the gigabit physical layer receiver completed, the next task is to create a minimal set of rules, or protocols, for connecting the FPGA with its ethernet interface to a computer network.

In most computer operating systems, ethernet is handled by networking functions that are implemented at the kernel level. Since applications are normally not given an interface for transferring bare ethernet frames, communicating with bare frames is significantly more difficult than implementing minimal networking functions in the FPGA.

Although implementing networking functions might seem like a daunting task, only a very small amount of protocol processing is actually needed.

Computer networking in brief

Networking functions are divided into layers of communication protocols, and each layer is responsible for a different networking function. The ethernet protocol is responsible for moving data between two points in the network through an ethernet cable. The next layer up is the internet protocol, which is responsible for routing packets over the network.

Which rules are to be applied to the transmitted data is communicated in the ethernet frame with dedicated sections called protocol headers. The data that follows a protocol header is the protocol data. A protocol header tells how long the header and its data section are, and it carries options and flags which tell how the protocol should be processed. The protocol data section can contain further protocol headers with their corresponding data sections.

A protocol that is carried in the data section of another protocol is said to be encapsulated. The only point of contact between the encapsulated and the encapsulating protocol is the type field of the encapsulating header. The type field tells the module that is currently processing a header which operation should be triggered to process its data section. For the ethernet frame this information is in the ethertype field, for IP it is in the protocol field, and other headers carry similar information in comparable fields.
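The type fields can be written down as named constants. The following is a minimal sketch that assumes IPv4 over ethernet with UDP as the only payload of interest. The package and function names are illustrative and are not taken from the repository.

```vhdl
library ieee;
    use ieee.std_logic_1164.all;

-- Illustrative package, not part of the original sources.
package protocol_type_fields_pkg is

    -- ethertype value that marks an IPv4 packet inside the ethernet frame
    constant ethertype_ipv4  : std_logic_vector(15 downto 0) := x"0800";
    -- IPv4 protocol field value that marks a UDP datagram
    constant ip_protocol_udp : std_logic_vector(7 downto 0)  := x"11";

    function frame_contains_ipv4 ( ethertype : std_logic_vector(15 downto 0))
        return boolean;

    function packet_contains_udp ( ip_protocol_field : std_logic_vector(7 downto 0))
        return boolean;

end package protocol_type_fields_pkg;

package body protocol_type_fields_pkg is

    function frame_contains_ipv4 ( ethertype : std_logic_vector(15 downto 0))
        return boolean is
    begin
        return ethertype = ethertype_ipv4;
    end frame_contains_ipv4;

    function packet_contains_udp ( ip_protocol_field : std_logic_vector(7 downto 0))
        return boolean is
    begin
        return ip_protocol_field = ip_protocol_udp;
    end packet_contains_udp;

end package body protocol_type_fields_pkg;
```

A module that processes a header only needs the comparison for its own type field, which keeps the encapsulating and encapsulated protocols decoupled.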

Protocol frame processing

The most lightweight transport protocol is the user datagram protocol (UDP). UDP communication requires that each communication partner has, along with a MAC address, an IP address and a port number. These are transmitted in the ethernet, IP and UDP headers respectively. Thus the protocol modules needed for UDP communication are the ethernet, IP and UDP protocols. Frames with unknown protocols need no handling and are simply ignored when received.

The core of the minimal protocol stack design is the dual port embedded RAM into which the captured frame is buffered. Because the RAM is embedded in the FPGA fabric, its contents can be read multiple times by multiple modules simply by sharing the read control port of the RAM.

The frame headers contain fields of varying lengths, from single bit flags to MAC addresses spanning multiple bytes. To accommodate the variable field widths, data is read one byte at a time into a shift register from which the desired length of data is processed.
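For reference, the byte positions of the fields that matter for UDP reception can be collected into a package of constants. This sketch assumes an untagged ethernet frame (no VLAN tag) that is stored starting from the destination MAC address, and an IPv4 header. The package and constant names are assumptions made for illustration.

```vhdl
-- Illustrative package; the names are not from the repository.
package udp_frame_layout_pkg is

    -- ethernet header, offsets from the start of the frame
    constant ethernet_destination_mac_offset : natural := 0;
    constant ethernet_source_mac_offset      : natural := 6;
    constant ethertype_offset                : natural := 12;
    constant ethernet_header_length          : natural := 14;

    -- IPv4 header, offsets from the start of the IP header
    constant ip_protocol_field_offset      : natural := 9;
    constant ip_source_address_offset      : natural := 12;
    constant ip_destination_address_offset : natural := 16;
    constant ip_minimum_header_length      : natural := 20; -- without options

    -- UDP header, offsets from the start of the UDP header
    constant udp_source_port_offset      : natural := 0;
    constant udp_destination_port_offset : natural := 2;
    constant udp_length_offset           : natural := 4;
    constant udp_checksum_offset         : natural := 6;
    constant udp_header_length           : natural := 8;

end package udp_frame_layout_pkg;
```

With these offsets, the IP header of an untagged frame starts at RAM address 14 and, when the IP header carries no options, the UDP header starts at address 34.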
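The byte-wise shifting can be sketched as a single procedure. The procedure name is hypothetical, and the sketch assumes a shift register declared with a descending range that ends at 0 and is more than 8 bits wide.

```vhdl
library ieee;
    use ieee.std_logic_1164.all;

-- Illustrative package, not part of the original sources.
package byte_shift_register_pkg is

    procedure shift_in_byte (
        signal shift_register : inout std_logic_vector;
        byte_in : in std_logic_vector(7 downto 0));

end package byte_shift_register_pkg;

package body byte_shift_register_pkg is

    procedure shift_in_byte (
        signal shift_register : inout std_logic_vector;
        byte_in : in std_logic_vector(7 downto 0)) is
    begin
        -- drop the oldest byte from the top and append the new byte at the bottom
        shift_register <= shift_register(shift_register'left-8 downto 0) & byte_in;
    end shift_in_byte;

end package body byte_shift_register_pkg;
```

After six calls with consecutive bytes, a MAC address occupies bits 47 downto 0 of the register, and after a single call a one byte field such as the IP protocol number is found in bits 7 downto 0.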

FPGA RAM control

The RAM is implemented with separate read and write ports. The write port is clocked with the rx ddr clock and the read port is clocked with the FPGA core clock, so the dual port RAM also takes care of the clock domain crossing between the frame receiver and the protocol modules.

The best way to write code is such that its layout models the intended use of the code. Thus for the dual port RAM the read and write port interfaces are placed in their own packages and given their own separate implementations.

write port control

Writing data into the RAM is very straightforward. With the init and write procedures, a RAM write happens with just a call to the write procedure with the desired address and data. The RAM can be written on successive clock cycles, so there is no need to wait for anything and the pipeline delays do not matter for RAM writes. The init_ram_write procedure simply sets write_enabled_when_1 to '0', and a call to the write procedure sets it to '1'.

The write port interface has the following description

				
library ieee;
    use ieee.std_logic_1164.all;

package ethernet_frame_ram_write_pkg is
------------------------------------------------------------------------
    type ram_write_control_group is record
        address              : std_logic_vector(10 downto 0);
        byte_to_write        : std_logic_vector(7 downto 0);
        write_enabled_when_1 : std_logic;
    end record; 
------------------------------------------------------------------------
     
    procedure init_ram_write (
        signal ram_write_port : out ram_write_control_group);
------------------------------------------------------------------------
    procedure write_data_to_ram (
        signal ram_write_port : out ram_write_control_group;
        offset : natural;
        address : natural;
        byte_to_write : std_logic_vector(7 downto 0));
------------------------------------------------------------------------ 
end package ethernet_frame_ram_write_pkg;
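The bodies of the write procedures are not shown in this post. A minimal sketch of how they could look, based on the description above, is given below as a stand-alone package. The package name is hypothetical, and the three argument form of write_data_to_ram is an assumption taken from the way the procedure is called in the frame receiver.

```vhdl
library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

-- Stand-alone sketch, not the repository implementation.
package ram_write_sketch_pkg is

    type ram_write_control_group is record
        address              : std_logic_vector(10 downto 0);
        byte_to_write        : std_logic_vector(7 downto 0);
        write_enabled_when_1 : std_logic;
    end record;

    procedure init_ram_write (
        signal ram_write_port : out ram_write_control_group);

    procedure write_data_to_ram (
        signal ram_write_port : out ram_write_control_group;
        address : natural;
        byte_to_write : std_logic_vector(7 downto 0));

end package ram_write_sketch_pkg;

package body ram_write_sketch_pkg is

    -- called on every clock cycle so that write enable defaults to '0'
    procedure init_ram_write (
        signal ram_write_port : out ram_write_control_group) is
    begin
        ram_write_port.write_enabled_when_1 <= '0';
    end init_ram_write;

    -- called after init_ram_write in the same process, so that the
    -- later assignment to write enable overrides the default
    procedure write_data_to_ram (
        signal ram_write_port : out ram_write_control_group;
        address : natural;
        byte_to_write : std_logic_vector(7 downto 0)) is
    begin
        ram_write_port.address              <= std_logic_vector(to_unsigned(address, 11));
        ram_write_port.byte_to_write        <= byte_to_write;
        ram_write_port.write_enabled_when_1 <= '1';
    end write_data_to_ram;

end package body ram_write_sketch_pkg;
```

The pattern relies on the VHDL rule that the last signal assignment made in a process during a clock cycle wins, so the init call has to come before any write call.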
				
			

With the RAM write module, the ethernet receiver has an even simpler implementation than the test code shown in the previous post.

				
------------------------------------------------------------------------
    procedure capture_ethernet_frame
    (
        signal ethernet_rx : inout ethernet_receiver;
        ethernet_ddio_out : ethernet_rx_ddio_data_output_group
    ) is
        alias frame_receiver_state         is  ethernet_rx.frame_receiver_state         ;
        alias rx_shift_register            is  ethernet_rx.rx_shift_register            ;
        alias ram_write_control_port       is  ethernet_rx.ram_write_control_port       ;
        alias toggle_data_has_been_written is  ethernet_rx.toggle_data_has_been_written ;
        alias ram_write_counter            is  ethernet_rx.ram_write_counter            ;
        alias fcs_shift_register           is  ethernet_rx.fcs_shift_register           ;
    begin
        CASE frame_receiver_state is
            WHEN wait_for_start_of_frame =>
                if rx_shift_register = ethernet_frame_preamble and get_byte(ethernet_ddio_out) = ethernet_frame_delimiter  then
                    frame_receiver_state <= receive_frame;
                end if;
            WHEN receive_frame =>
                ram_write_counter <= ram_write_counter + 1; 
                write_data_to_ram(ram_write_control_port, ram_write_counter,  get_byte_with_inverted_bit_order(ethernet_ddio_out)); 
                calculate_fcs(ethernet_rx, ethernet_ddio_out); 
        end CASE;
    end capture_ethernet_frame;
				
			

read port control

The RAM read has two points of access: the read control port and the RAM data output port. The read control port has the RAM address and read enable members. An overloaded "+" function is provided for the control port which ORs together the address and read enable bits from multiple read control ports. This allows multiple modules to control the same RAM reads, provided that every module forces the bits of its read control port to '0' when it is not accessing the RAM.

The read control port has the following interface

				
library ieee;
    use ieee.std_logic_1164.all;

package ethernet_frame_ram_read_pkg is
------------------------------------------------------------------------
        type ram_read_control_group is record
            address : std_logic_vector(10 downto 0);
            read_is_enabled_when_1 : std_logic;
        end record; 
------------------------------------------------------------------------
        type ram_read_output_group is record
            ram_is_ready: boolean;
            byte_address : std_logic_vector(10 downto 0);
            byte_from_ram : std_logic_vector(7 downto 0);
        end record;
------------------------------------------------------------------------ 
    function "+" ( left, right : ram_read_control_group)
        return ram_read_control_group; 
------------------------------------------------------------------------ 
    procedure init_ram_read (
        signal ram_read_control_port : out ram_read_control_group);
------------------------------------------------------------------------
    procedure read_data_from_ram (
        signal ram_read_control_port : out ram_read_control_group;
        offset : natural;
        address : natural);
------------------------------------------------------------------------
    procedure read_data_from_ram (
        signal ram_read_control_port : out ram_read_control_group;
        address : natural);
------------------------------------------------------------------------
    function get_ram_data ( ram_read_control_port_data_out : ram_read_output_group)
        return std_logic_vector;
------------------------------------------------------------------------
    function ram_data_is_ready ( ram_read_control_port_data_out : ram_read_output_group)
        return boolean;
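The body of the overloaded "+" function is not shown in this post. Based on the description, it could be as simple as the following stand-alone sketch, where the package name is hypothetical.

```vhdl
library ieee;
    use ieee.std_logic_1164.all;

-- Stand-alone sketch, not the repository implementation.
package ram_read_or_sketch_pkg is

    type ram_read_control_group is record
        address                : std_logic_vector(10 downto 0);
        read_is_enabled_when_1 : std_logic;
    end record;

    function "+" ( left, right : ram_read_control_group)
        return ram_read_control_group;

end package ram_read_or_sketch_pkg;

package body ram_read_or_sketch_pkg is

    -- bitwise OR of two control ports; this only works when the
    -- port that is not reading drives all of its bits to '0'
    function "+" ( left, right : ram_read_control_group)
        return ram_read_control_group is
    begin
        return (address                => left.address or right.address,
                read_is_enabled_when_1 => left.read_is_enabled_when_1 or right.read_is_enabled_when_1);
    end "+";

end package body ram_read_or_sketch_pkg;
```

Since the result has the same type as the operands, the operator can be chained to combine the control ports of any number of modules. If two modules request a read on the same clock cycle, the addresses are ORed together, so the modules need to take turns.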
				
			

The RAM IP core has neither a ram_is_ready signal nor the byte_address of the data present at the RAM output, but these can be created in a similar way to the multiplier_is_ready signal presented in a previous blog post, simply by looping the signals back through a pipeline whose length equals the read latency. With these two signals the embedded RAM can be used by requesting the desired addresses in successive read_data_from_ram calls and then shifting the returned bytes into a shift register when ram_data_is_ready. With the looped back address and ready bits, the RAM read code is insensitive to the read latency.

The implementation of the RAM module is just a mapping of the read and write ports to the RAM IP component instantiation, plus a loopback for the ready and address signals. The architecture of the RAM module is also kept separate from its entity declaration, since the memory inside the module is a vendor specific IP core that is instantiated in the architecture.

				
architecture cyclone_10_lp of ethernet_frame_ram is
    alias ram_write_control_port is ethernet_frame_ram_data_in.ram_write_control_port;
    alias ram_read_control_port is ethernet_frame_ram_data_in.ram_read_control_port;
     
    component dual_port_ethernet_ram IS
    PORT
    (
        data      : IN STD_LOGIC_VECTOR (7 DOWNTO 0);
        rdaddress : IN STD_LOGIC_VECTOR (10 DOWNTO 0);
        rdclock   : IN STD_LOGIC ;
        rden      : IN STD_LOGIC                       := '1';
        wraddress : IN STD_LOGIC_VECTOR (10 DOWNTO 0);
        wrclock   : IN STD_LOGIC                       := '1';
        wren      : IN STD_LOGIC                       := '0';
        q         : OUT STD_LOGIC_VECTOR (7 DOWNTO 0)
    );
    end component dual_port_ethernet_ram;
    signal q         : STD_LOGIC_VECTOR (7 DOWNTO 0);
    signal data_is_ready_to_be_read : boolean := false;
    signal data_is_ready_to_be_read_buffer : boolean := false;
    -- holds two pipeline stages of the 11 bit read address (22 bits in total)
    signal address_buffer : std_logic_vector(10*2+1 downto 0);
begin
    ethernet_frame_ram_data_out <= (ram_read_port_data_out =>(data_is_ready => data_is_ready_to_be_read_buffer                                   ,
                                                              byte_address  => address_buffer(address_buffer'left downto address_buffer'left-10) ,
                                                              byte_from_ram => q)
                                  );
    data_is_ready_pipeline : process(ethernet_frame_ram_clocks.read_clock)
         
    begin
        if rising_edge(ethernet_frame_ram_clocks.read_clock) then
            data_is_ready_to_be_read <= false;
            if ram_read_control_port.read_is_enabled_when_1 = '1' then
                data_is_ready_to_be_read <= true;
            end if;
            address_buffer <= address_buffer(10 downto 0) & ram_read_control_port.address;
            data_is_ready_to_be_read_buffer <= data_is_ready_to_be_read;
        end if; --rising_edge
    end process data_is_ready_pipeline; 
    u_dual_port_ethernet_ram : dual_port_ethernet_ram
    port map(
                wrclock   => ethernet_frame_ram_clocks.write_clock,
                data      => ram_write_control_port.byte_to_write,
                wraddress => ram_write_control_port.address,
                wren      => ram_write_control_port.write_enabled_when_1,
                rdclock   => ethernet_frame_ram_clocks.read_clock,
                rdaddress => ram_read_control_port.address,
                rden      => ram_read_control_port.read_is_enabled_when_1,
                q         => q);
end cyclone_10_lp;
				
			

To simplify the RAM read further, a RAM read controller is designed. This memory and shift register controller allows a required number of RAM bytes to be shifted in with a single procedure call, and it is the main method with which the ethernet frame data is processed from RAM.

RAM read controller implementation

The RAM read controller has members for keeping track of the number of addresses left to read from the RAM, the address which is requested from the RAM, the RAM offset, and a flag that indicates when RAM buffering is complete. The controller has methods for creating the controller and for loading a specific number of RAM bytes to a shift register, and an interface function which tells when the shift register has shifted in the requested number of bytes. Once the controller is created, a user specified number of RAM reads can be requested and buffered to the shift register for processing with a single procedure call.

				
----------------------------------------------------------------------
    type ram_reader is record
        number_addresses_left_to_read : natural range 0 to 7;
        ram_read_address : natural range 0 to 7;
        ram_buffering_is_complete : boolean;
        ram_offset : natural range 0 to 2**11-1;
    end record;
------------------------------------------------------------------------
    procedure create_ram_read_controller (
        signal ram_read_port : out ram_read_control_group;
        ram_output_port : in ram_read_output_group;
        signal ram_controller : inout ram_reader;
        signal ram_shift_register : inout std_logic_vector);
------------------------------------------------------------------------
    procedure load_ram_with_offset_to_shift_register (
        signal ram_controller : inout  ram_reader;
        start_address : natural;
        number_of_ram_addresses_to_be_read : natural);
------------------------------------------------------------------------
    function ram_is_buffered_to_shift_register ( ram_controller : ram_reader)
        return boolean;
------------------------------------------------------------------------
end package ethernet_frame_ram_read_pkg;
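Only the declarations of the controller are shown here. As a rough sketch of how such a controller could work, the stand-alone package below issues one read per clock cycle and shifts in each returned byte. All names ending in _sketch, the number_of_bytes_left_to_shift member and the counter ranges are assumptions made for this example, and the RAM offset handling is left out.

```vhdl
library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

-- Stand-alone sketch, not the repository implementation.
package ram_reader_sketch_pkg is

    type ram_read_control_group is record
        address                : std_logic_vector(10 downto 0);
        read_is_enabled_when_1 : std_logic;
    end record;

    type ram_read_output_group is record
        ram_is_ready  : boolean;
        byte_address  : std_logic_vector(10 downto 0);
        byte_from_ram : std_logic_vector(7 downto 0);
    end record;

    type ram_reader_sketch is record
        number_addresses_left_to_read : natural range 0 to 15;
        number_of_bytes_left_to_shift : natural range 0 to 15; -- assumed extra member
        ram_read_address              : natural range 0 to 2**11-1;
        ram_buffering_is_complete     : boolean;
    end record;

    procedure create_ram_read_controller_sketch (
        signal ram_read_port      : out ram_read_control_group;
        ram_output_port           : in ram_read_output_group;
        signal ram_controller     : inout ram_reader_sketch;
        signal ram_shift_register : inout std_logic_vector);

    procedure load_ram_to_shift_register_sketch (
        signal ram_controller : inout ram_reader_sketch;
        start_address : natural;
        number_of_ram_addresses_to_be_read : natural);

end package ram_reader_sketch_pkg;

package body ram_reader_sketch_pkg is

    procedure create_ram_read_controller_sketch (
        signal ram_read_port      : out ram_read_control_group;
        ram_output_port           : in ram_read_output_group;
        signal ram_controller     : inout ram_reader_sketch;
        signal ram_shift_register : inout std_logic_vector) is
    begin
        -- defaults: no read requested, completion flag is a one clock pulse
        ram_read_port <= (address => (others => '0'), read_is_enabled_when_1 => '0');
        ram_controller.ram_buffering_is_complete <= false;

        -- request one address per clock until all requested reads are issued
        if ram_controller.number_addresses_left_to_read > 0 then
            ram_read_port <= (address                => std_logic_vector(to_unsigned(ram_controller.ram_read_address, 11)),
                              read_is_enabled_when_1 => '1');
            ram_controller.ram_read_address              <= (ram_controller.ram_read_address + 1) mod 2**11;
            ram_controller.number_addresses_left_to_read <= ram_controller.number_addresses_left_to_read - 1;
        end if;

        -- shift in the returned bytes until the requested number has arrived
        if ram_output_port.ram_is_ready and (ram_controller.number_of_bytes_left_to_shift > 0) then
            ram_shift_register <= ram_shift_register(ram_shift_register'left-8 downto 0) & ram_output_port.byte_from_ram;
            ram_controller.number_of_bytes_left_to_shift <= ram_controller.number_of_bytes_left_to_shift - 1;
            ram_controller.ram_buffering_is_complete     <= ram_controller.number_of_bytes_left_to_shift = 1;
        end if;
    end create_ram_read_controller_sketch;

    procedure load_ram_to_shift_register_sketch (
        signal ram_controller : inout ram_reader_sketch;
        start_address : natural;
        number_of_ram_addresses_to_be_read : natural) is
    begin
        ram_controller.ram_read_address              <= start_address;
        ram_controller.number_addresses_left_to_read <= number_of_ram_addresses_to_be_read;
        ram_controller.number_of_bytes_left_to_shift <= number_of_ram_addresses_to_be_read;
    end load_ram_to_shift_register_sketch;

end package body ram_reader_sketch_pkg;
```

In this sketch the create procedure is called once per clock cycle at the top of a clocked process, and the load procedure is called after it so that its assignments take precedence. The completion flag is set on the same clock edge that shifts in the last byte, so the shift register is valid when the flag reads true.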
				
			

Now armed with controllable ram, the protocol headers can be parsed with the ram read controller.

Protocol stack implementation

The protocol stack is implemented with the ethernet protocol instantiating IP and IP instantiating UDP. With the RAM read controller, the processing of a protocol is simplified to just two procedure calls and an if statement which checks for the next protocol in succession.

For example, the ip_header_processor reads the first bytes of the IP header from the RAM (11 RAM addresses in the listing below), starting from the header address, and looks for the UDP protocol number in the protocol field of the IP header. If it is found, the UDP protocol is triggered. Both the UDP and ethernet protocols function in a similar manner. The result of the processed protocol is the RAM offset from which the RAM should be read by the next process, or a start trigger for any application that would then further process the RAM contents.

				
------------------------------------------------------------------------
    route_data_out : process(frame_ram_read_control_port, ram_offset, frame_processing_is_ready, udp_protocol_data_out) 
    begin
        internet_protocol_data_out <= (
                                          frame_ram_read_control => frame_ram_read_control_port + udp_protocol_data_out.frame_ram_read_control ,
                                          ram_offset => udp_protocol_data_out.ram_offset + ram_offset                                          ,
                                          frame_processing_is_ready => frame_processing_is_ready or udp_protocol_data_out.frame_processing_is_ready
                                      );
    end process route_data_out; 
------------------------------------------------------------------------
    ip_header_processor : process(clock)
        type list_of_protocol_processor_states is (wait_for_process_request, read_header);
        variable internet_protocol_state : list_of_protocol_processor_states := wait_for_process_request;
         
    begin
        if rising_edge(clock) then
            create_ram_read_controller(frame_ram_read_control_port, internet_protocol_data_in.frame_ram_output, ram_read_controller, shift_register); 
            init_protocol_control(udp_protocol_control);
            frame_processing_is_ready <= false;
            ram_offset <= 0; 
            CASE internet_protocol_state is
                WHEN wait_for_process_request =>
                    if protocol_control.protocol_processing_is_requested then
                        load_ram_with_offset_to_shift_register(ram_controller                     => ram_read_controller,
                                                               start_address                      => protocol_control.protocol_start_address,
                                                               number_of_ram_addresses_to_be_read => 11);
                        header_offset <= protocol_control.protocol_start_address;
                        internet_protocol_state := read_header;
                    end if;
                WHEN read_header =>
                    if ram_data_is_ready(internet_protocol_data_in.frame_ram_output) then
                        if get_ram_address(internet_protocol_data_in.frame_ram_output) = header_offset+11 then
                            -- 0x11 in the protocol field marks a UDP datagram
                            if shift_register(7 downto 0) = x"11" then
                                request_protocol_processing(udp_protocol_control, header_offset + 20);
                            else
                                ram_offset                <= header_offset;
                                frame_processing_is_ready <= true;
                            end if;
                            internet_protocol_state := wait_for_process_request;
                        end if;
                    end if;
            end CASE;
        end if; --rising_edge
    end process ip_header_processor;      
				
			

VHDL allows a form of polymorphism through multiple architectures for a single entity declaration. The interface to the different protocol modules is the same, thus they share a common header package and have individual architectures. This allows the package file to be reused for the different protocol implementations, which promotes code reuse and also guarantees that the different protocols have the same interface.

The protocol header package is defined as follows

				
library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;
library work;
    use work.ethernet_frame_ram_read_pkg.all;
package network_protocol_header_pkg is
    type network_protocol_clock_group is record
        clock : std_logic;
    end record;
    type protocol_control_record is record
        protocol_processing_is_requested : boolean;
        protocol_start_address : natural;
    end record;
     
    type network_protocol_data_input_group is record
        frame_ram_output : ram_read_output_group;
        protocol_control : protocol_control_record; 
    end record;
     
    type network_protocol_data_output_group is record
        frame_ram_read_control : ram_read_control_group;
        ram_offset : natural;
        frame_processing_is_ready : boolean;
    end record;
     
    component network_protocol is
        port (
            network_protocol_clocks : in network_protocol_clock_group; 
            network_protocol_data_in : in network_protocol_data_input_group;
            network_protocol_data_out : out network_protocol_data_output_group
        );
    end component network_protocol;
     
    -- signal network_protocol_clocks   : network_protocol_clock_group;
    -- signal network_protocol_data_in  : network_protocol_data_input_group;
    -- signal network_protocol_data_out : network_protocol_data_output_group
     
    -- u_network_protocol : network_protocol
    -- port map( network_protocol_clocks,
    --    network_protocol_data_in,
    --    network_protocol_data_out);
------------------------------------------------------------------------
    procedure request_protocol_processing (
        signal control : out protocol_control_record;
        protocol_start_address : natural);
     
    procedure init_protocol_control (
        signal control : out protocol_control_record);
------------------------------------------------------------------------ 
end package network_protocol_header_pkg;
				
			

A protocol has methods for requesting protocol processing and for initializing the protocol control. The process request takes the memory offset as an argument, which the next protocol up uses as the address offset from which to start processing the RAM contents. This also allows for the varying length of the IP header.
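The two control procedures could be implemented along the following lines. This is a stand-alone sketch with a hypothetical package name. The get_ip_header_length_in_bytes function is an added illustration of how the start address of the next protocol could follow a variable length IP header, using the header length field in the low four bits of the first IP header byte. The listing above uses a fixed offset of 20 bytes instead.

```vhdl
library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

-- Stand-alone sketch, not the repository implementation.
package protocol_control_sketch_pkg is

    type protocol_control_record is record
        protocol_processing_is_requested : boolean;
        protocol_start_address : natural;
    end record;

    procedure init_protocol_control (
        signal control : out protocol_control_record);

    procedure request_protocol_processing (
        signal control : out protocol_control_record;
        protocol_start_address : natural);

    -- hypothetical helper, not in the original package
    function get_ip_header_length_in_bytes ( first_byte_of_ip_header : std_logic_vector(7 downto 0))
        return natural;

end package protocol_control_sketch_pkg;

package body protocol_control_sketch_pkg is

    -- called on every clock cycle, makes the request a one clock pulse
    procedure init_protocol_control (
        signal control : out protocol_control_record) is
    begin
        control.protocol_processing_is_requested <= false;
    end init_protocol_control;

    procedure request_protocol_processing (
        signal control : out protocol_control_record;
        protocol_start_address : natural) is
    begin
        control.protocol_processing_is_requested <= true;
        control.protocol_start_address           <= protocol_start_address;
    end request_protocol_processing;

    -- the IPv4 header length field counts 32 bit words
    function get_ip_header_length_in_bytes ( first_byte_of_ip_header : std_logic_vector(7 downto 0))
        return natural is
    begin
        return to_integer(unsigned(first_byte_of_ip_header(3 downto 0))) * 4;
    end get_ip_header_length_in_bytes;

end package body protocol_control_sketch_pkg;
```

For an IP header without options the first byte is 0x45, for which the function returns 20.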

Test with FPGA

The protocol stack is triggered when the ethernet frame has been received. Some real time protocols like EtherCAT require that the received frame is forwarded while it is still being received. For this type of operation the frame processing would be triggered as soon as frame reception starts.

Since the trigger comes from a different clock domain than the one the protocol stack runs in, the signal is triple buffered and then fed to a rising edge detector. This guarantees that a trigger is not missed, but it does not guarantee that a single edge produces only one trigger, since the clock frequencies differ. This can be thought of as similar to the bounce of a mechanical switch, and the triggered process should be guarded against it.

				
protocol_trigger : process(ethernet_clocks.core_clock)
         
    begin
        if rising_edge(ethernet_clocks.core_clock) then
            shift_register <= shift_register(shift_register'left-1 downto 0 ) & ethernet_frame_receiver_data_out.toggle_data_has_been_written;
            frame_is_received <= shift_register(shift_register'left) = '0' AND shift_register(shift_register'left-1) = '1';
        end if; --rising_edge
    end process protocol_trigger;   
------------------------------------------------------------------------ 
    ethernet_protocol_clocks <= (clock => ethernet_clocks.core_clock);
    ethernet_protocol_data_in <= (
                                     frame_ram_output => ethernet_frame_ram_data_out.ram_read_port_data_out,
                                     protocol_control => (
                                                             protocol_processing_is_requested => frame_is_received, 
                                                             protocol_start_address => 0
                                                         )
                                 );
                                    
    u_ethernet_protocol : entity work.network_protocol(ethernet_protocol)
    port map( ethernet_protocol_clocks,
          ethernet_protocol_data_in,
          ethernet_protocol_data_out);
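One way to guard the triggered process is to mask the trigger while a frame is still being processed. The entity below is a minimal sketch of this idea. The entity, the port names and the processing_is_busy input are assumptions for illustration and are not part of the repository.

```vhdl
library ieee;
    use ieee.std_logic_1164.all;

-- Illustrative entity, not part of the original sources.
entity guarded_frame_trigger_sketch is
    port (
        core_clock                  : in std_logic;
        signal_from_rx_clock_domain : in std_logic;
        processing_is_busy          : in boolean;
        frame_is_received           : out boolean := false
    );
end entity guarded_frame_trigger_sketch;

architecture rtl of guarded_frame_trigger_sketch is
    signal shift_register : std_logic_vector(2 downto 0) := (others => '0');
begin

    protocol_trigger : process(core_clock)
    begin
        if rising_edge(core_clock) then
            -- triple buffer the signal that crosses the clock domains
            shift_register <= shift_register(1 downto 0) & signal_from_rx_clock_domain;

            -- a rising edge between the two oldest samples produces a trigger,
            -- but only when the previous frame is no longer being processed
            frame_is_received <= shift_register(2) = '0'
                                 and shift_register(1) = '1'
                                 and not processing_is_busy;
        end if; --rising_edge
    end process protocol_trigger;

end rtl;
```

The busy flag would be set by the process that consumes the trigger and cleared when it has finished with the frame, so any extra edges that arrive in between are ignored.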
				
			

The different protocols are triggered based on the protocol fields in the headers. The ethernet protocol triggers internet_protocol if 0x0800 is found in the ethertype field, IP triggers UDP when 0x11 is found in the protocol field of the IP header, and UDP in turn signals that frame processing is ready. The frame is transferred with uart starting from the memory address set by the triggering protocol.

The test code used previously with the ethernet frame receiver is modified such that it also reads the frame RAM, and 256 bytes are printed to the console in 16 bit chunks. The calls to create_ram_read_controller and load_ram_with_offset_to_shift_register show how the shared RAM is accessed through the RAM reader.

				
-- code snippet from system_components.vhd
        if mdio_data_read_is_ready(mdio_driver_data_out) then
            if test_counter < 128+32 then
                if test_counter < 128 then
                    ram_read_process_counter <= 0;
                else
                    transmit_16_bit_word_with_uart(uart_data_in, get_data_from_mdio(mdio_driver_data_out));
                end if;
            end if; 
        end if;
         
        --------------------------------------------------
        create_ram_read_controller(ethernet_data_in.ram_read_control_port    ,
                                    ethernet_data_out.ethernet_frame_ram_out ,
                                    ram_read_controller                      ,
                                    shift_register); 
        --------------------------------------------------
        if protocol_processing_is_ready(ethernet_data_out.ethernet_protocol_data_out) then
            ram_address_offset <= get_frame_address_offset(ethernet_data_out.ethernet_protocol_data_out);
        end if;
        --------------------------------------------------
        CASE ram_read_process_counter is
            WHEN 0 => 
                load_ram_with_offset_to_shift_register(ram_controller                      => ram_read_controller                 ,
                                                        start_address                      => test_counter*2 + ram_address_offset ,
                                                        number_of_ram_addresses_to_be_read => 2);
                ram_read_process_counter <= ram_read_process_counter +1;
            WHEN 1 =>
                if ram_is_buffered_to_shift_register(ram_read_controller) then
                    transmit_16_bit_word_with_uart(uart_data_in, shift_register(15 downto 0)); 
                    ram_read_process_counter <= ram_read_process_counter + 1;
                end if;
            WHEN others => -- hang here and wait for counter being set to zero
        end CASE;
    end if; --rising_edge
end process test_with_uart; 
				
			

The protocol stack transmits the start address of the triggering protocol's data section along with the processing is ready trigger. The different protocols that are read from the data stream can be observed from the printed data. If only the ethernet frame is processed, the entire frame is printed. If IP is processed, the printout starts from the IP header, and if UDP is processed, the offset is set such that the UDP ports lead the printed data.

Figure 1. Captured UDP frame. The frame receiver pads messages shorter than 256 bytes with 0xEE, so the 0xEE bytes are not part of the received UDP frame.

With the current implementation, the gigabit ethernet communication with protocol header parsing takes fewer than 500 logic cells, of which fewer than 150 are dedicated to the protocol header processing, and the design meets timing up to 160 MHz. So minimal processing is indeed very, very minimal.

Figure 2. Resource usage of the entire design with ethernet underlined. The design includes the test code which was presented in the bandpass filter blog post.

With the minimal protocol implementations in place, the next step is to establish a bidirectional connection by sending a UDP datagram with a valid IP address and port number to the computer. After this, the ethernet is up and running and can be used for general communication between the computer and the FPGA.
