The *** way to convert a for loop for an FPGA

Updated: 2023-02-14 10:55:37

To convert a piece of code from an algorithm with loops, conditionals, and so on into a synthesizable form of Verilog, you need to translate it into an FSM.

For example, a for loop that does something similar to what you are asking for would be:

int sample_I[N], sync_I[N]; // assume 32-bit ints, two's complement numbers.
int sample_Q[N], sync_Q[N];
int i, corsum, abscorsum = 0;

for (i=0;i<N;i++)
{
  corsum = sample_I[i] * sync_I[i] + sample_Q[i] * sync_Q[i];
  abscorsum += abs(corsum);
}

First, group the statements into time slots, so you can see which actions can be done in the same clock cycle (same state), and assign a state to each slot:

1)

i = 0
abscorsum = 0
goto 2)


2)

if i!=N    
  corsum = sample_I[i] * sync_I[i]
  goto 3)
else
  goto 5)


3)

corsum = corsum + sample_Q[i] * sync_Q[i]
i = i + 1
goto 4)


4)

if (corsum >= 0)
  abscorsum = abscorsum + corsum
else
  abscorsum = abscorsum + (-corsum)
goto 2)


5)

STOP
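
The state list above can be checked in software before any Verilog is written. What follows is only a sketch in C, not part of the original design: the function names fsm_abscorsum and loop_abscorsum and the tb_ arrays are made up for illustration, N is fixed at 4, and the data is copied from the test bench further down in this answer. Each pass through the while loop stands for one clock cycle.

```c
#define N 4 /* assumption: four samples, as in the test bench */

/* Data copied from the test bench; the tb_ names are illustrative only. */
const int tb_sample_I[N] = { 1,  2,  3,  4};
const int tb_sync_I[N]   = { 2, -2,  2, -2};
const int tb_sample_Q[N] = {-1, -2, -3, -4};
const int tb_sync_Q[N]   = { 3, -3,  3, -3};

/* Walk through states 1..5; one loop pass models one clock cycle. */
int fsm_abscorsum(const int *sample_I, const int *sync_I,
                  const int *sample_Q, const int *sync_Q)
{
    int state = 1, i = 0, corsum = 0, abscorsum = 0;

    while (state != 5) {            /* state 5 is STOP */
        switch (state) {
        case 1:                     /* initialise */
            i = 0;
            abscorsum = 0;
            state = 2;
            break;
        case 2:                     /* loop test and first product */
            if (i != N) {
                corsum = sample_I[i] * sync_I[i];
                state = 3;
            } else {
                state = 5;
            }
            break;
        case 3:                     /* second product, then advance i */
            corsum = corsum + sample_Q[i] * sync_Q[i];
            i = i + 1;
            state = 4;
            break;
        case 4:                     /* accumulate the absolute value */
            if (corsum >= 0)
                abscorsum = abscorsum + corsum;
            else
                abscorsum = abscorsum + (-corsum);
            state = 2;
            break;
        }
    }
    return abscorsum;
}

/* The original for loop, kept as the reference result. */
int loop_abscorsum(const int *sample_I, const int *sync_I,
                   const int *sample_Q, const int *sync_Q)
{
    int i, corsum, abscorsum = 0;

    for (i = 0; i < N; i++) {
        corsum = sample_I[i] * sync_I[i] + sample_Q[i] * sync_Q[i];
        abscorsum += (corsum < 0) ? -corsum : corsum;
    }
    return abscorsum;
}
```

Called on the tb_ arrays, both functions return 10, so the state split does not change the result for this data.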


States 2 and 3 could be merged into a single state, but that would force the synthesizer to infer two multipliers. Besides, the propagation delay of the resulting combinational path could be very high, limiting the clock frequency this design can run at. So I have split the dot product calculation into two parts, each of them using a single multiplication. The synthesizer, if instructed to do so, can use one multiplier and share it between the two operations, as they happen in different clock cycles.
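
To put a number on the cycle side of this trade-off, here is a rough sketch in C that only counts clock edges. The function cycles_to_finish and its merged flag are hypothetical: with merged set, state 2 is assumed to do both products and the increment of i, then jump straight to state 4. Counting starts at the first clock edge after reset is released and stops at the edge where the final state is executed.

```c
/* Count clock edges until the final state has run.
 * merged == 0: states 2 and 3 are separate (the split used in this answer).
 * merged != 0: hypothetical variant where state 2 also does the work of state 3.
 */
unsigned cycles_to_finish(unsigned n, int merged)
{
    unsigned state = 1, i = 0, edges = 0;
    int finish = 0;

    while (!finish) {
        edges++;                    /* one clock edge per loop pass */
        switch (state) {
        case 1:
            i = 0;
            state = 2;
            break;
        case 2:
            if (i != n) {
                if (merged) {       /* both products in one cycle */
                    i++;
                    state = 4;
                } else {
                    state = 3;
                }
            } else {
                state = 5;
            }
            break;
        case 3:
            i++;
            state = 4;
            break;
        case 4:
            state = 2;
            break;
        case 5:
            finish = 1;
            break;
        }
    }
    return edges;
}
```

In this sketch, n = 4 gives 15 edges for the split version and 11 for the merged one (3n + 3 versus 2n + 3). It says nothing about the other half of the trade-off, the longer combinational path and the second multiplier, which only a synthesis run can quantify.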

This translates into the following module: http://www.edaplayground.com/x/MEG

The rst signal is used to tell the module to start operation: holding it high puts the FSM in its first state, and the computation starts on the first clock edge after it is released. The module raises finish to signal the end of the operation and the validity of the output (abscorrsum).

sample_I, sync_I, sample_Q and sync_Q are modeled as memory blocks, with i being the address of the element to read. Most synthesizers will infer block RAMs for these vectors, as each of them is read in only one state, and always with the same address signal.

module corrdotprod #(parameter N = 4) (
  input wire clk,
  input wire rst,
  output reg [31:0] i,
  input wire [31:0] sample_i,
  input wire [31:0] sync_i,
  input wire [31:0] sample_q,
  input wire [31:0] sync_q,
  output reg [31:0] abscorrsum,
  output reg finish
);

  parameter
    STATE1 = 3'd1,
    STATE2 = 3'd2,
    STATE3 = 3'd3,
    STATE4 = 3'd4,
    STATE5 = 3'd5;

  reg [31:0] corrsum;
  reg [2:0] state;

  always @(posedge clk) begin
    if (rst == 1'b1) begin
      state <= STATE1;
      finish <= 1'b0; // keep finish low during reset so a stale result is never flagged as valid
    end
    else begin
      case (state)
        STATE1:
          begin
            i <= 0;
            abscorrsum <= 0;
            finish <= 1'b0;
            state <= STATE2;
          end
        STATE2:
          begin
            if (i!=N) begin
              corrsum <= sample_i * sync_i; // synthesizer deals with multiplication
              state <= STATE3;
            end
            else begin
              state <= STATE5;
            end
          end
        STATE3:
          begin
            corrsum <= corrsum + sample_q * sync_q; // this product can share the multiplier as above
            i <= i + 1;
            state <= STATE4;
          end
        STATE4:
          begin
            if (corrsum[31] == 1'b0) // remember: two's complement, so bit 31 is the sign
              abscorrsum <= abscorrsum + corrsum;
            else
              abscorrsum <= abscorrsum + (~corrsum+1);
            state <= STATE2;
          end
        STATE5:
          finish <= 1'b1;
        default:
          state <= STATE1; // recover from the unused encodings 0, 6 and 7
      endcase
    end
  end
endmodule
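
The STATE4 branch relies on the two's complement identity -x = ~x + 1. Here is a small C sketch of the same operation on 32-bit values. The name abs_twos_complement is chosen here for illustration and does not appear in the module.

```c
#include <stdint.h>

/* Absolute value the way STATE4 computes it:
 * test the sign bit, then invert all bits and add one if it is set.
 */
uint32_t abs_twos_complement(uint32_t x)
{
    if ((x >> 31) == 0u)
        return x;                   /* sign bit clear: already non-negative */
    return (uint32_t)(~x + 1u);     /* sign bit set: negate */
}
```

One edge case is worth knowing about: the most negative value, 0x80000000, has no positive counterpart in 32 bits, so it maps to itself.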

The module can be tested with this simple test bench:

module tb;
  reg clk;
  reg rst;
  reg [31:0] sample_i[0:3];
  reg [31:0] sync_i[0:3];
  reg [31:0] sample_q[0:3];
  reg [31:0] sync_q[0:3];
  wire [31:0] i;
  wire [31:0] abscorrsum;
  wire finish; // declared explicitly instead of relying on an implicit net

  corrdotprod #(.N(4)) uut  (clk, rst, i, sample_i[i], sync_i[i], sample_q[i], sync_q[i], abscorrsum, finish);

  integer tb_i, tb_corrsum, tb_abscorrsum;
  initial begin
    $dumpfile ("dump.vcd");
    $dumpvars (0, tb.uut);    

    sample_i[0] = 1;
    sample_i[1] = 2;
    sample_i[2] = 3;
    sample_i[3] = 4;

    sync_i[0] = 2;
    sync_i[1] = -2;
    sync_i[2] = 2;
    sync_i[3] = -2;

    sample_q[0] = -1;
    sample_q[1] = -2;
    sample_q[2] = -3;
    sample_q[3] = -4;

    sync_q[0] = 3;
    sync_q[1] = -3;
    sync_q[2] = 3;
    sync_q[3] = -3;

    clk = 0;

    rst = 1;
    #30;
    rst = 0;
    wait (finish == 1);
    $display ("ABSCORRSUM    = %d\n", abscorrsum);

    // Testing result from module
    tb_abscorrsum = 0;
    for (tb_i = 0; tb_i < 4; tb_i = tb_i + 1) begin
      tb_corrsum = sample_i[tb_i] * sync_i[tb_i] + sample_q[tb_i] * sync_q[tb_i];
      if (tb_corrsum<0)
        tb_corrsum = -tb_corrsum;
      tb_abscorrsum = tb_abscorrsum + tb_corrsum;
    end
    $display ("TB_ABSCORRSUM = %d\n", tb_abscorrsum);

    $finish;
  end

  always begin
    clk = #5 !clk;
  end
endmodule