Pipelined Implementation for a 32-bit Barrel Shifter

Introduction

We have discussed the 32-bit combinational right barrel shifter. Now let's look at the pipelined design method. A barrel shifter rotates the bits of its input to the right by a specified amount from 0 to 31, encoded in five bits. Every bit moves toward the LSB by that amount, and the bits that fall off the LSB end wrap around to fill the vacated positions at the MSB end, keeping their original order. The inputs are a 32-bit binary number, Din, and a five-bit control word, Cin; the output is a 32-bit value, Dout. (The diagram below is helpful if you are not already familiar with barrel shifting.)

In the pipelined design, a new pair of Din and Cin values can be fed in at each clock cycle. Six clock cycles later the corresponding result is in the output register and appears on Dout. It is like an assembly line: several stages work simultaneously on different items, and a finished result comes off the end at every clock.
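As a behavioral reference for the rotation described above, the whole operation can be written in one line with the rotate_right function from IEEE.NUMERIC_STD. This is only a sketch for checking results, not the pipelined design; the entity name RotateRightRef is hypothetical.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical reference model: a purely behavioral 32-bit right rotate.
entity RotateRightRef is
    port ( Din  : in  STD_LOGIC_VECTOR (31 downto 0);
           Cin  : in  STD_LOGIC_VECTOR (4 downto 0);
           Dout : out STD_LOGIC_VECTOR (31 downto 0) );
end RotateRightRef;

architecture Behavioral of RotateRightRef is
begin
    -- rotate_right(ARG : unsigned; COUNT : natural) wraps the bits that
    -- leave the LSB end back in at the MSB end.
    Dout <= STD_LOGIC_VECTOR(rotate_right(unsigned(Din),
                                          to_integer(unsigned(Cin))));
end Behavioral;
```

For example, with Din = x"00000001" and Cin = "00001" the single set bit wraps from the LSB to the MSB, so Dout = x"80000000".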

Algorithm

The goal is to pass the input values through the input registers and then through combinational logic that rotates the data bits by the amount given in the five-bit control word. A purely combinational barrel shifter can be implemented with five stages of multiplexers. With a pipelined strategy, we also need registers to hold the output of each stage and the control bits still needed by the later stages. The first stage either rotates the input right by 16 places or passes it through unchanged. The selected value is stored in the register that feeds the next stage, and the following stages rotate by 8, 4, 2, and 1 in the same way. Thus the total rotation is 16 * Control(4) + 8 * Control(3) + 4 * Control(2) + 2 * Control(1) + 1 * Control(0), where Control(i) is bit i of the control word.
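The five-stage decomposition can be sketched without any registers as a chain of conditional assignments, one per control bit. This is a minimal illustration and not necessarily the combinational version discussed earlier; the entity name BarrelShCombo and the internal signal names s16, s8, s4, and s2 are hypothetical.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical combinational right rotator built from five mux stages.
entity BarrelShCombo is
    port ( Din  : in  STD_LOGIC_VECTOR (31 downto 0);
           Cin  : in  STD_LOGIC_VECTOR (4 downto 0);
           Dout : out STD_LOGIC_VECTOR (31 downto 0) );
end BarrelShCombo;

architecture Dataflow of BarrelShCombo is
    -- Intermediate results after the 16, 8, 4, and 2 stages.
    signal s16, s8, s4, s2 : STD_LOGIC_VECTOR (31 downto 0);
begin
    -- Each stage rotates right by its weight when its control bit is '1',
    -- otherwise it passes the value through unchanged.
    s16  <= Din(15 downto 0) & Din(31 downto 16) when Cin(4) = '1' else Din;
    s8   <= s16(7 downto 0)  & s16(31 downto 8)  when Cin(3) = '1' else s16;
    s4   <= s8(3 downto 0)   & s8(31 downto 4)   when Cin(2) = '1' else s8;
    s2   <= s4(1 downto 0)   & s4(31 downto 2)   when Cin(1) = '1' else s4;
    Dout <= s2(0)            & s2(31 downto 1)   when Cin(0) = '1' else s2;
end Dataflow;
```

With Cin = "10110", the 16, 4, and 2 stages rotate and the 8 and 1 stages pass through, giving a total rotation of 16 + 4 + 2 = 22 places.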

RTL Implementation

The RTL block diagram below has 12 blocks: seven register blocks and five combinational blocks. The registers store and buffer the input, the result of each pipeline stage, and the final result. This allows us to clock the values in and determine the minimum clock period (that is, the maximum clock frequency) for a pipelined barrel shift. Register Reg_Din can be reset or loaded and stores the 32-bit input value, Din. Register Reg_Cin can be reset or loaded and stores the 5-bit input value, Cin. Registers Reg_Shft16, Reg_Shft8, Reg_Shft4, and Reg_Shft2 store the results of the stages that rotate by 16, 8, 4, and 2 places respectively. Register Reg_Dout can be reset and stores the 32-bit output value, Dout, which is the final barrel-shifted value. All seven registers can be reset simultaneously.

The five combinational blocks are Combo_Shft16, Combo_Shft8, Combo_Shft4, Combo_Shft2, and Combo_Shft1. Each has a one-bit control input that selects whether the block passes its 32-bit input straight through or rotates it. Combo_Shft16 rotates by 16 bits, Combo_Shft8 by 8 bits, Combo_Shft4 by 4 bits, Combo_Shft2 by 2 bits, and Combo_Shft1 by one bit. The stages are arranged in that order. Combo_Shft16 takes the 32-bit buffered input signal, Buf_Din, and produces the 32-bit value Dout_Shft16. This value is stored in the register Reg_Shft16, and at the next clock the output of Reg_Shft16 becomes the input of the next block, Combo_Shft8. The output of Combo_Shft8 feeds Combo_Shft4 with the register Reg_Shft8 in between, and so on until the last stage, whose output, Dout_Shft1, is buffered by the register block Reg_Dout. The output of that register is the final 32-bit barrel-shifted value, Dout.

The control value for the first stage is the buffered input, Buf_Cntrl[4:0]. Its most significant bit, Buf_Cntrl(4), is the select input of the Combo_Shft16 stage: if it is one, the stage rotates its input by 16 bits; otherwise the input passes through. The four lower bits, Buf_Cntrl[3:0], are stored in the register Reg_Shft16, whose control output is Buf_Cntrl3[3:0].

The most significant bit, Buf_Cntrl3(3), is the select signal for the 2-to-1 multiplexer in the Combo_Shft8 stage: if it is one, the stage rotates its input by 8 bits; otherwise the input passes through. The three lower bits, Buf_Cntrl3[2:0], are stored in the register Reg_Shft8, whose control output is Buf_Cntrl2[2:0].

The most significant bit, Buf_Cntrl2(2), is the select signal for the Combo_Shft4 stage: if it is one, the stage rotates its input by 4 bits; otherwise the input passes through. The two lower bits, Buf_Cntrl2[1:0], are stored in the register Reg_Shft4, whose control output is Buf_Cntrl1[1:0].

The most significant bit, Buf_Cntrl1(1), is the select signal for the Combo_Shft2 stage: if it is one, the stage rotates its input by 2 bits; otherwise the input passes through. The remaining bit, Buf_Cntrl1(0), is stored in the register Reg_Shft2, whose control output is Buf_Cntrl0. This bit is the select signal for the Combo_Shft1 stage: if it is one, the stage rotates its input by 1 bit; otherwise the input passes through.

The output of Combo_Shft1, Dout_Shft1, is stored in the last register, Reg_Dout. The output of this register, Dout[31:0], is the final result, which appears 6 clocks after the input Din[31:0]. If further Din values are fed in on the following consecutive clocks, the corresponding results appear on Dout at consecutive clocks as well. This is the style and process of a pipelined design.
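Because every stage described above has the same structure (a 2-to-1 multiplexer followed by a register), a single stage can be sketched as one parameterised entity. This is a minimal sketch under stated assumptions, not the implementation used in this document: the entity name RotStage and the generics SHIFT and SEL are hypothetical, and for simplicity it forwards all five control bits instead of dropping the used bit at each stage.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical generic pipeline stage: mux plus register in one process.
entity RotStage is
    generic ( SHIFT : positive := 16;   -- rotate amount of this stage (1 to 16)
              SEL   : natural  := 4 );  -- index of the control bit examined
    port ( clk      : in  STD_LOGIC;
           Data_in  : in  STD_LOGIC_VECTOR (31 downto 0);
           Ctrl_in  : in  STD_LOGIC_VECTOR (4 downto 0);
           Data_out : out STD_LOGIC_VECTOR (31 downto 0);
           Ctrl_out : out STD_LOGIC_VECTOR (4 downto 0) );
end RotStage;

architecture Behavioral of RotStage is
begin
    process (clk)
    begin
        if rising_edge(clk) then
            -- Multiplexer: rotate right by SHIFT or pass through.
            if Ctrl_in(SEL) = '1' then
                Data_out <= Data_in(SHIFT-1 downto 0) & Data_in(31 downto SHIFT);
            else
                Data_out <= Data_in;
            end if;
            -- The control word travels with its data to the next stage.
            Ctrl_out <= Ctrl_in;
        end if;
    end process;
end Behavioral;
```

Five instances with (SHIFT, SEL) set to (16, 4), (8, 3), (4, 2), (2, 1), and (1, 0) would form the same chain of rotations. Forwarding the full control word costs a few extra flip-flops compared with the narrowing Buf_Cntrl3 to Buf_Cntrl0 signals, which is the trade-off for having one reusable block.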

Signal and block summary list

Section 1. Input and Output signals to the entity

Cin[4:0]      5-bit input, encoding the number of bit positions by which the input value, Din, is rotated to the right.
Din[31:0]     32-bit input value that is to be barrel shifted.
LoadDin       1-bit input that loads the input data register.
LoadCin       1-bit input that loads the shifter control value register.
Clk           1-bit clock that drives the pipelining process.
Reset         1-bit input that resets all the registers.
Dout[31:0]    32-bit output, rotated by the number of places specified in the input value Cin.

Section 2. Alphabetical List of Register Blocks

Reg_Cin       5-bit register, load, reset. Used to store the input control signal value Cin.
Reg_Din       32-bit register, load, reset. Used to store the input value Din.
Reg_Shft16    32 + 4 bit register. Stores the stage-one output, Dout_Shft16, and the four remaining control bits.
Reg_Shft8     32 + 3 bit register. Stores the stage-two output, Dout_Shft8, and the three remaining control bits.
Reg_Shft4     32 + 2 bit register. Stores the stage-three output, Dout_Shft4, and the two remaining control bits.
Reg_Shft2     32 + 1 bit register. Stores the stage-four output, Dout_Shft2, and the one remaining control bit.
Reg_Dout      32-bit register, load, reset. Used to store the result output, Dout.
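A register with the "load, reset" behavior listed above could be sketched as follows. Note that the VHDL listing in this document has no Reset port; its input registers instead clear to zero whenever their load signal is low. The sketch below is therefore an illustration only, and it assumes a synchronous, active-high Reset and a register that holds its value when Load is low. The entity name RegLoadReset and its port names are hypothetical.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical register with synchronous reset and load enable.
entity RegLoadReset is
    generic ( WIDTH : positive := 32 );
    port ( clk   : in  STD_LOGIC;
           Reset : in  STD_LOGIC;   -- assumed synchronous, active high
           Load  : in  STD_LOGIC;   -- assumed active high
           D     : in  STD_LOGIC_VECTOR (WIDTH-1 downto 0);
           Q     : out STD_LOGIC_VECTOR (WIDTH-1 downto 0) );
end RegLoadReset;

architecture Behavioral of RegLoadReset is
begin
    process (clk)
    begin
        if rising_edge(clk) then
            if Reset = '1' then
                Q <= (others => '0');   -- reset has priority over load
            elsif Load = '1' then
                Q <= D;                 -- capture the new value
            end if;                     -- otherwise hold the old value
        end if;
    end process;
end Behavioral;
```

Instantiated with WIDTH set to 32 it would play the role of Reg_Din, and with WIDTH set to 5 the role of Reg_Cin.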

Section 3. Alphabetical List of Combinational Blocks

Combo_Shft1     32-bit 2-to-1 multiplexer. Output is either the input passed through or the input rotated by 1.
Combo_Shft2     32-bit 2-to-1 multiplexer. Output is either the input passed through or the input rotated by 2.
Combo_Shft4     32-bit 2-to-1 multiplexer. Output is either the input passed through or the input rotated by 4.
Combo_Shft8     32-bit 2-to-1 multiplexer. Output is either the input passed through or the input rotated by 8.
Combo_Shft16    32-bit 2-to-1 multiplexer. Output is either the input passed through or the input rotated by 16.

Section 4. A list of Signals (excludes entity inputs and outputs)

Buf_Din        32-bit output from the buffer register in block Reg_Din
Buf_Cntrl      5-bit output from the buffer register in block Reg_Cin
Buf_Shft16     32-bit output from the register block Reg_Shft16
Buf_Shft8      32-bit output from the register block Reg_Shft8
Buf_Shft4      32-bit output from the register block Reg_Shft4
Buf_Shft2      32-bit output from the register block Reg_Shft2
Buf_Cntrl3     4-bit output from the buffer register in block Reg_Shft16
Buf_Cntrl2     3-bit output from the buffer register in block Reg_Shft8
Buf_Cntrl1     2-bit output from the buffer register in block Reg_Shft4
Buf_Cntrl0     1-bit output from the buffer register in block Reg_Shft2
Dout_Shft16    32-bit output from the Combo_Shft16 block
Dout_Shft8     32-bit output from the Combo_Shft8 block
Dout_Shft4     32-bit output from the Combo_Shft4 block
Dout_Shft2     32-bit output from the Combo_Shft2 block
Dout_Shft1     32-bit output from the Combo_Shft1 block; goes into Reg_Dout

RTL Block Diagram


VHDL Code

----------------------------------------------------------------------------------
-- Company:        Computer Science
-- Engineer: 	   Guili Liu
-- 
-- Create Date:    19:48:37 02/02/2009 
-- Design Name:    BarrelShPipeline.vhd
-- Module Name:    BarrelShPipeline - Behavioral 
-- Project Name:   BarrelShifterPipeline
-- Target Devices: xc2vp30-6ff1152
-- Tool versions: 
-- Description: 
--
-- Dependencies: 
--
-- Revision: 
-- Revision 0.01 - File Created
-- Additional Comments: 
--
----------------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity BarrelShPipline is
    Port ( Din : in  STD_LOGIC_VECTOR (31 downto 0);
           Cin : in  STD_LOGIC_VECTOR (4 downto 0);
           Dout : out  STD_LOGIC_VECTOR (31 downto 0);
           LoadDin : in  STD_LOGIC;
           LoadCin : in  STD_LOGIC;
           clk : in  STD_LOGIC);
end BarrelShPipline;

architecture Behavioral of BarrelShPipline is
signal 	Buf_Cntrl:STD_LOGIC_VECTOR (4 downto 0);

signal 	Buf_Cntrl3:STD_LOGIC_VECTOR (3 downto 0);
signal 	Buf_Cntrl2:STD_LOGIC_VECTOR (2 downto 0);
signal 	Buf_Cntrl1:STD_LOGIC_VECTOR (1 downto 0);
signal 	Buf_Cntrl0:STD_LOGIC;

signal 	Buf_Din:STD_LOGIC_VECTOR (31 downto 0);

signal   Buf_Shft2:STD_LOGIC_VECTOR (31 downto 0);
signal   Buf_Shft4:STD_LOGIC_VECTOR (31 downto 0);
signal   Buf_Shft8:STD_LOGIC_VECTOR (31 downto 0);
signal   Buf_Shft16:STD_LOGIC_VECTOR (31 downto 0);

signal 	Dout_Shft1:STD_LOGIC_VECTOR (31 downto 0);
signal 	Dout_Shft2:STD_LOGIC_VECTOR (31 downto 0);
signal 	Dout_Shft4:STD_LOGIC_VECTOR (31 downto 0);
signal 	Dout_Shft8:STD_LOGIC_VECTOR (31 downto 0);
signal 	Dout_Shft16:STD_LOGIC_VECTOR (31 downto 0);

begin

-- Input data register: loads Din when LoadDin is high, otherwise clears to zero.
RegDin: process (clk)
begin
    if rising_edge(clk) then
        if LoadDin = '1' then
            Buf_Din <= Din;
        else
            Buf_Din <= (others => '0');
        end if;
    end if;
end process RegDin;


-- Input control register: loads Cin when LoadCin is high, otherwise clears to zero.
RegCin: process (clk)
begin
    if rising_edge(clk) then
        if LoadCin = '1' then
            Buf_Cntrl <= Cin;
        else
            Buf_Cntrl <= "00000";
        end if;
    end if;
end process RegCin;
			
-- Stage 1 multiplexer: rotate right by 16 when Buf_Cntrl(4) is set, else pass through.
Combo_Shft16: process (Buf_Cntrl, Buf_Din)
begin
    if Buf_Cntrl(4) = '1' then
        Dout_Shft16 <= Buf_Din(15 downto 0) & Buf_Din(31 downto 16);
    else
        Dout_Shft16 <= Buf_Din;
    end if;
end process Combo_Shft16;
			
-- Stage 1 register: holds the stage 1 result and the four remaining control bits.
Reg_Shft16: process (clk)
begin
    if rising_edge(clk) then
        Buf_Shft16 <= Dout_Shft16;
        Buf_Cntrl3 <= Buf_Cntrl(3 downto 0);
    end if;
end process Reg_Shft16;
			
-- Stage 2 multiplexer: rotate right by 8 when Buf_Cntrl3(3) is set, else pass through.
Combo_Shft8: process (Buf_Cntrl3, Buf_Shft16)
begin
    if Buf_Cntrl3(3) = '1' then
        Dout_Shft8 <= Buf_Shft16(7 downto 0) & Buf_Shft16(31 downto 8);
    else
        Dout_Shft8 <= Buf_Shft16;
    end if;
end process Combo_Shft8;
		
-- Stage 2 register: holds the stage 2 result and the three remaining control bits.
Reg_Shft8: process (clk)
begin
    if rising_edge(clk) then
        Buf_Shft8  <= Dout_Shft8;
        Buf_Cntrl2 <= Buf_Cntrl3(2 downto 0);
    end if;
end process Reg_Shft8;
		
-- Stage 3 multiplexer: rotate right by 4 when Buf_Cntrl2(2) is set, else pass through.
Combo_Shft4: process (Buf_Cntrl2, Buf_Shft8)
begin
    if Buf_Cntrl2(2) = '1' then
        Dout_Shft4 <= Buf_Shft8(3 downto 0) & Buf_Shft8(31 downto 4);
    else
        Dout_Shft4 <= Buf_Shft8;
    end if;
end process Combo_Shft4;
			
-- Stage 3 register: holds the stage 3 result and the two remaining control bits.
Reg_Shft4: process (clk)
begin
    if rising_edge(clk) then
        Buf_Shft4  <= Dout_Shft4;
        Buf_Cntrl1 <= Buf_Cntrl2(1 downto 0);
    end if;
end process Reg_Shft4;

			
-- Stage 4 multiplexer: rotate right by 2 when Buf_Cntrl1(1) is set, else pass through.
Combo_Shft2: process (Buf_Cntrl1, Buf_Shft4)
begin
    if Buf_Cntrl1(1) = '1' then
        Dout_Shft2 <= Buf_Shft4(1 downto 0) & Buf_Shft4(31 downto 2);
    else
        Dout_Shft2 <= Buf_Shft4;
    end if;
end process Combo_Shft2;

-- Stage 4 register: holds the stage 4 result and the last control bit.
Reg_Shft2: process (clk)
begin
    if rising_edge(clk) then
        Buf_Shft2  <= Dout_Shft2;
        Buf_Cntrl0 <= Buf_Cntrl1(0);
    end if;
end process Reg_Shft2;
			
			
-- Stage 5 multiplexer: rotate right by 1 when Buf_Cntrl0 is set, else pass through.
Combo_Shft1: process (Buf_Cntrl0, Buf_Shft2)
begin
    if Buf_Cntrl0 = '1' then
        Dout_Shft1 <= Buf_Shft2(0) & Buf_Shft2(31 downto 1);
    else
        Dout_Shft1 <= Buf_Shft2;
    end if;
end process Combo_Shft1;
			
-- Output register: holds the final rotated value, six clocks after Din was applied.
RegDout: process (clk)
begin
    if rising_edge(clk) then
        Dout <= Dout_Shft1;
    end if;
end process RegDout;
			
end Behavioral;
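
To check the six-clock latency described earlier, a small self-checking testbench along the following lines could be used. It is a sketch, not part of the original design: the testbench entity name BarrelShPipline_tb, the 10 ns clock period, and the two test vectors are assumptions chosen for illustration, and it expects the BarrelShPipline entity above to be compiled into the work library.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical testbench for the BarrelShPipline entity listed above.
entity BarrelShPipline_tb is
end BarrelShPipline_tb;

architecture Sim of BarrelShPipline_tb is
    signal Din     : STD_LOGIC_VECTOR (31 downto 0) := (others => '0');
    signal Cin     : STD_LOGIC_VECTOR (4 downto 0)  := (others => '0');
    signal Dout    : STD_LOGIC_VECTOR (31 downto 0);
    signal LoadDin : STD_LOGIC := '0';
    signal LoadCin : STD_LOGIC := '0';
    signal clk     : STD_LOGIC := '0';
    signal done    : boolean   := false;
begin
    -- 10 ns clock (assumed period); stops toggling once the test is done.
    clk <= not clk after 5 ns when not done else '0';

    UUT: entity work.BarrelShPipline
        port map ( Din => Din, Cin => Cin, Dout => Dout,
                   LoadDin => LoadDin, LoadCin => LoadCin, clk => clk );

    Stimulus: process
    begin
        LoadDin <= '1';
        LoadCin <= '1';

        -- Rotate 1 right by one place: the set bit wraps to the MSB.
        Din <= x"00000001";
        Cin <= "00001";
        for i in 1 to 6 loop
            wait until rising_edge(clk);
        end loop;
        wait for 1 ns;
        assert Dout = x"80000000"
            report "rotate by 1 failed" severity error;

        -- Rotate right by 12 places (8 + 4): three hex digits wrap around.
        Din <= x"12345678";
        Cin <= "01100";
        for i in 1 to 6 loop
            wait until rising_edge(clk);
        end loop;
        wait for 1 ns;
        assert Dout = x"67812345"
            report "rotate by 12 failed" severity error;

        done <= true;
        wait;
    end process Stimulus;
end Sim;
```

Each check holds the inputs steady and samples Dout just after the sixth rising clock edge, which matches the six registers (input, four intermediate, output) between Din and Dout.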