# Pipelined Implementation for a 32-bit Barrel Shifter

```text
Introduction
We have discussed the 32-bit combinational right barrel shifter. Now let's look
at the pipelined design method. A barrel shifter rotates the bits of its input
to the right by a specified amount from 0 to 31 (a five-bit value). Every bit
moves toward the LSB by that many places, and the bits that pass the LSB wrap
around and re-enter at the MSB end in their original order. For example, a
rotation by 4 moves bit 31 to bit 27 and moves bits 3..0 to bits 31..28. The
inputs are a 32-bit binary number, Din, and a five-bit control word, Cin. The
output is the 32-bit rotated value, Dout.

In the pipelined design, a new pair of inputs, Din and Cin, can be fed in on
every clock cycle. Each result reaches the output register and appears on Dout
six clock cycles after its inputs are loaded.

It is like an assembly line. From start to end, several stages work
simultaneously, each on a different item, and a finished result comes off the
end of the line every cycle.

Algorithm
The goal is to pass the input values through the input registers and then
through a chain of circuits that rotate all the input bits by the amount given
in the five-bit control word. A purely combinational barrel shifter can be
implemented with five stages of 2-to-1 muxes. In the pipelined version, a
register after each stage records that stage's output together with the control
bits that the later stages still need. The first stage either rotates its input
right by 16 places or passes it through unchanged. Its result is registered and
fed to the second stage, which conditionally rotates by 8. The third, fourth and
fifth stages conditionally rotate by 4, 2 and 1 in the same way. Because
successive rotations add up, the total rotation is

    16 * Control(4) + 8 * Control(3) + 4 * Control(2) + 2 * Control(1) + 1 * Control(0)

where Control(i) is bit i of Cin. This sum covers every amount from 0 to 31.

RTL Implementation
The RTL block diagram below has 12 blocks: seven register blocks and five
combinational blocks. The registers store and buffer the inputs, the result of
each pipeline stage and the final result. Because every timing path then runs
from one register to the next, the maximum clock rate is limited by the slowest
single stage rather than by all five mux levels in series.
Reg_Din can be reset or loaded, and stores the 32-bit input value, Din.
Reg_Cin can be reset or loaded, and stores the 5-bit input value, Cin.
Reg_Shft16, Reg_Shft8, Reg_Shft4 and Reg_Shft2 store the result after each
stage (named by that stage's shift amount) along with the remaining control bits.
Reg_Dout can be reset and stores the 32-bit output value, Dout, which is the
final barrel-shifted value.
All seven registers can be reset simultaneously.

There are five combinational blocks: Combo_Shft16, Combo_Shft8, Combo_Shft4,
Combo_Shft2 and Combo_Shft1. Each has a one-bit select input that determines
whether the block passes its 32-bit input through unchanged or rotates it.
Combo_Shft16 rotates by 16 bits, Combo_Shft8 by 8, Combo_Shft4 by 4,
Combo_Shft2 by 2 and Combo_Shft1 by 1. The stages are arranged in that order,
with the 32-bit buffered input signal, Buf_Din, as the input of Combo_Shft16.
The 32-bit output of this stage, Dout_Shft16, is stored in the register
Reg_Shft16. After the next clock edge, the register output, Buf_Shft16,
becomes the input of the next block, Combo_Shft8.
The output of Combo_Shft8 feeds Combo_Shft4 through the register Reg_Shft8, and
so on until the last stage, whose output, Dout_Shft1, is buffered by the
register block Reg_Dout.
The output of that register is the final 32-bit barrel-shifted value, Dout.

Each combinational stage uses the most significant of the control bits it
receives as the select signal of its 2-to-1 muxes. If that bit is 1, the stage
rotates its input. Otherwise, the input passes through unchanged. The remaining
lower bits travel on with the data to the next register.

The control value for the first stage is the buffered input, Buf_Cntrl[4:0].
Buf_Cntrl(4) selects the 16-bit rotation in Combo_Shft16. The 4 lower bits,
Buf_Cntrl[3:0], are stored in Reg_Shft16, whose output is Buf_Cntrl3[3:0].
Buf_Cntrl3(3) selects the 8-bit rotation in Combo_Shft8. Buf_Cntrl3[2:0] are
stored in Reg_Shft8, whose output is Buf_Cntrl2[2:0].
Buf_Cntrl2(2) selects the 4-bit rotation in Combo_Shft4. Buf_Cntrl2[1:0] are
stored in Reg_Shft4, whose output is Buf_Cntrl1[1:0].
Buf_Cntrl1(1) selects the 2-bit rotation in Combo_Shft2. Buf_Cntrl1(0) is
stored in Reg_Shft2, whose output is Buf_Cntrl0.
Buf_Cntrl0 selects the 1-bit rotation in Combo_Shft1.

The output of Combo_Shft1, Dout_Shft1, is stored in the last register,
Reg_Dout. The output of this register, Dout[31:0], is the final result. It
appears six clock cycles after the corresponding Din[31:0] was loaded. If new
Din values are fed in on the following consecutive clock cycles, the
corresponding results appear on Dout on consecutive clock cycles as well.
This is the style and process of a pipelined design.

Signal and block summary list

Section 1. Input and output signals of the entity

Cin[4:0]		5-bit input: the number of places to rotate Din to the right
Din[31:0]		32-bit input value to be barrel shifted
LoadDin			1-bit load enable for the input data register, Reg_Din
LoadCin			1-bit load enable for the shift control register, Reg_Cin
Clk			1-bit clock that drives the pipeline
Reset			1-bit reset for all the registers
Dout[31:0]		32-bit output: Din rotated right by the number of places in Cin

Section 2. Alphabetical List of Register Blocks

Reg_Cin		5-bit register, load, reset. Used to store the input control signal value Cin.
Reg_Din		32-bit register, load, reset. Used to store the input value Din.

Reg_Shft16	32 + 4-bit register. Stores the stage 1 output, Dout_Shft16, and Buf_Cntrl[3:0]
Reg_Shft8	32 + 3-bit register. Stores the stage 2 output, Dout_Shft8, and Buf_Cntrl3[2:0]
Reg_Shft4	32 + 2-bit register. Stores the stage 3 output, Dout_Shft4, and Buf_Cntrl2[1:0]
Reg_Shft2	32 + 1-bit register. Stores the stage 4 output, Dout_Shft2, and Buf_Cntrl1(0)

Reg_Dout	32-bit register, load, reset. Used to store the result output, Dout.

Section 3. Alphabetical List of Combinational Blocks
Combo_Shft1		32-bit 2-to-1 mux. Output is either the input passed through or rotated by 1
Combo_Shft2		32-bit 2-to-1 mux. Output is either the input passed through or rotated by 2
Combo_Shft4		32-bit 2-to-1 mux. Output is either the input passed through or rotated by 4
Combo_Shft8		32-bit 2-to-1 mux. Output is either the input passed through or rotated by 8
Combo_Shft16		32-bit 2-to-1 mux. Output is either the input passed through or rotated by 16

Section 4. A list of Signals (excludes entity inputs and outputs)

Buf_Din			32-bit output from the buffer register in block Reg_Din
Buf_Cntrl		5-bit output from the buffer register in block Reg_Cin

Buf_Shft16		32-bit output from the register block Reg_Shft16
Buf_Shft8		32-bit output from the register block Reg_Shft8
Buf_Shft4		32-bit output from the register block Reg_Shft4
Buf_Shft2		32-bit output from the register block Reg_Shft2

Buf_Cntrl3		4-bit output from the buffer register in block Reg_Shft16
Buf_Cntrl2		3-bit output from the buffer register in block Reg_Shft8
Buf_Cntrl1		2-bit output from the buffer register in block Reg_Shft4
Buf_Cntrl0		1-bit output from the buffer register in block Reg_Shft2

Dout_Shft16		32-bit output from the Combo_Shft16 block
Dout_Shft8		32-bit output from the Combo_Shft8 block
Dout_Shft4		32-bit output from the Combo_Shft4 block
Dout_Shft2		32-bit output from the Combo_Shft2 block
Dout_Shft1		32-bit output from the Combo_Shft1 block; goes into Reg_Dout
```
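You can write the rotation itself in one line with the `rotate_right` function from `IEEE.NUMERIC_STD`. The sketch below is a hypothetical, purely combinational reference entity. The name `BarrelShRef` is illustrative and not part of the original design. It computes the same value that the pipeline delivers six cycles later, which makes it a convenient golden model in simulation.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical reference model (illustrative, not the original design):
-- rotates Din right by Cin places in a single combinational step.
entity BarrelShRef is
  Port ( Din  : in  STD_LOGIC_VECTOR (31 downto 0);
         Cin  : in  STD_LOGIC_VECTOR (4 downto 0);
         Dout : out STD_LOGIC_VECTOR (31 downto 0));
end BarrelShRef;

architecture Behavioral of BarrelShRef is
begin
  -- rotate_right moves each bit toward the LSB; bits leaving the LSB end
  -- re-enter at the MSB end, matching the five-stage pipeline's result.
  Dout <= STD_LOGIC_VECTOR(rotate_right(unsigned(Din), to_integer(unsigned(Cin))));
end Behavioral;
```

For example, Din = x"000000FF" with Cin = "00100" gives Dout = x"F000000F": the low nibble wraps around to the top.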
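A short count shows the difference between latency and throughput. It assumes the six-cycle latency stated above and a new Din/Cin pair on every cycle. Rising edges are numbered so that input $i$ is loaded into Reg_Din/Reg_Cin on edge $i$, for $i = 1, \dots, n$:

$$
\begin{aligned}
\text{edge on which result } i \text{ appears on Dout} &= i + 5,\\
\text{edges needed for } n \text{ back-to-back inputs} &= n + 5.
\end{aligned}
$$

After the first five edges fill the pipeline, one result comes out per cycle. If each new input instead had to wait for the previous result, $n$ inputs would need $6n$ edges.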
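Here is a worked instance of the total-rotation sum from the Algorithm section. The control value is chosen only for illustration: $C_{\text{in}} = 10110_2$, so Control(4) = 1, Control(3) = 0, Control(2) = 1, Control(1) = 1 and Control(0) = 0.

$$
\begin{aligned}
\text{shift} &= 16\cdot\mathrm{Control}(4) + 8\cdot\mathrm{Control}(3) + 4\cdot\mathrm{Control}(2) + 2\cdot\mathrm{Control}(1) + 1\cdot\mathrm{Control}(0)\\
&= 16\cdot 1 + 8\cdot 0 + 4\cdot 1 + 2\cdot 1 + 1\cdot 0 = 22
\end{aligned}
$$

Follow the MSB through the five stages. A bit at position $p$ ends up at $(p - \text{shift}) \bmod 32$:

$$
31 \xrightarrow{\;16\;} 15 \xrightarrow{\;\text{pass}\;} 15 \xrightarrow{\;4\;} 11 \xrightarrow{\;2\;} 9 \xrightarrow{\;\text{pass}\;} 9 = (31 - 22) \bmod 32
$$

So Din = x"80000000" (only the MSB set) leaves the pipeline as Dout = x"00000200".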

## VHDL Code

```vhdl
----------------------------------------------------------------------------------
-- Company:        Computer Science
-- Engineer: 	   Guili Liu
--
-- Create Date:    19:48:37 02/02/2009
-- Design Name:    BarrelShPipeline.vhd
-- Module Name:    BarrelShPipeline - Behavioral
-- Project Name:   BarrelShifterPipeline
-- Target Devices: xc2vp30-6ff1152
-- Tool versions:
-- Description:
--
-- Dependencies:
--
-- Revision:
-- Revision 0.01 - File Created
--
----------------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

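entity BarrelShPipline is
  Port ( Din     : in  STD_LOGIC_VECTOR (31 downto 0);
         Cin     : in  STD_LOGIC_VECTOR (4 downto 0);
         Dout    : out STD_LOGIC_VECTOR (31 downto 0);
         clk     : in  STD_LOGIC;
         -- Reset, LoadDin and LoadCin come from the signal list. The
         -- asynchronous active-high reset and the default values (which let
         -- the pipeline run freely when these pins are left open) are
         -- assumptions.
         Reset   : in  STD_LOGIC := '0';
         LoadDin : in  STD_LOGIC := '1';
         LoadCin : in  STD_LOGIC := '1');
end BarrelShPipline;

architecture Behavioral of BarrelShPipline is
  -- Control bits still needed by the remaining stages
  signal Buf_Cntrl   : STD_LOGIC_VECTOR (4 downto 0);   -- Reg_Cin output
  signal Buf_Cntrl3  : STD_LOGIC_VECTOR (3 downto 0);   -- from Reg_Shft16
  signal Buf_Cntrl2  : STD_LOGIC_VECTOR (2 downto 0);   -- from Reg_Shft8
  signal Buf_Cntrl1  : STD_LOGIC_VECTOR (1 downto 0);   -- from Reg_Shft4
  signal Buf_Cntrl0  : STD_LOGIC;                       -- from Reg_Shft2

  -- Registered data at the input of each stage
  signal Buf_Din     : STD_LOGIC_VECTOR (31 downto 0);  -- Reg_Din output
  signal Buf_Shft16  : STD_LOGIC_VECTOR (31 downto 0);
  signal Buf_Shft8   : STD_LOGIC_VECTOR (31 downto 0);
  signal Buf_Shft4   : STD_LOGIC_VECTOR (31 downto 0);
  signal Buf_Shft2   : STD_LOGIC_VECTOR (31 downto 0);

  -- Combinational output of each stage
  signal Dout_Shft16 : STD_LOGIC_VECTOR (31 downto 0);
  signal Dout_Shft8  : STD_LOGIC_VECTOR (31 downto 0);
  signal Dout_Shft4  : STD_LOGIC_VECTOR (31 downto 0);
  signal Dout_Shft2  : STD_LOGIC_VECTOR (31 downto 0);
  signal Dout_Shft1  : STD_LOGIC_VECTOR (31 downto 0);
begin

  -- Input registers: cleared by Reset, loaded on the rising clock edge
  -- while their load enable is high.
  RegDin: process (clk, Reset)
  begin
    if Reset = '1' then
      Buf_Din <= (others => '0');
    elsif rising_edge(clk) then
      if LoadDin = '1' then
        Buf_Din <= Din;
      end if;
    end if;
  end process RegDin;

  RegCin: process (clk, Reset)
  begin
    if Reset = '1' then
      Buf_Cntrl <= (others => '0');
    elsif rising_edge(clk) then
      if LoadCin = '1' then
        Buf_Cntrl <= Cin;
      end if;
    end if;
  end process RegCin;

  -- Stage 1: rotate right by 16 when Buf_Cntrl(4) = '1'
  Combo_Shft16: process (Buf_Din, Buf_Cntrl)
  begin
    if Buf_Cntrl(4) = '1' then
      Dout_Shft16 <= Buf_Din(15 downto 0) & Buf_Din(31 downto 16);
    else
      Dout_Shft16 <= Buf_Din;
    end if;
  end process Combo_Shft16;

  -- Register the stage 1 result and the four control bits still needed
  Reg_Shft16: process (clk, Reset)
  begin
    if Reset = '1' then
      Buf_Shft16 <= (others => '0');
      Buf_Cntrl3 <= (others => '0');
    elsif rising_edge(clk) then
      Buf_Shft16 <= Dout_Shft16;
      Buf_Cntrl3 <= Buf_Cntrl(3 downto 0);
    end if;
  end process Reg_Shft16;

  -- Stage 2: rotate right by 8 when Buf_Cntrl3(3) = '1'
  Combo_Shft8: process (Buf_Shft16, Buf_Cntrl3)
  begin
    if Buf_Cntrl3(3) = '1' then
      Dout_Shft8 <= Buf_Shft16(7 downto 0) & Buf_Shft16(31 downto 8);
    else
      Dout_Shft8 <= Buf_Shft16;
    end if;
  end process Combo_Shft8;

  Reg_Shft8: process (clk, Reset)
  begin
    if Reset = '1' then
      Buf_Shft8  <= (others => '0');
      Buf_Cntrl2 <= (others => '0');
    elsif rising_edge(clk) then
      Buf_Shft8  <= Dout_Shft8;
      Buf_Cntrl2 <= Buf_Cntrl3(2 downto 0);
    end if;
  end process Reg_Shft8;

  -- Stage 3: rotate right by 4 when Buf_Cntrl2(2) = '1'
  Combo_Shft4: process (Buf_Shft8, Buf_Cntrl2)
  begin
    if Buf_Cntrl2(2) = '1' then
      Dout_Shft4 <= Buf_Shft8(3 downto 0) & Buf_Shft8(31 downto 4);
    else
      Dout_Shft4 <= Buf_Shft8;
    end if;
  end process Combo_Shft4;

  Reg_Shft4: process (clk, Reset)
  begin
    if Reset = '1' then
      Buf_Shft4  <= (others => '0');
      Buf_Cntrl1 <= (others => '0');
    elsif rising_edge(clk) then
      Buf_Shft4  <= Dout_Shft4;
      Buf_Cntrl1 <= Buf_Cntrl2(1 downto 0);
    end if;
  end process Reg_Shft4;

  -- Stage 4: rotate right by 2 when Buf_Cntrl1(1) = '1'
  Combo_Shft2: process (Buf_Shft4, Buf_Cntrl1)
  begin
    if Buf_Cntrl1(1) = '1' then
      Dout_Shft2 <= Buf_Shft4(1 downto 0) & Buf_Shft4(31 downto 2);
    else
      Dout_Shft2 <= Buf_Shft4;
    end if;
  end process Combo_Shft2;

  Reg_Shft2: process (clk, Reset)
  begin
    if Reset = '1' then
      Buf_Shft2  <= (others => '0');
      Buf_Cntrl0 <= '0';
    elsif rising_edge(clk) then
      Buf_Shft2  <= Dout_Shft2;
      Buf_Cntrl0 <= Buf_Cntrl1(0);
    end if;
  end process Reg_Shft2;

  -- Stage 5: rotate right by 1 when Buf_Cntrl0 = '1'
  Combo_Shft1: process (Buf_Shft2, Buf_Cntrl0)
  begin
    if Buf_Cntrl0 = '1' then
      Dout_Shft1 <= Buf_Shft2(0) & Buf_Shft2(31 downto 1);
    else
      Dout_Shft1 <= Buf_Shft2;
    end if;
  end process Combo_Shft1;

  -- Output register: Dout is updated on the sixth rising edge, counting
  -- the edge that loads Din and Cin.
  RegDout: process (clk, Reset)
  begin
    if Reset = '1' then
      Dout <= (others => '0');
    elsif rising_edge(clk) then
      Dout <= Dout_Shft1;
    end if;
  end process RegDout;

end Behavioral;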

```
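The five Combo_Shft/Reg_Shft pairs differ only in the rotate amount and the control bit they test, so a for-generate loop can produce them. The sketch below is a hypothetical compact variant with the same six-cycle latency. The entity name `BarrelShGen`, the array types `data_array`/`cntrl_array` and the labels are illustrative, not from the original code. For brevity it omits Reset, LoadDin and LoadCin, and it folds each combinational stage into the clocked process that registers it.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical compact variant of the six-cycle pipeline (illustrative only).
entity BarrelShGen is
  Port ( clk  : in  STD_LOGIC;
         Din  : in  STD_LOGIC_VECTOR (31 downto 0);
         Cin  : in  STD_LOGIC_VECTOR (4 downto 0);
         Dout : out STD_LOGIC_VECTOR (31 downto 0));
end BarrelShGen;

architecture RTL of BarrelShGen is
  type data_array  is array (0 to 5) of STD_LOGIC_VECTOR (31 downto 0);
  type cntrl_array is array (0 to 5) of STD_LOGIC_VECTOR (4 downto 0);
  -- Element 0 plays the role of Reg_Din/Reg_Cin, elements 1..4 of
  -- Reg_Shft16..Reg_Shft2, and data(5) of Reg_Dout.
  signal data  : data_array;
  signal cntrl : cntrl_array;
begin
  -- Input registers
  RegIn: process (clk)
  begin
    if rising_edge(clk) then
      data(0)  <= Din;
      cntrl(0) <= Cin;
    end if;
  end process RegIn;

  -- Stage k rotates right by S = 2**(4-k) when control bit 4-k is '1':
  -- k = 0 -> 16, 1 -> 8, 2 -> 4, 3 -> 2, 4 -> 1.
  Stages: for k in 0 to 4 generate
    constant S : natural := 2**(4 - k);
  begin
    StageReg: process (clk)
    begin
      if rising_edge(clk) then
        if cntrl(k)(4 - k) = '1' then
          data(k + 1) <= data(k)(S - 1 downto 0) & data(k)(31 downto S);
        else
          data(k + 1) <= data(k);
        end if;
        -- The whole control word travels with the data; bits that no later
        -- stage reads drive nothing and are usually removed by synthesis.
        cntrl(k + 1) <= cntrl(k);
      end if;
    end process StageReg;
  end generate Stages;

  Dout <= data(5);
end RTL;
```

Putting the mux inside the clocked process describes the same hardware as a separate combinational process feeding a register. The original code trims the control word at every register. Carrying all five bits is simpler to write, and synthesis typically ends up with comparable logic after unused registers are pruned.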
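A hypothetical self-checking testbench can confirm both the rotation and the six-cycle latency. The entity name `BarrelShPipline_tb`, the test-pattern function `vec_data` and the 10 ns clock are illustrative choices. The testbench applies a new Din/Cin pair on every clock, covering all 32 shift amounts. After the sixth rising edge, counting the edge that loads each vector, it compares Dout with `rotate_right` from `IEEE.NUMERIC_STD`. It maps only the Din, Cin, Dout and clk ports. VHDL lets you leave any other input port of the entity open only if that port declares a default value.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical self-checking testbench for BarrelShPipline (illustrative only).
entity BarrelShPipline_tb is
end BarrelShPipline_tb;

architecture sim of BarrelShPipline_tb is
  constant LATENCY : natural := 6;   -- rising edges from loading Din to Dout
  constant N       : natural := 32;  -- one vector per shift amount 0..31

  signal clk  : STD_LOGIC := '0';
  signal done : boolean   := false;
  signal Din  : STD_LOGIC_VECTOR (31 downto 0) := (others => '0');
  signal Cin  : STD_LOGIC_VECTOR (4 downto 0)  := (others => '0');
  signal Dout : STD_LOGIC_VECTOR (31 downto 0);

  -- Arbitrary data pattern for vector k (stays below 2**31 for k < 32)
  function vec_data (k : natural) return unsigned is
  begin
    return to_unsigned(16#1234567# * (k + 1), 32);
  end function;
begin
  -- 10 ns clock that stops once the test is done
  clk <= not clk after 5 ns when not done else '0';

  dut : entity work.BarrelShPipline
    port map (Din => Din, Cin => Cin, Dout => Dout, clk => clk);

  stim : process
    variable v : integer;
  begin
    for e in 0 to N + LATENCY - 2 loop
      -- Present vector e (if any) before rising edge e
      if e < N then
        Din <= std_logic_vector(vec_data(e));
        Cin <= std_logic_vector(to_unsigned(e, 5));
      end if;
      wait until rising_edge(clk);
      wait until falling_edge(clk);  -- let edge e's register updates settle
      -- Dout now holds the result of the vector loaded LATENCY-1 edges earlier
      v := e - (LATENCY - 1);
      if v >= 0 then
        assert Dout = std_logic_vector(rotate_right(vec_data(v), v))
          report "Dout mismatch for vector " & integer'image(v)
          severity error;
      end if;
    end loop;
    report "test finished";
    done <= true;
    wait;
  end process;
end sim;
```

The testbench sets inputs just after a falling edge and samples Dout at the next falling edge. This avoids reading registers in the same delta cycle in which they update.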