# COMPARATIVE ANALYSIS OF MODIFIED REGISTER EXCHANGE METHOD AND TRACE BACK METHOD OF VITERBI DECODER FOR WIRELESS COMMUNICATION

H. Singh<sup>1</sup>, R. K. Vyas<sup>2</sup> and Deepak Raghuvanshi<sup>3</sup>

<sup>1</sup>Student of Digital Communication Engineering, RTU University, Kota, India <sup>2</sup>Department of Electronics & Comm. Engineering, RTU University, Kota, India <sup>3</sup>Student of Digital Communication Engineering, RGPV University, Bhopal, India

## **ABSTRACT**

The Objective of this paper is to describe the comparative analysis of the two different method of implementation for Viterbi decoder based on the modified Register Exchange and Trace Back Method. The base of the comparison is simulation result. Both the technologies are simulated with Xilinx software9.1i. The simulation result in the form of HDL synthesis report, Timing analysis and Power estimation. The Modified Register Exchange Method is Simpler than the Trace Back (TB) Method but its disadvantage is that every bit in the memory must be read and rewritten for each bit decoded .Using the 'pointer concept; a pointer is assigned to each register with the help of this concept copying the contents of one register to another, the pointers are modified. Power dissipation, Performance, memory size is analyzed for both the modified Register Exchange method and TB method. The new Register Exchange Method shows an average power reduction of 45 percent. The memory is reduced by half.

**KEYWORDS:** Hardware description language (HDL), Register Exchange (RE), Viterbi Decoder (VD), Trace Back (TB)

# I. INTRODUCTION

Code Division Multiple Access (CDMA) spread spectrum technology was introduced in the second generation of mobile systems. The Viterbi Decoder (VD) consumes about one third of the power dissipation of the CDMA mobile modem [3]. The VD is used to optimal decode convolutional codes. In this paper, we are focus on the technique which are beneficial for designing a [4][5]low power VD for wireless communication, with the focus on minimizing the number of operations. This paper presents a new Modified RE method, which enables it to be Used for large constraint length VD's, in addition to being simpler and lower in power dissipation than the TB method.

The RE method is based on successive register –exchange operations among states (two origin states and two destination states), which construct a butterfly unit as shown in Fig. 3 [12]. The left shifting of the current state and appending the decoded bit (the bit that causes the transition) to the LSB of the current state, results in the destination state. The TB approach **is** acceptable for trellises with a large number of states. This paper is divided into three sections. First section describes the background of the VD and gives the basic introduction of the TB and Modified RE method. The second section describes the comparative analysis of both techniques based on the experimental result generated by Xilinxbsoftware9.1i.The third section shows the simulation result of Modified RE method and TB method. This paper divided into three main sections. First section describes the background of

Modified register exchange and Trace back method. Second section describes the comparative analysis of Modified RE method & TB method according to the simulation result. Third section shows the final results of both the techniques. Fourth section shows the future work of both techniques. Final section concludes the whole paper.

## II. BACKGROUND

In 1967, Viterbi developed the Viterbi Algorithm (VA) as a method to decode convolutional codes [6][7]. A simple Convolutional encoder with K=3, rate=1/3, and the generator Polynomials: G0=101, G1=111 and G2=111, has the trellis Diagram shown in Fig. 1. To demonstrate the functionality of the VA, a sample input data to the encoder is tracked until it is In Fig. 1, a node is assigned to each state for each time stage. Each branch represents the transition between two states; each branch is assigned a weight, referred to as a branch metric (BM), which is a measure of the likelihood of the transition, given the noise observations. The BM's accumulated along a path form a path metric (PM). For the two branches entering the same state, the branch with the smaller partial PM survives, and the other one is discarded. The BM in the simple hard decision demonstrated in Fig. 2 is simply the number of bits, in which the received and the expected signals differ. The calculation of the BM's and the PM's is required for both the TB and the new RE methods.

#### 2.1. Trace Back

Basically in the Trace Back Method, a trace back module is placed in the decoder which is connected by the ACS unit and RAM Module. The decoding is performed by traversing through the present state and the previous state and finding the potential input that have caused that transition from the previous state to this present state.

At the last stage of the trellis diagrams (see Fig. 2), a trace back, to extract the decoded bits, begins from the state with the minimum PM. Beginning at state SO and going backwards in time, a unique path, shown by the bold line in Fig. 2, is identified. For the TB, if each state from a current time is followed backwards through its maximum likelihood path, all of the paths converge at a point somewhere previous in time [8].

## 2.2. Modified Register Exchange Method

The RE method is based on successive register –exchange Operations among states (two origin states and two destination states), which construct a butterfly unit as shown in Fig. 3 [12].to the LSB of the current state, results in the destination state. The new algorithm proposed in this study uses the 'pointer' concept. Instead of moving the contents of one register to a second register, the pointer to the second register is altered to point to the first register. In the VD, the pointer to the register simply carries the current state. For example, if  $(PM_{t-1}^i + BM_{t-1}^i)$  is greater than  $(PM_{t-1}^j + BM_{t-1}^j)$  then the path from j to p is the survivor path for p. The pointer to register j, which carries the value j, is shifted to the left, and the bit, which causes the survivor path transition (in this case 'O'), is appended to the right. Thus, the pointer that carries the value j will carry the value p, thereby pointing to register p; the decoded bit is appended to the contents of the register, whose pointer value is changed from j to p. It is noted that the register has a fixed physical location; only the value of its pointer changes for  $PM_{t-1}^i + BM_{t-1}^i$  each code word received.

In the new method, the path is handled from the origin states' perspective, unlike the TB method, which handles the path from the destination states' perspective. There can be two survivor paths originating from the same origin state and entering both destination states, as is the case in Fig. 4 at time t=3, where the survivor path for both S0 and S1 originate from S2. This is not a problem for the TB that monitors the path from the Destination states. But for the RE, with a pointer definition, a question arises: which of the two paths should be the survivor path, and which path should terminate? The answer is a new decision bit the ACSU has to produce. It is called the termination bit. For example, if both paths from state j are considered survivor paths for the destination states p and q, then the one with the higher BM receives a termination high signal. This indicates that it is terminated, and no decision bits are appended to its register anymore. The pointer of j changes to the pointer, whose transition contributed to the smaller BM, and the pointer of i changes to the other

destination pointer. By time t= 1 l, three paths terminate, whereas only one survives; this one comes the decoded bits in the last stage. Fig. 5 shows the successive values for the pointers and registers over the time. The issuing of the additional termination bit for each state seems to overload the ACSU conventional operation, but this is not the case. According to the symmetric characteristics of the generator polynomials for the VD used in CDMA systems (G0=557 G1=663, G2=711, K=9) [10]  $BM_t^{i,p}$ , equals  $BM_t^{i,q}$ , and  $BM_t^{j,p}$ , equal  $BM_t^{j,q}$ . By using the reformulation [13] of the traditional ACSU, the comparison of the BM's, which is used to produce the termination bit, is already included in the comparison of the PM'S. In addition, the reformulation presented in [13] has been proven to have lower power dissipation. Thus, the new method mainly modifies the SMU.



Figure 1: Trellis diagram for the convolutional encoder



Figure 2: Trellis Diagram for Trace Back Method



Figure 3: Butterfly structure of the ACS

©IJAET ISSN: 2231-1963



Figure 4: Register content For Register Exchange Method



Figure 5: Modified RE approach with pointer implementation

#### 2.3 Viterbi Decoder

The VD is composed of three functional units

- 1. The Branch Metric Unit (BMU), which calculates the BM's.
- 2. The Add Compare Select Unit (ACSU), which adds the BM's to the corresponding **PM's**, compares the updated PM's, and then stores them in the path metric memory (PMM), while storing the associated survivor path decisions

in the survivor memory unit (SMU).

3. The Survivor Memory Unit (SMU), which stores the survivor path decisions; then a trace-back mechanism, is applied to it, in the case of the TB method.

The SMU is the hottest spot in the VD due to the frequent memory accesses [10]; the ACSU has been considered to be the bottleneck for the speed limitations of the VD [11]. But in the intervening four years memory access speed has become even more of a bottleneck. This research work focuses on modifying the SMU to improve both speed and power dissipation.

## III. COMPARATIVE ANALYSIS OF NEW RE METHOD AND TB METHOD

## 3.1. Simulation Result

In this section we are compared both methods according to simulation result. Simulation result contains device utilization summary, timing diagram, memory occupied and power estimation report. In the comparative analysis of technique according to simulation result, firstly we compare according to the device utilization summary of software result. Secondly, according to timing diagram of both the techniques. Third one according to the memory occupied and power estimation of both the technique.

# 3.1.1. According to the HDL Synthesis Report

This synthesis report is generated after the compilation of both techniques. This report contains about component used and no. of related and unrelated logic. With the help of this report both the techniques are being compared.

 Table 1: Device utilization using trace back method

| Logic Utilization            | Used | Available | Utilization |
|------------------------------|------|-----------|-------------|
| Number used as Flip<br>Flops | 97   | 7,168     | 1%          |

**©IJAET** ISSN: 2231-1963

| Number used as<br>Latches                            | 1     |       |      |
|------------------------------------------------------|-------|-------|------|
| Number of 4 input<br>LUTs                            | 182   | 7,168 | 40%  |
| Logic Distribution                                   |       |       |      |
| Number of occupied Slices                            | 2,395 | 3,584 | 66%  |
| Number of Slices<br>containing only<br>related logic | 2,395 | 2,395 | 100% |
| Number of Slices<br>containing unrelated<br>logic    | 0     | 2,395 | 0%   |
| Total Number of 4 inputs LUTs                        | 3,064 | 7,168 | 42%  |

Table 2: Device utilization using register exchange method

| Logic Utilization                                    | Used  | Available | Utilization |
|------------------------------------------------------|-------|-----------|-------------|
| Number of Slice<br>Flip Flops                        | 1,910 | 7,168     | 26%         |
| Number of 4 input<br>LUTs                            | 2,922 | 7,168     | 40%         |
| Logic Distribution                                   |       |           |             |
| Number of occupied Slices                            | 2,395 | 3,584     | 66%         |
| Number of Slices<br>containing only<br>related logic | 2,395 | 2,395     | 100%        |
| Number of Slices<br>containing<br>unrelated logic    | 0     | 2,395     | 0%          |
| Total Number of 4 inputs LUTs                        | 3,064 | 7,168     | 42%         |

## 3.1.2. According to Timing Diagram

After the synthesis report, the timing diagram generated according to the given input. With the help of timing diagram we are calculating the speed grade, Minimum period, Maximum Frequency, Maximum output required time after clock. Speed grade shows the internal delay due to this outputs are delayed. It shows in the negative form like -4ns. Minimum period means a minimum period for precharge all commands. Maximum frequency shows that the difference between the next period and previous period. Maximum output required after clock means at how much times of clock the highest output get.

For Trace Back Unit

- Speed Grade: -4
- Minimum period: 2.911ns
- Maximum Frequency: 343.525MH
- Maximum output required time after clock: 5.953ns
- Maximum output required time after clock: 35.829ns

For Modified Register Exchange Method

- Speed Grade: -4
- Minimum period: 6.290ns
- Maximum Frequency: 158.983MHz)
- Minimum input arrival time before clock: 3.074ns

• Maximum output required time after clock: 23.829ns

## 3.1.3. According to Memory Occupied and Power Estimation Cost

The new RE implementation reduces the memory requirements of the SMU by half. The number of memory read operations is reduced to one third. Table shows a comparison of the register and memory access operations completed by each method, the TB and the proposed RE in the SMU. The operations listed in Table are the operations needed to decode 48 code words. Modified RE method and TB method. We are compared on the bases of simulation result. The simulation result

**Table 3:** Comparison of the memory/register operations to decode 48 codeword

| Operation                            | ТВ     | New RE      |
|--------------------------------------|--------|-------------|
| Writing decision bits into memory    | 256*48 | ~(256*48)/2 |
|                                      | 12288w | ~6144W      |
| Reading from memory(for trace-       | 48*3   | 48          |
| back)                                | 1444r  | 48r         |
| Writing into the MSB of the          |        | 256*48      |
| pointers                             |        | 12288rg     |
| Writing termination bits into        |        | 256         |
| termination registers                |        | 256rl       |
| 8-bit register shifting (shift right | 48*3*8 |             |
| and append a bit to the LSB)         | 1152rg |             |

Table 4: Estimate cost function for the memory/register operations to decode 48 code words

| Operation                            | ТВ     | New RE      |
|--------------------------------------|--------|-------------|
| Writing decision bits into memory    | 256*48 | ~(256*48)/2 |
|                                      | 12288w | ~6144W      |
| Reading from memory(for trace-       | 48*3   | 48          |
| back)                                | 72w    | 24w         |
| Writing into the MSB of the          |        | 256*48      |
| pointers                             |        | 1536w       |
| Writing termination bits into        |        | 256         |
| termination registers                |        | 32w         |
| 8-bit register shifting (shift right | 48*3*8 |             |
| and append a bit to the LSB)         | 144w   |             |
| Total                                | 12504w | 7736w       |

Where w, r and rg represent the power dissipation cost function for writing one bit into the memory block, reading one bit from the memory block, and writing one bit into a register. The power consumed by the memory cells during the pre charge consumes the dominant part of a cell's power dissipation. The power consumption of one memory cell is also proportional to The number of memory cells in the memory blocks [14].

# IV. RESULTS & DISCUSSION

In this paper we are seeing the comparative analysis for the Modified RE method and TB method. We are compared on the Bases of simulation result. The simulation result consist of device utilization, timing parameters, memory occupied and power estimation cost .This simulation result helps us to provide the correct specification for Viterbi decoder due to this user understand the requirement of his project is fulfilled with this specification .This specification saves the user time and cost effective otherwise without knowing specification user send his data and if it is lost then user information is not received correctly..The simulation results are generated by xilinx9.1i software. According to software result, Modified RE method is best in the form of less memory occupied and less power consumption because it reduced the power and memory occupation just half than the other technique. With the help of this simulation result we can compare the other techniques also with this technique. This

©IJAET ISSN: 2231-1963

comparison is also beneficial because when we generate any new technique you can compare the simulation result of the latest technique with previous and shows the benefits of the latest technique.

## V. CONCLUSION

This study has proposed the effective comparative analysis of Modified Register Exchange Method and Trace Back Method. A power dissipation comparison between both architectures shows an average power reduction of 45 percent for the total Viterbi decoder. The new RE method requires only half of the memory needed by the TB method. The maximum frequency achievable for the SMU is at least three times that of the SMU in the TB method presented in [10], because the read and write operations of the SMU are operating at the data rate frequency; but with the TB, the read operation is at least three times faster than the write operation.

## VI. FUTURE WORK

This paper contains the comparative analysis of the Modified Register Exchange method and Trace back method for Viterbi decoder. With the help of this simulation result in the future we are design the best technique for Viterbi decoder with less power consumption, highly efficient and less memory occupied. By using this simulation result we are providing the correct specification for the next generated highly efficient Viterbi decoder.

## REFERENCES

- [1]. Swati Gupta, Rajesh Mehra, "LowFPGA Implementation of Viterbi Decoder using Trace back Architecture," International Journal of Engineering Trends and Technology- May to June Issue 2011.
- [2]. Xilinx 9.1i, Altera Product "Used for coding the digital system," software released on January 2009.
- [3]. I. Kang and A.N. Willson Jr, "Low-power Viterbi decoder for CDMA mobile terminals," IEEE JSSC, vol.33, no. 3, March 1998, pp. 473-482.
- [4]. A.P. Chandrakansan and R.W. Brodersen, Low power digital CMOS design, Kluwer Academic Publishers, 1995.
- [5]. A. Bellaouar and M. Elamsry, Low power digital VLSI design, Kluwer Academic Publishers, 1995.
- [6]. A.J. Viterbi, "Error bounds for convolutional codes and asymptotically optimum decoding Algorithm," IEEE Transactions IT, vol. It-13, no.2, April 1967, pp. 260-269.
- [7]. G.D. Fomey, "The Viterbi algorithm," Proc. IEEE, vol. 61,no.3, March 1973, pp. 268-278.
- [8]. H.-L. Lou, "Viterbi decoder design for the IS -95 CDMAforward link," Proc. VTC, April 1996, pp. 1346 -1 350.
- [9]. S.B. Wicker, Error Control Systems for Digital Communication and Storage, Prentice Hall, 1995.
- [10]. J.A. Ryu, S.C. Kim, J.D. Cho, H.W. Park and Y.H. Chang, Lower Power Viterbi decoder Architecture with a new clock-gating trace-back unit," Proc. 6th International Conference on VLSI and CAD, Oct. 1999, pp. 297-300.
- [11]. G. Fettweis and H. Meyr, "High-rate Viterbi processor: a systolic amy solution," IEEE Journal on Selected Areas in Communications, Oct.1990, pp. 1520-1534.
- [12]. Y-N. Chang, H. Suzuki, and K.K. Parhi, "A -Mb/s 256-state 10- mW rate- 1/3 Viterbi Decoder," IEEE JSSC, vol. 35, no.6, June 2000, pp 826-834.
- [13]. Chi-Ying Tsui, R.S. -K. Cheng, and C. Ling, "Low power ACS unit design for the Viterbi Decoder," Proc. IEEE ISCS, 1999, pp. 137-140.
- [14]. D. Liu and C. Svensson, "Power consumption estimation in CMOS VLSI chips," EEE JSSC, vol.29, no.6, June 1994,pp. 663-670.
- [15]. Anubhuti Khare, Manish Saxena, Jagdish Patel "FPGA Based Efficient Implementation of Viterbi Decoder," International Journal of Engineering and Advanced Technology (IJEAT) ISSN: 2249 8958, Volume-1, Issue-1, October 2011,

# **AUTHOR'S BIOGRAPHY**

Hardeep Singh was born in Kota, India, in 1989. He received First class Bachelor degree in Electronics & Communication from the University of Rajasthan, Jaipur, in 2009 and currently pursuing his Master degree in Digital Communication from the Rajasthan Technical University; Kota. He has published many papers in international and national conferences. His research interests include VHDL Language, information theory & coding, Digital logic design, wireless communication and mobile communication.



Ramakant Vyas was born in Sikar, India, in 1979. He received First class Bachelor degree in Electronics & Communication from the University of RGPV, Bhopal, in 2000 and the Master degree in Digital Communication from the Jai Narayan Vyas University, Jodhpur, in 2005. Presently pursuing Ph.D. from JNIT Jaipur. He has a 10 years of teaching experience. He has published many papers in international and national conferences. His research interests include DSP with Mat lab, VHDL Language, information theory & coding, Digital logic design, wireless communication and mobile communication.



**Deepak Raghuvanshi** was born in Kota, India, in 1984. He received First class Bachelor degree in Electronics & Communication degree from the University of Rajasthan, Jaipur, in 2008 and currently pursuing his Master degree in Digital Communication from the RGPV, Bhopal. He has published many papers in national conferences. His research interests include VHDL Language, information theory & coding, Digital logic design, wireless communication and mobile communication, Image Processing.

