

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING INDIAN INSTITUTE OF INFORMATION TECHNOLOGY, DESIGN AND MANUFACTURING KANCHEEPURAM CHENNAI - 600127

Synopsis Of

# An Analytical Approach to Design Energy-Efficient Arithmetic Circuits for Error-Tolerant Multimedia Applications

A Thesis

To be submitted by

### S SKANDHA DEEPSITA

For the award of the degree

Of

DOCTOR OF PHILOSOPHY

# 1 Abstract

Approximate hardware for arithmetic operations is a promising way to obtain energy efficiency under restricted resource budgets in error-tolerant applications. This work proposes approximate arithmetic circuits and multiply-accumulate units based on an analysis of the critical signals in binary arithmetic functions. The proposed designs offer a noteworthy energy-error trade-off without any error detection/correction logic. Three kinds of approximate adders are proposed, each with a different strategy. The approximate multiplier AXM1 is based on constant-0 or constant-1 carry v:2 compressors. The approximate multiplier AXM2 is based on the best-fit constant for v:2 compressors, chosen according to the column height in the tree. This work also proposes three variants each of approximate Sum-of-Products (AXSOP), Product-of-Sums (AXPOS) and MAC (AXMAC) circuits: with only an approximate adder, with only an approximate multiplier, and with both an approximate adder and an approximate multiplier.

The proposed 32-bit AXHA with k = 12 has 52% energy savings for a mean error distance (MED) of only -3.2. The proposed approximate multiplier (AXM2) has energy savings of at least 11% and up to 40% for the DA2 approach and 92% for the DA1 approach. The 8-bit $AXSOP_3$ with k = 8 and $AX_L = 2$ has energy savings of 70% for an MED of 24. Similarly, the 8-bit $AXPOS_3$ with k = 4 and $AX_L = 2$ has 47% energy savings with an MED of 99, and the 8-bit $AXMAC_3$ has 60% energy savings with a mean relative error distance (MRED) of 0.029. Applying the proposed AXMAC to Gaussian filtering of images yields an SSIM of 0.93, and the PSNR reduction is only 0.5 dB compared to accurate filtering.

# 2 Objectives

- Analyze the behavior of various signals in binary arithmetic functions to understand the bottlenecks for achieving high speed and low energy.
- Assess the scope for approximation in binary arithmetic functions and circuits by distinguishing the critical and non-critical parts of each.
- Design approximate binary arithmetic circuits using the insights from this analysis, with careful modification of the non-critical parts of the circuits.

# 3 Existing Gaps Which Were Bridged

The gaps in the existing work are twofold:

- Existing approximate arithmetic circuits mostly belong to either the 'fail rare' or the 'fail small' domain. However, it is the application in which an approximate circuit is employed that decides which error metrics should be considered for a fair evaluation of the design.
- To meet the error bounds, existing approximate circuits include error-detection and error-correction circuits alongside the approximate logic. This can limit the energy saved by the approximation.

This creates an opportunity for designs in which the accuracy loss is reasonable and the energy savings are noticeable, without any error detection/correction logic.

# 4 Contributions

This work focuses on the design of energy-efficient binary arithmetic circuits, aided by observations drawn from the analysis of binary arithmetic functions. The proposed theoretical approximation strategy can be applied to design any approximate circuit with satisfactory accuracy and energy savings.

### 4.1 Devised Approach for Designing the Approximate Circuits

To approximate any arithmetic function: *i*) analyze the power-hungry and speed-limiting signals of the arithmetic function; *ii*) analyze and tap the signals that contribute minimally to the golden outputs of the arithmetic function; *iii*) draw out the patterns common to these two categories of signals and finalize the approximable signals so as to obtain the best trade-off between energy reduction and accuracy.

### 4.2 Binary Arithmetic Functions - Key Observations

#### 4.2.1 Binary Addition

The *carry* signal is essential for obtaining a near-accurate result and is also the reason for high delay in adder circuits; it is therefore the major focus of the analysis of binary addition.

- The total number of unique carry vectors obtained for an n-bit binary addition is $2^n$, i.e., all the even numbers in the range $[0,\ 2^{n+1}-2]$.
- In general, for an n-bit binary addition, the number of unique carry vectors $N_{dc}$ that repeat $3^x$ times is given by ${}^nC_x$.
- The carry vector of all zeros has the maximum probability of occurrence, and its number of occurrences is $3^n\ \forall n$. The most critical carry is therefore zero; it should never be compromised when approximating addition.
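The counting claims above can be checked by brute force for small n. The following is a minimal sketch, assuming no carry-in and a carry vector that records the carry into bit positions 1 to n (so bit 0 of the vector is always zero). The name `carry_vector_counts` is illustrative and does not come from the thesis.

```python
from collections import Counter
from math import comb


def carry_vector_counts(n):
    """Count how often each carry vector occurs over all n-bit (a, b) pairs.

    Assumption: no carry-in. Since sum_i = a_i ^ b_i ^ carry_i, the carry
    into every bit position is recovered as (a + b) ^ a ^ b.
    """
    counts = Counter()
    for a in range(1 << n):
        for b in range(1 << n):
            counts[(a + b) ^ a ^ b] += 1
    return counts


n = 4
counts = carry_vector_counts(n)
unique_vectors = len(counts)            # compare with 2**n
all_zero_occurrences = counts[0]        # compare with 3**n
# How many distinct vectors repeat a given number of times;
# compare the entry for 3**x with comb(n, x).
repeat_histogram = Counter(counts.values())
```

For n = 4 the exhaustive loop covers only 256 input pairs, so the same check can be repeated cheaply for other small bit widths.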

#### 4.2.2 Binary Multiplication

In multiplication, the partial product tree and the traversal of the carry signal through the tree are the major source of delay. The Partial Product Bit Array (PPA) matrix for an n-bit multiplier can be represented with dimension $n \times 2n$.

- For any column j of an n-bit multiplier, the number of non-zero bits in that column $h_j$, or the column height, is denoted $bi_c[j]$ and is given by Equation (1).

$$bi_c[j] = \begin{cases} j+1 & \forall\ (0 \le j \le Piv_{col})\\ 2n - (j+1) & \forall\ (Piv_{col} < j \le 2n-1) \end{cases} \tag{1}$$

- The number of unique carry vectors in multiplication for any n can be given by  $N_{ucm} = 1.266n \times 3^{\frac{2n-5}{2}}$ .
- There is only one most probable carry vector ψ (the one with the maximum number of repetitions), and it is the carry vector of all zeros, i.e., ψ(j) = 0 ∀j. In addition, for any n and every column j, even values of ψ(j) are more probable than odd values. These properties need to be retained while approximating the multiplier so that the error stays under control and good output quality is obtained.
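Equation (1) can be evaluated directly. The sketch below assumes that the pivot column $Piv_{col}$ is the tallest column, n - 1; the synopsis does not define it explicitly, so this value is an assumption, and the function name `column_heights` is illustrative.

```python
def column_heights(n):
    """Column heights bi_c[j] of the n x 2n partial product array, per Equation (1).

    Assumption: the pivot column Piv_col is n - 1 (the tallest column).
    """
    piv_col = n - 1
    return [j + 1 if j <= piv_col else 2 * n - (j + 1) for j in range(2 * n)]


heights_4 = column_heights(4)   # heights of columns 0..7 of a 4-bit multiplier
total_bits = sum(heights_4)     # every one of the n*n partial product bits is counted once
```

Under this assumption the heights rise by one up to the pivot column and fall by one afterwards, and the last column (j = 2n - 1) holds no partial product bit; it only receives carries.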

### 4.3 Design and Implementation of Energy-Efficient, Error-Resilient Architectures for Binary Addition

Optimizing energy while keeping the error within bounds is the major focus of this work.

#### 4.3.1 Accuracy Configurable Prefix Adders (AXPA)

The **design idea** of AXPA is to exploit the symmetry of prefix trees to build a circuit that can operate as an n-bit approximate adder and as an n/2-bit accurate adder, as shown in Figure 1a. The carry vector of all zeros, which is the **most critical carry** for binary addition, has a probability of occurrence of 0.079102 for AXPA. The proposed approximate prefix structures have at least 50% energy savings. The **mean error distance** of the proposed approximate prefix adder for any bit size n is $-2^{\frac{n-4}{2}}$. For AXPA, the **minimum error distance** is $-2^{n/2} - 2$ and the **maximum error distance** is $2^{n/2}$.

#### 4.3.2 Approximate Constant Adder (AXCA)

The **design idea** is to select a constant carry vector instead of computing the carry on the go, so that the carry logic is eliminated completely. This work explored and analyzed the error behavior of constant carry vectors, and the constant carry was selected on the basis of that analysis. The average **energy savings** are at least 75%, with a **minimum error** of $-2^n$, a **maximum error** of $2^{n+1} - 2$ and a constant MED of -1/2.
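The kind of error analysis described here can be sketched as an exhaustive sweep. The code below assumes that each sum bit is formed as the XOR of the two operand bits and the corresponding bit of a fixed carry vector, with the error taken as approximate minus exact. The synopsis does not state which constant was selected or how the carry-out is formed, so the all-zeros constant used here is purely illustrative and the sketch is not expected to reproduce the figures reported for AXCA. The names `constant_carry_add` and `error_stats` are hypothetical.

```python
def constant_carry_add(a, b, carry_const):
    """Approximate sum in which a fixed carry vector replaces the computed carries."""
    return a ^ b ^ carry_const


def error_stats(n, carry_const):
    """Return (min, max, mean) of approx - exact over all n-bit input pairs."""
    errors = [
        constant_carry_add(a, b, carry_const) - (a + b)
        for a in range(1 << n)
        for b in range(1 << n)
    ]
    return min(errors), max(errors), sum(errors) / len(errors)


# Illustrative constant only: the all-zeros carry vector on 4-bit operands.
stats_zero_carry = error_stats(4, 0)
```

Sweeping `carry_const` over all even values up to $2^{n+1}-2$ gives the error profile of every candidate constant for a small n, which is one way such a selection could be explored.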

#### 4.3.3 Approximate Hierarchical Structure Based Adders

The **proposed approximate full adders** are designed through an analysis in which any 3 of the 8 input combinations are allowed to be erroneous. One of the best approximations is selected from the resulting $\binom{8}{3} = 56$ designs: it has low power, fewer gates in the critical path and few errors. The **approximate multi-bit adder AXHA** design aims to obtain both positive and negative error distances; the circuit is shown in Figure 1b. The total **number of unique carry vectors** is reduced from $2^n$ to $2^{n-4}$ for k = n. The carry vector of all zeros, which is the **most critical carry**, has a probability of occurrence of 0.316 for AXHA. The (energy savings, MED) pairs for ApproxADDv1 and ApproxADDv2 of Dutt *et al.*, HERLOA of Seo *et al.* and COREA of Seok *et al.* are (42%, 1023), (15%, -4), (50%, -1024) and (47.7%, -702) respectively, while the proposed AXHA achieves (52%, -3.2).



Figure 1: Proposed Approximate Adder Circuits

The proposed AXPA has acceptable minimum and maximum errors and medium energy savings; AXCA has a constant MED and high energy savings; AXHA is the trade-off between them, with an efficient range of error distances, a non-constant MED and medium energy savings.



Figure 2: 4-bit approximate multiplier AXM1 - Type-1

Figure 3: 8-bit multiplication with AXM1 - Type-1, $AX_L = 2$

### 4.4 Design and Implementation of Energy-Efficient, Error-Resilient Architectures for Binary Multiplication

This work proposes novel *approximate constant carry compressors* which can be utilized for column reduction in multipliers. Two approximate multipliers (AXM1, AXM2) based on the constant carry are proposed. The approximation factor  $AX_L$  is considered while building recursive multipliers.

#### 4.4.1 Approximate Multiplier AXM1

The **design idea** is that the approximate carry from any compressor is taken to be constant irrespective of the input value. Using these novel compressors in the multiplier leads to zero propagation length, i.e., there is no carry propagation from LSB to MSB. Since the constant carry can be zero or one, this work proposes 4-bit approximate multipliers of Type-1, with a carry chain of all zeros as shown in Figure 2, and of Type-2, with a carry chain of all ones. The 8-bit multiplier is built from accurate/approximate 4-bit multipliers (R1, R2, R3, R4), whose intermediate results are added using CSA reduction, as shown in Figure 3.

Type-2 with $AX_L = 1$ has an **MED** of 40 with 30% energy savings, whereas the proposed Type-1 has an MED of only 7 with 27% energy savings, and the same trend holds for the other values of $AX_L$; Type-1 is therefore preferable to Type-2. Type-1 with $AX_L = 4$ has 49% energy savings and an MED of 1919. The similarly configured existing designs M1 and M2 of Ansari *et al.* have only 21% and 30% energy savings, with MEDs of 1081 and 2794 respectively. The UBAM1 design of Fang *et al.* has only 16.45% energy savings with an MED of 7.6, i.e., lower energy savings for similar error metrics. When the multiplier is applied to integer DCT, the PSNR reduction and SSIM are found to be 7 dB and 0.99 for $AX_L = 1$, and 11 dB and 0.85 for $AX_L = 4$.

#### 4.4.2 Approximate Multiplier AXM2

The **design idea** is to devise an approximation strategy that does not change the effect of each bit/row/column on the multiplication result after approximation and that delivers acceptable error profiles as well. The proposed approximate multiplier aims at a non-exponential increase of error and energy as n increases.

Figure 4: Proposed Approximate Multiplier (DA1 Approach)

Figure 5: Recursive Multiplication (DA2 Approach)

The design idea of the **DA1 approach** rests on the following observation: if the actual PP bits are approximated, the carry bits are inherently approximated as well, whereas if only the carry bits obtained in each column are approximated, the actual PP bits are unaffected. The carry bits are approximated in such a way that the even/odd property of the carry bits is retained after the number of carry bits is approximated. The approximate carry effect is computed purely from the multiplier size (n) and the column heights ($bi_c[j]$), neither of which is dynamic. In the **DA2 approach**, the n-bit multiplier is built recursively from four n/2-bit multipliers. The DA2 approach gives the flexibility to tune the accuracy by approximating any one, two, three or all four of the sub-multipliers (R1, R2, R3, R4). The (energy savings, MRED) pairs for the 8-bit designs M1 and M2 of Ansari *et al.*, the design of Waris *et al.*, and PM1, PM2 and PM3 of Ahmadinejad and Moaiyeri are (51%, 0.06), (57%, 0.08), (53%, 0.3), (47%, 0.02), (96%, 0.32) and (75%, 0.04) respectively. The proposed DA1 and DA2 with $AX_L = 4$ achieve (94%, 0.3) and (71%, 0.08). The proposed AXM2 with the DA1 and DA2 approaches is used for **integer DCT**, and the observed PSNR is 29.35 for the DA1 approach and 32.4 for the DA2 approach.
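The recursive decomposition behind the DA2 approach can be sketched in a few lines. The version below is a behavioural model only: it splits each operand into halves, forms the four half-width products with pluggable sub-multipliers and recombines them with exact additions. The order in which the four products map onto R1 to R4 is an assumption, and `stand_in_approx_mul` is a hypothetical placeholder used only to show how one sub-multiplier can be swapped; it is not the proposed AXM2.

```python
def exact_mul(x, y):
    """Accurate sub-multiplier."""
    return x * y


def recursive_mul(a, b, n, subs=(exact_mul,) * 4):
    """n-bit product assembled from four n/2-bit sub-multipliers.

    subs = (ll, lh, hl, hh) are the multipliers used for the products
    aL*bL, aL*bH, aH*bL and aH*bH. How these map onto R1..R4 is an
    assumption of this sketch.
    """
    half = n // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    ll, lh, hl, hh = subs
    return (
        ll(a_lo, b_lo)
        + ((lh(a_lo, b_hi) + hl(a_hi, b_lo)) << half)
        + (hh(a_hi, b_hi) << n)
    )


def stand_in_approx_mul(x, y):
    """Hypothetical placeholder, NOT the proposed AXM2: clears the two LSBs."""
    return (x * y) & ~0b11


# Approximate only the least significant sub-product; keep the other three exact.
mixed = (stand_in_approx_mul, exact_mul, exact_mul, exact_mul)
approx_product = recursive_mul(200, 123, 8, mixed)
```

Because the sub-multipliers are passed in as a tuple, any one, two, three or all four of them can be replaced independently, which mirrors the tuning flexibility described for DA2.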

### 4.5 Design and Implementation of Energy-Efficient, Error-Resilient Architectures for MAC, SOP, POS

DSP operations such as convolution for filtering, dot products and matrix operations require either Multiply-Accumulate (MAC) or Sum-of-Products (SOP) / Product-of-Sums (POS) circuits. SOP is a four-input, single-output function with two multiplications and one addition, i.e., $SOP(out) = ab + cd$. POS is a four-input, single-output function with one multiplication and two additions, i.e., $POS(out) = (a + b) \times (c + d)$. MAC is a three-input, single-output function whose third input is the previous-state output, i.e., $MAC(out_i) = ab + out_{i-1}$. The main problem when designing approximate circuits that combine approximate adders and multipliers is that the errors can add up, leading to severe quality degradation of the outputs. The main target of the proposed arithmetic circuits is therefore to keep the error profiles low even after cascading them to build MAC/SOP/POS.
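The three functions defined above can be written as exact reference ("golden") models, against which an approximate circuit model could be compared when computing error metrics. This is a minimal sketch; the function names and the list-of-pairs interface for the MAC are illustrative choices, not part of the thesis.

```python
def sop(a, b, c, d):
    """Sum of products: two multiplications and one addition."""
    return a * b + c * d


def pos(a, b, c, d):
    """Product of sums: two additions and one multiplication."""
    return (a + b) * (c + d)


def mac(pairs, out=0):
    """Multiply-accumulate: out_i = a*b + out_(i-1), applied over a sequence of (a, b)."""
    for a, b in pairs:
        out = a * b + out
    return out


sop_value = sop(1, 2, 3, 4)
pos_value = pos(1, 2, 3, 4)
mac_value = mac([(1, 2), (3, 4)])
```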



Figure 6: Conventional n - bit Recursive Multiplier Circuit



Figure 7: Modified n-bit Recursive Multiplier Circuit

#### 4.5.1 Modified Recursive Multiplication Scheme

The **design idea** is to optimise the conventional recursive multiplication circuit by rearranging the adders and keeping the computations as parallel as possible, which reduces the path length and the complexity at the same time, as shown in Figures 6 and 7.

This modified reduction circuit is used for the design of the approximate SOP, POS and MAC circuits because the proposed approximate multipliers have only a single stage, which can be exploited to reduce the complexity of these recursive multipliers. The proposed approximate adders, approximate multipliers and modified recursive multipliers are combined to design the MAC, SOP and POS circuits.

#### 4.5.2 Approximate Sum of Products (AXSOP), Product of Sums (AXPOS)

The $AXSOP_1$ and $AXPOS_1$ circuits use accurate multiplication sub-circuits and the proposed approximate addition circuit (AXHA) along with the modified reduction scheme. The energy savings for $AXSOP_1$ are at least 45% with an MED of less than 100, and at least 60% for an MED of 3000. Among all the variants, the least positive MED is found to be 5, for energy savings of 30%. The energy savings for $AXPOS_1$ are at least 30% with an MED of less than 100, and at least 45% for an MED of -226, when compared to the accurate POS. Among all the variants, the least positive MED is found to be 2, for energy savings of 27%.

$AXSOP_2$ and $AXPOS_2$ are the circuits with an accurate adder and an approximate multiplier. The approximation factor $AX_L$ of the multiplier can affect the architecture of the SOP/POS circuit. Because $AX_L$ takes four values, 14 approximate SOP/POS circuits are possible, and all variants are explored in this work. With $AX_L = 4$, the approximate multipliers allow the two 2-op adders to be eliminated, and a 2-op adder suffices in place of the 4-op adder for computing the intermediate results r1, r2, r3. This large reduction in adders lowers the hardware complexity, the switching activity and the delay in the second stage of the circuit. The $AXSOP_2$ design has energy savings of 50% for an MED of less than 100 and 68% for an MED of 1306, which is better than $AXSOP_1$ with k = 16. The $AXPOS_2$ design has average energy savings of 30% for an MED of less than 100 and 54% for an MED of 991.

$AXSOP_3$ and $AXPOS_3$ are the approximate SOP and POS architectures with the proposed approximate adders and the proposed approximate multipliers plugged in together. The approximation factor of the adder only modifies the inner circuitry of the adder and does not affect the connecting circuits, since the approximate adder has the same number of output vectors as its accurate counterpart. The approximate multiplier, in contrast, has one output vector, unlike the two vectors of its accurate counterpart. $AX_L$ therefore also affects the other parts of the circuitry, which can be exploited to reduce the overall complexity and delay. $AXSOP_3$ has energy savings of 55% for an MED of -87; its maximum energy savings are 85% for an MED of 4423, and it saves 75% for an MED of 24. $AXPOS_3$ has minimum energy savings of 38% for an MED of -3; its maximum energy savings are 69% for an MED of 2681, and it saves 50% for an MED of 101.

### 4.6 Proposed Approximate MAC (AXMAC)

This thesis proposes an energy-efficient approximate multiply-accumulate (MAC) unit that uses the approximate multipliers and adders as building blocks along with the modified reduction scheme and a conditional incrementor. AXMAC variants with i) only an approximate adder ($AXMAC_1$), ii) only an approximate multiplier ($AXMAC_2$) and iii) both an approximate adder and an approximate multiplier ($AXMAC_3$) are designed and evaluated. The n-bit inputs a and b are given to the n-bit multiplier. The adder then has one 2n-bit input and one $(2n + \log_2 m_c)$-bit input. The conditional +1 is an incrementor that acts as a buffer if the carry input is zero and increments if the carry input is one. The addition can thus be performed with one 2n-bit adder and a $\log_2 m_c$-bit incrementor.
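The split of the accumulation into a 2n-bit adder and a conditional incrementor can be modelled behaviourally as follows. This is a sketch with an exact multiplier and an exact adder, intended only to show the datapath widths. It assumes that $m_c$ is the number of products accumulated, which is consistent with the maximum output of 520200 = 8 × 255² stated for $m_c = 8$ and 8-bit inputs, and that $m_c$ is a power of two. The name `mac_step` is illustrative.

```python
def mac_step(acc, a, b, n, m_c):
    """One accumulate step using a 2n-bit adder and a conditional incrementor.

    acc is (2n + log2(m_c)) bits wide; the product a*b is 2n bits wide.
    Assumption: m_c is the number of accumulated products (a power of two).
    """
    low_mask = (1 << (2 * n)) - 1
    extra_bits = (m_c - 1).bit_length()       # log2(m_c) when m_c is a power of two
    product = a * b                            # exact multiplier in this sketch
    low_sum = (acc & low_mask) + product       # 2n-bit addition
    carry = low_sum >> (2 * n)                 # carry-out of the 2n-bit adder, 0 or 1
    # Conditional +1: the upper bits pass through when carry is 0, increment when it is 1.
    high = ((acc >> (2 * n)) + carry) & ((1 << extra_bits) - 1)
    return (high << (2 * n)) | (low_sum & low_mask)


n, m_c = 8, 8
acc = 0
for _ in range(m_c):
    acc = mac_step(acc, 255, 255, n, m_c)   # worst case: all inputs at their maximum
```

In the worst case the accumulated value still fits in 2n + log2(m_c) = 19 bits, so the upper log2(m_c) bits never need more than a +1 per step.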



Figure 8: Approximate MAC with approximate adder and approximate multiplier  $AXMAC_3$ 

The proposed and existing designs are simulated for 1 million random inputs with $m_c = 8$. For a MAC with $m_c = 8$ and 8-bit inputs, the maximum possible output value is 520200. The proposed $AXMAC_3$, for any k, has energy savings ranging from 40% to 60% for different $AX_L$ when compared to the accurate MAC, with an MED between -358 and 4305. The deviation, i.e., the MED, for $AXMAC_3$ with k = 8, $AX_L = 4$ is only 0.7% of the maximum possible value for 60% energy savings. The deviation for $AXMAC_3$ with k = 12, $AX_L = 4$ is only 0.8% of the maximum possible value for 57% energy savings. The (energy savings, MRED, MED) tuple for the 8-bit MAC design of Adams *et al.* with k = 8 is (26.9%, 0.1, -15048), whereas the proposed $AXMAC_3$ in a similar configuration achieves (57%, 0.01, 3663).

The sign errors for the 8-bit MAC with k = 12 and $AX_L = 2$ are evaluated. There is a probability of 0.3 that negative results are interpreted as positive when opposite-signed inputs are given. Similarly, there is a very low probability of 0.007 that negative values are shown as zeros in the final MAC results when the inputs are opposite-signed. However, when the inputs are same-signed and negative, positive numbers can be shown as negative, and this probability is found to be 0.3.

The proposed $AXMAC_3$ is used for Gaussian smoothing of images, and the output image quality is evaluated. Gaussian noise is added to the initial input image, and the noisy image is then filtered using a $5 \times 5$ Gaussian kernel. The approximately filtered images have a very high SSIM of 0.93, and the reduction in PSNR compared to accurate filtering is only 0.5 to 1 dB.
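For reference, PSNR can be computed with the standard definition $10 \log_{10}(MAX^2 / MSE)$. The sketch below assumes 8-bit grayscale images stored as lists of rows with a peak value of 255; the synopsis does not state the image format or which image serves as the reference, so these are assumptions, and the name `psnr` is illustrative.

```python
import math


def psnr(reference, test, max_value=255):
    """PSNR in dB between two equally sized grayscale images (lists of rows).

    Assumption: 8-bit pixels, so the peak value defaults to 255.
    """
    squared_error = 0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            squared_error += (r - t) ** 2
            count += 1
    if squared_error == 0:
        return math.inf                      # identical images
    mse = squared_error / count
    return 10 * math.log10(max_value ** 2 / mse)


reference_image = [[100, 100], [100, 100]]
filtered_image = [[101, 99], [100, 100]]
quality_db = psnr(reference_image, filtered_image)
```

A PSNR reduction such as the one quoted above would then be the difference between the PSNR of the accurately filtered image and that of the approximately filtered image, both measured against the same reference.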

# 5 Conclusions

This work presented the scope and methodology of approximating binary adders, multipliers and MAC units for image processing applications. The delay of the proposed binary adder (AXHA) is independent of bit size, and hence the order of delay is constant. The proposed design's energy savings over ApproxADDv1, ApproxADDv2 and HERLOA are approximately 15.9%, 40.1% and 2.5%, respectively. The proposed AXHRCA has power and energy savings of 22% and 52%, respectively, with a very low Mean Error Distance of 3.2, which shows a favourable trade-off: a small error in exchange for almost half the energy.

The proposed 4-bit multiplier consumes only 1.3 fJ of energy, which is 87.9%, 78%, 94% and 67.5% less than the M1, M2, LxA and MxA designs, respectively. The increase in delay, power and energy is not exponential in the multiplier size (n), and the error distance of the maximum-error combinations lies within 5% of the maximum value possible for a multiplier of size n. These error metrics are within acceptable ranges, and the energy savings are also prominent, i.e., 51% to 96%.

$AXSOP_3$ has energy savings of 55% for an MED of -87; its maximum energy savings are 85% for an MED of 4423, and it saves 75% for an MED of 24. $AXPOS_3$ has minimum energy savings of 38% for an MED of -3; its maximum energy savings are 69% for an MED of 2681, and it saves 50% for an MED of 101. The deviation, i.e., the MED, for $AXMAC_3$ with k = 8, $AX_L = 4$ is only 0.7% of the maximum possible value for 60% energy savings. The deviation for $AXMAC_3$ with k = 12, $AX_L = 4$ is only 0.8% of the maximum possible value for 57% energy savings, whereas for the existing design with the same configuration it is 7.6% of the maximum value. The images filtered approximately with $AXMAC_3$ have a very high SSIM of 0.93, and the reduction in PSNR compared to accurate filtering is only 0.5 to 1 dB.

The proposed approximate circuits, which are error resilient and energy efficient, are useful in futuristic application-specific hardware for video applications.

# 6 Organization of the Thesis

The proposed outline of the thesis is as follows:

- Chapter 1: Introduction
- Chapter 2: Related Work
- Chapter 3: Binary Arithmetic Functions Analysis and Scope for Approximate Circuit Design
- Chapter 4: Design and Implementation of Energy-Efficient, Error-Resilient Architectures for Binary Addition
- Chapter 5: Design and Implementation of Energy-Efficient, Error-Resilient Architectures for Binary Multiplication
- Chapter 6: Design and Implementation of Energy-Efficient, Error-Resilient Architectures for Multiply-Accumulate MAC, Sum-of-Products (SOP), Product-of-Sums (POS)
- Chapter 7: Conclusion and Future Scope

# 7 List of Publications

### I. REFEREED JOURNALS BASED ON THE THESIS

- 1. Skandha Deepsita S., Noor Mahammad Sk., *Low power, high speed approximate multiplier for error resilient applications*, Elsevier Integration, Volume 84, Pages 37-46, (2022).
- 2. Skandha Deepsita S, Dhayalakumar M, and Noor Mahammad SK., *Energy Efficient Error Resilient Multiplier using Low Power Compressors*, ACM Transactions on Design Automation of Electronic Systems, Volume 27, Pages 1-26, (2022).
- 3. Skandha Deepsita S, Noor Mahammad SK., *Design and Analysis of Binary Hierarchical Adders for Error-Tolerant Applications*, Under Review, Journal of Circuits, Systems and Computers.
- 4. Skandha Deepsita S, Noor Mahammad SK., *Energy Efficient Logarithmic Delay Adders for Error-Tolerant Applications*, Under Review, Wiley International Journal of Circuit Theory and Applications.
- 5. Skandha Deepsita S, Noor Mahammad SK., *Energy Efficient Constant Delay and Constant MED Adders for Error-Tolerant Applications*, Submitted to Wiley International Journal of Circuit Theory and Applications.
- 6. Skandha Deepsita S, Noor Mahammad SK., *Energy Efficient Multiply-Accumulate Unit using novel recursive multiplication for Error-Tolerant Applications*, Under Review, Elsevier Integration Journal.

### II. PRESENTATIONS/PUBLICATIONS IN CONFERENCES BASED ON THE THESIS

- 1. S. S. Deepsita and N. Mahammad Sk, *Energy Efficient Binary Adders for Error Resilient Applications*, 2019 IEEE Conference on Modeling of Systems Circuits and Devices (MOS-AK), 23-28, (2019).

### III. PRESENTATIONS/PUBLICATIONS IN CONFERENCES (Others)

- 1. SS Deepsita, K Divya, Noor Mahammad Sk., *Energy Efficient and Multiplierless Approximate Integer DCT Implementation for HEVC*, 29th IFIP/IEEE International Conference on Very Large Scale Integration (VLSI SOC), Pages 1-6, (2021).
- 2. R. Nirosha, S Skandha Deepsita, Noor Mahammad Sk., *An Approximate Discrete Hadamard Transform for Energy Efficient Multimedia Processing*, IEEE 1st International Conference on Energy, Systems and Information Processing (ICESIP), Pages 1-5, (2019).

# 8 Chip Tapeouts

- 1. Successfully taped out *Approximate Multiplier* using Skywater130nm for Efabless MPW3 Tapeout Shuttle on 25/07/2022.
- 2. Submission of *Approximate Hierarchical Adder* using Skywater130nm for Efabless MPW8 Tapeout Shuttle.
- 3. Submission of *Approximate SOP, POS* using Skywater130nm for Efabless MPW8 Tapeout Shuttle.

# References

- 1. Adams, E., S. Venkatachalam, and S.-B. Ko (2019). Energy-efficient approximate MAC unit. In *2019 IEEE International Symposium on Circuits and Systems (ISCAS)*. IEEE.
- 2. Ahmadinejad, M. and M. H. Moaiyeri (2021). Energy-and quality-efficient approximate multipliers for neural network and image processing applications. *IEEE Transactions on Emerging Topics in Computing*.
- 3. Ansari, M. S., H. Jiang, B. F. Cockburn, and J. Han (2018). Low-power approximate multipliers using encoded partial products and approximate compressors. *IEEE Journal on Emerging and Selected Topics in Circuits and Systems*, **8**(3), 404–416.
- 4. Dutt, S., S. Nandi, and G. Trivedi (2018). Analysis and design of adders for approximate computing. *ACM Transactions on Embedded Computing Systems (TECS)*, **17**(2), 40.
- 5. Fang, B., H. Liang, D. Xu, M. Yi, Y. Sheng, C. Jiang, Z. Huang, and Y. Lu (2021). Approximate multipliers based on a novel unbiased approximate 4-2 compressor. *Integration*.
- 6. Seo, H., Y. S. Yang, and Y. Kim (2020). Design and analysis of an approximate adder with hybrid error reduction. *Electronics*, **9**(3), 471.
- 7. Seok, H., H. Seo, J. Lee, and Y. Kim (2021). Corea: Delay-and energy-efficient approximate adder using effective carry speculation. *Electronics*, **10**(18), 2234.
- 8. Waris, H., C. Wang, W. Liu, J. Han, and F. Lombardi (2020). Hybrid partial product-based high-performance approximate recursive multipliers. *IEEE Transactions on Emerging Topics in Computing*.