### **F-TFM: Accelerating Total Focusing Method on FPGA**

#### INTRODUCTION

The Total Focusing Method (TFM) is an ultrasound imaging algorithm used for nondestructive testing in fields such as material science and aerospace. A TFM imaging system uses a 1D or 2D ultrasonic phased array with Full or Half Matrix Capture (FMC/HMC) for data acquisition, followed by a dedicated post-processing processor that enhances the quality of the imaging results.
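To make the post-processing step concrete, the following is a minimal sketch of textbook delay-and-sum TFM over FMC data, not the F-TFM accelerator itself. It assumes a 1D array lying on the line z = 0, a constant sound speed, and nearest-sample delays without interpolation; the names `tfm_image` and `synth_fmc` and all parameters are hypothetical and chosen only for illustration.

```python
import math

def tfm_image(fmc, elem_x, pixels, c, fs):
    """Delay-and-sum TFM over full-matrix-capture data (illustrative only).

    fmc[tx][rx] is the sampled A-scan for transmitter tx and receiver rx,
    elem_x holds element x-positions (m) of a 1D array on the line z = 0,
    pixels is a list of (x, z) points (m), c is the sound speed (m/s),
    and fs is the sampling rate (Hz).
    """
    n = len(elem_x)
    image = []
    for (px, pz) in pixels:
        # One-way travel time from every element to this pixel.
        tof = [math.hypot(px - ex, pz) / c for ex in elem_x]
        acc = 0.0
        for tx in range(n):
            for rx in range(n):
                # Round-trip delay mapped to the nearest sample index.
                idx = int(round((tof[tx] + tof[rx]) * fs))
                if 0 <= idx < len(fmc[tx][rx]):
                    acc += fmc[tx][rx][idx]
        image.append(abs(acc))
    return image

def synth_fmc(elem_x, scatterer, c, fs, n_samples):
    """Synthetic FMC data: one unit impulse per A-scan from a point scatterer."""
    sx, sz = scatterer
    tof = [math.hypot(sx - ex, sz) / c for ex in elem_x]
    n = len(elem_x)
    fmc = [[[0.0] * n_samples for _ in range(n)] for _ in range(n)]
    for tx in range(n):
        for rx in range(n):
            fmc[tx][rx][int(round((tof[tx] + tof[rx]) * fs))] = 1.0
    return fmc
```

With synthetic data from `synth_fmc`, all transmit-receive pairs add coherently at the scatterer position, while pixels away from it receive little or no contribution. The inner loops touch every (transmitter, receiver) pair for every pixel, which is why the delay computation and memory access pattern dominate the cost.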



- Workflow organization
- Real-time pixel-wise DLC
  - Reduce more than 90% memory access
- DPP-PIT Dataflow
  - Reduce the context switching overhead

[Figure: workflow comparison. Panel labels: "Calculate once & Look-up Table" (Configs, GPU Kernel 1, GPU Kernel 2, Display); "Phased Array Dataflow Stage 1" (Preload Config) and "Phased Array Dataflow Stage 2" (Real-time Calculate), pipelined with stream in and image out.]

**F-TFM Solution**

- Overall accelerator architecture design
- Parallel compute units in DPP and PIT modules
- In-DDR ping-pong buffer

[Figure: accelerator architecture on FPGA. Data Pre-processing (DPP) Module: Dispatcher and P-way Compute Units (FFT+ELE+IFFT). Pixel Interpolation (PIT) Module: Combination Batch Controller, S-way Proc. Units, and row-wise Adder Tree. DDR holds the ultrasonic input data in a ping-pong buffer together with the configuration.]
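The trade-off between "Calculate once & Look-up Table" and real-time pixel-wise calculation can be sketched as follows. This is an illustration under assumptions, not the poster's implementation: DLC is read here as the per-pixel delay calculation, the geometry is a 1D array on z = 0, and the function names are hypothetical.

```python
import math

def build_delay_lut(elem_x, pixels, c):
    """Calculate once: store one-way delays for every (pixel, element) pair."""
    return [[math.hypot(px - ex, pz) / c for ex in elem_x] for (px, pz) in pixels]

def delays_on_the_fly(elem_x, pixel, c):
    """Real-time alternative: recompute one pixel's delays when it is needed."""
    px, pz = pixel
    return [math.hypot(px - ex, pz) / c for ex in elem_x]

def lut_entries(n_elements, n_pixels):
    """Number of delay values the look-up table must hold and later fetch."""
    return n_elements * n_pixels
```

Both paths yield the same delays. The table, however, grows with the product of pixel count and element count and has to be read back from memory for every frame, whereas the on-the-fly version only needs the element positions and the pixel coordinates.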

#### F-TFM SOLUTION: HARDWARE DESIGN

[Figure: hardware design. Labels: Ping-pong Buffer, Ultrasonic Input Data, Output Image, Configuration.]
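The ping-pong buffer idea can be illustrated in software as below. This is a simplified sketch: in hardware the load and compute stages run concurrently on the two buffers, whereas this Python version runs them one after the other. The class `PingPongBuffer`, the function `run_pipeline`, and the use of a sum as the "compute" stage are all hypothetical stand-ins.

```python
class PingPongBuffer:
    """Two equally sized buffers: one is filled while the other is consumed."""

    def __init__(self, size):
        self._bufs = [[0] * size, [0] * size]
        self._write = 0  # index of the buffer currently being filled

    def write_buffer(self):
        return self._bufs[self._write]

    def read_buffer(self):
        return self._bufs[1 - self._write]

    def swap(self):
        self._write = 1 - self._write

def run_pipeline(frames):
    """Load frame i into one buffer while 'processing' frame i-1 from the other."""
    pp = PingPongBuffer(len(frames[0]))
    results = []
    for i, frame in enumerate(frames):
        pp.write_buffer()[:] = frame               # load stage fills one buffer
        if i > 0:
            results.append(sum(pp.read_buffer()))  # compute stage uses the previous frame
        pp.swap()
    results.append(sum(pp.read_buffer()))          # drain the last loaded frame
    return results
```

Because the producer and consumer never touch the same buffer between swaps, data transfer for the next frame can overlap with computation on the current one.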

Bizhao Shi, Jieran Zhang, and Guojie Luo

Peking University, Beijing City

## **OpenHW2023**



On-board test on AMD KU19P

#### RESULTS

F-TFM (Full Total Focusing Method) is the outcome of a comprehensive examination of the TFM algorithm and workflow, combined with an in-depth analysis of an efficient accelerator design that covers workflow organization, dataflow design, and parameter optimization. The result is a high-performance, high-efficiency TFM imaging system with notable scalability.



#### NORMALIZED ENERGY EFFICIENCY COMPARISONS

| Format | Platform     | alu-1d                  | com-1d                   | obl-1d                  | imm-1d                   | alu-2d                  | ani-2d                  |
|--------|--------------|-------------------------|--------------------------|-------------------------|--------------------------|-------------------------|-------------------------|
| FMC    | RTX 3080Ti   | $1.00\times$            | $1.00\times$             | $1.00\times$            | $1.00\times$             | $1.00\times$            | $1.00\times$            |
|        | Jetson TX1   | $1.49\times$            | $0.69\times$             | $1.55\times$            | $0.85\times$             | $1.98\times$            | $1.86\times$            |
|        | F-TFM (Ours) | $\mathbf{34.13}\times$  | $\mathbf{47.15}\times$   | $\mathbf{19.54}\times$  | $\mathbf{46.63}\times$   | $\mathbf{14.26}\times$  | $\mathbf{15.32}\times$  |
| HMC    | RTX 3080Ti   | $1.00\times$            | $1.00\times$             | $1.00\times$            | $1.00\times$             | $1.00\times$            | $1.00\times$            |
|        | Jetson TX1   | $2.13\times$            | $0.96\times$             | $1.90\times$            | $0.98\times$             | $1.93\times$            | $1.85\times$            |
|        | F-TFM (Ours) | $\mathbf{67.77}\times$  | $\mathbf{108.20}\times$  | $\mathbf{39.95}\times$  | $\mathbf{147.21}\times$  | $\mathbf{26.48}\times$  | $\mathbf{26.86}\times$  |

**Energy efficiency of F-TFM**
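The poster does not spell out how the normalized values are computed. A common convention, assumed here, is throughput per watt divided by the same quantity for the baseline platform (the RTX 3080Ti rows, which read $1.00\times$). The helper below is a hypothetical sketch of that convention, and the numbers in the usage note are made up for illustration, not measurements.

```python
def normalized_efficiency(fps, watts, base_fps, base_watts):
    """Energy efficiency (frames per second per watt) relative to a baseline.

    Assumed definition; the poster does not state the exact metric.
    """
    return (fps / watts) / (base_fps / base_watts)
```

For example, a device delivering 8 frames/s at 4 W against a baseline delivering 16 frames/s at 64 W would score `normalized_efficiency(8.0, 4.0, 16.0, 64.0)`, that is, 8 times the baseline's efficiency despite half its throughput.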