After exploring multiple potential system architectures, we decided to base our platform on the cluster computer model popularized by the Beowulf cluster concept (http://www.beowulf.org/). This choice was made as Beowulf clusters have proven to support low-cost, scalable, high performance computing applications for over ten years. They also enjoy widespread use in academia. Given this decision, we needed to determine how to add the “reconfigurable” component to the cluster. To address this issue we needed to select both the FPGA hardware and the programming environment.
The Beowulf philosophy of using low-cost commodity computing components led us to select standard FPGA development boards as the platform of choice. These boards are typically mass produced by FPGA vendors to allow developers to gain experience with their latest FPGAs. They are often sold at or below the bill-of-materials cost. They typically contain FPGAs with embedded processors, a variety of memory and interface options, and a high-speed Ethernet interface, all of which are needed for our application. We chose the Xilinx University Program (XUP) Virtex 2 Pro development board as our reconfigurable computing node. The board is low cost and ubiquitous, so other universities can easily duplicate our configuration and research. When we purchased these boards, the FPGAs represented the state of the art. The board is versatile enough to support many potential research projects. The FPGA contains two PowerPC processors that can be used to run the operating system and application software. The board also supports up to 2 GB of Double Data Rate (DDR) SDRAM and various Flash ROM options. The board supports multiple interfaces, including low-speed parallel, high-speed differential parallel, and multi-gigabit serial interfaces that use low-cost IDE and SATA cables. These can be used to explore alternatives for directly connecting multiple development boards together. We plan to upgrade to newer FPGA development boards periodically. As noted later, the Impulse C programming tools we selected are intrinsically device independent, so migrating to Virtex-4 or Virtex-5 boards will be an incremental effort rather than a restart.
The XUP boards are the computing nodes of the cluster. The remaining hardware components, shown in Figures 1 and 2, are typical for a cluster computer. They include:

The resulting cluster is homogeneous, featuring an identical architecture at each node. There is also an argument for heterogeneous computing, in which reconfigurable clusters become part of a larger grid composed of standard microprocessors. In this configuration, the reconfigurable cluster would act as a configurable resource on the grid.
System Software
While one group assembled the hardware, other researchers worked on developing the software environment and test applications for the cluster. The software environment consists of three major pieces:
We are currently working with both QNX (http://www.qnx.com/) and MontaVista Linux (http://www.mvista.com/) for the operating system. We plan to use a minimal version of MPI as our initial parallel programming environment. After experimenting with various system design languages, we have chosen the Impulse C language (http://www.impulsec.com/) to develop our reconfigurable applications. Our choices for the operating system and MPI are based both on our familiarity with them and on the fact that they are used by a wide variety of standard clusters. We chose Impulse C for several reasons:
Results from the initial applications are very encouraging. To test the performance of the cluster, we implemented a sonar application that we had previously used on the SRC-6e computer. On the SRC-6e, we used a mix of Carte C code and hand-coded VHDL to obtain a 65-fold speedup over a 1.8 GHz Pentium processor. Using hand-coded VHDL, we were able to match this speedup with just one XUP board. Using three boards nearly tripled the speedup.
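To put the three-board figure in context, the short Python sketch below computes the ideal (linear) speedup for three boards from the 65-fold single-board result, along with the parallel efficiency for a given measured speedup. The function names are illustrative, and the measured value of 180 is a hypothetical placeholder, not a result from our tests; "nearly tripled" only tells us that the real figure lies somewhat below the linear bound.

```python
def ideal_speedup(single_board_speedup, boards):
    """Upper bound on speedup if the work scales linearly across boards."""
    return single_board_speedup * boards


def parallel_efficiency(measured_speedup, single_board_speedup, boards):
    """Fraction of the linear bound that a measured speedup achieves."""
    return measured_speedup / ideal_speedup(single_board_speedup, boards)


if __name__ == "__main__":
    single = 65   # single-board speedup over the 1.8 GHz Pentium (from the text)
    boards = 3

    print("Linear bound for", boards, "boards:", ideal_speedup(single, boards))

    # Hypothetical measured speedup, used only to illustrate the calculation.
    hypothetical_measured = 180.0
    eff = parallel_efficiency(hypothetical_measured, single, boards)
    print("Efficiency at a hypothetical speedup of 180:", round(eff, 3))
```

An efficiency close to 1.0 would indicate that communication between boards costs little relative to the computation itself.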
Preliminary results using Impulse C indicate that we should be able to obtain the same speedup, although, pending refinement, we are using more boards to do so. Currently the application is split between two boards, and the boards communicate through a combination of hardware and software. To achieve the desired speedup, this inter-board communication will need to be implemented entirely at the hardware level. Impulse C supports this by allowing the user to define custom interfaces. We are in the process of implementing several communication methods in VHDL. Once these interfaces have been defined, the rest of the application, and other similar applications, should require no additional VHDL knowledge.
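The idea of splitting an application across two boards that exchange data over a stream can be modeled in ordinary software. The Python sketch below is only a conceptual model under stated assumptions: it does not use the Impulse C API, the names front_end, back_end and run_pipeline are hypothetical, the processing in each stage is a placeholder rather than the sonar algorithm, and a bounded queue stands in for the inter-board link.

```python
import queue
import threading

END_OF_STREAM = None  # sentinel marking the end of the data stream


def front_end(samples, link):
    """Stage standing in for the first board: process and forward each sample."""
    for sample in samples:
        link.put(sample * 2)   # placeholder processing, not the sonar algorithm
    link.put(END_OF_STREAM)


def back_end(link, results):
    """Stage standing in for the second board: accumulate a running sum."""
    total = 0
    while True:
        item = link.get()
        if item is END_OF_STREAM:
            break
        total += item
        results.append(total)


def run_pipeline(samples, depth=4):
    """Run both stages concurrently, connected by a bounded FIFO."""
    link = queue.Queue(maxsize=depth)  # stands in for the inter-board link
    results = []
    producer = threading.Thread(target=front_end, args=(samples, link))
    consumer = threading.Thread(target=back_end, args=(link, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3]))
```

Because each stage sees only the stream, the transport behind it can change (for example, from a software-assisted path to a pure hardware link) without touching the stage logic, which is the property that lets the remaining application code stay free of VHDL.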
The reconfigurable computing cluster was assembled for less than $10,000. Initial testing demonstrated that, for applications with significant non-sequential processing elements, large I/O requirements, and less need for closely coupled memories, the cluster can match or exceed the performance of a much more expensive commercial reconfigurable computer. As more hardware and software components are developed for the cluster, we anticipate that it will become easier to port existing parallel programs to the cluster and take advantage of the significant increases in computing power provided by the FPGA computing elements.
About the Author
Associate Professor Russ Duren of Baylor University. Russell W. Duren (S’76–M’78–SM’96) received the B.S. degree in electrical engineering from the University of Oklahoma, Norman, in 1978 and the M.S. and Ph.D. degrees in electrical engineering from Southern Methodist University, Dallas, TX, in 1985 and 1991, respectively. He spent 17 years in industry. The majority of this time was spent designing avionics at the Lockheed Martin Aeronautics Company, Fort Worth, TX. After that, he spent seven years teaching and performing research in the fields of avionics and reconfigurable computing at the Naval Postgraduate School, Monterey, CA. Currently, he is an Associate Professor in the Department of Electrical and Computer Engineering, Baylor University, Waco, TX. He is the author of over 30 publications. His research interests include avionics, embedded systems, FPGA digital design, and reconfigurable computing. Dr. Duren is the recipient of the 1991 Frederick E. Terman Award for Outstanding Electrical Engineering Graduate Student from Southern Methodist University, the 1991 Myril B. Reed Outstanding Paper Award from the 34th IEEE Midwest Symposium on Circuits and Systems, and the 2002 Naval Postgraduate School Award for Outstanding Instructional Performance.
December 9, 2008