## A Corner Turner Architecture by Wilbert Lynn Urry

A corner turner is a device that converts serial digital data into parallel data. It is useful for extending the number of channels in a fast Fourier transform<sup>1</sup> machine, or for distributing data from a large array of antennas in an FX or direct imaging system. It can drastically reduce the cabling requirements of an FX architecture<sup>2</sup>. The usual implementation of a corner turner is an array of shift registers configured at 90 degrees to each other, with $2N^2$ cells, where $N$ is the size of the corner turner. Each cell is large enough to hold one word. I describe an architecture that uses as few as $N\log_2 N$ switches and no storage to accomplish the turn.

The general approach will be clearer if we consider a small eight-by-eight corner turner. The traditional approach is illustrated in the figure.



*Figure: Corner Turner (traditional shift-register implementation)*

Serial data enters from the left, for instance from antennas a through h. Each antenna produces a sequence of frequency samples 0 through 7, which enter the horizontal shift registers. (Because the data is serial, each shift register cell must be long enough to hold one frequency word.) When the horizontal shift registers are full, the data is transferred in parallel to the vertical shift registers below. The vertical shift registers then shift out a sequence of antenna values for each frequency. All of the frequency 0 values may now be sent to an individual processor, for instance to form the frequency 0 image plane of an eight-element array of antennas.
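As a behavioural sketch only (it ignores bit-serial timing and register clocking), the traditional corner turner amounts to filling one register per antenna over eight cell times and then reading the same words out by frequency. The function name `shift_register_corner_turn` and the `frames[a][f]` layout are hypothetical choices for illustration, not part of the original design.

```python
def shift_register_corner_turn(frames):
    """Behavioural model of the traditional N x N shift-register corner turner.

    frames[a][f] is the word for antenna a, frequency f.  Returns vertical[f],
    the serial sequence of antenna words shifted out for frequency f.
    """
    n = len(frames)
    # Horizontal registers: one per antenna, all filled serially, one
    # frequency word per cell time, over N cell times.
    horizontal = [[None] * n for _ in range(n)]
    for t in range(n):
        for a in range(n):
            horizontal[a][t] = frames[a][t]
    # When full, every word is copied in parallel to the vertical register
    # for its frequency, which then holds one word from each antenna.
    vertical = [[horizontal[a][f] for a in range(n)] for f in range(n)]
    return vertical


if __name__ == "__main__":
    frames = [[f"{ant}{f}" for f in range(8)] for ant in "abcdefgh"]
    print(shift_register_corner_turn(frames)[0])
    # ['a0', 'b0', 'c0', 'd0', 'e0', 'f0', 'g0', 'h0']
```

Functionally this is a matrix transpose; the cost of doing it with shift registers is that every word must be stored twice along the way.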

Consider the storage requirements for a simple eight-by-eight array. Each shift register cell could contain 10 bits of real and 10 bits of imaginary data, for a total of 20 bits per cell. There are 64 horizontal cells and 64 vertical cells, for a total of 128 cells. At 20 bits per cell, the machine requires 2560 bits of storage. A 512-antenna array would need $2 \times 512^2 = 524{,}288$ cells, or over 10 million bits of storage, all organized as shift registers spread over, possibly, many field programmable gate arrays.
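The storage figures above follow directly from $2N^2$ cells times the bits per cell. A minimal check, using a hypothetical helper name:

```python
def shift_register_storage_bits(n, bits_per_cell):
    """Bits needed by an N x N shift-register corner turner:
    N^2 horizontal cells plus N^2 vertical cells, one word per cell."""
    return 2 * n * n * bits_per_cell


# 10-bit real + 10-bit imaginary = 20 bits per cell, as in the text.
print(shift_register_storage_bits(8, 20))    # 2560
print(shift_register_storage_bits(512, 20))  # 10485760
```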

A different approach, using only switches and no storage, is proposed.



The configuration resembles the butterfly arrangement of the fast Fourier transform, except that at the right end of each butterfly a switch replaces the phase rotation and addition of the FFT. When a switch is in the straight-through position it is said to be in the zero position; when it is in the diagonal position it is in the one position. Data enters the array from the left in the diagram, and the switch settings change after each cell time. For the scheme to work, the data must be skewed in time from one antenna to the next. When the data emerges from each output port on the right, the antennas are in the same order but skewed in time. This skew should not, in general, present a problem. One turn takes eight cell times, $t = 0$ to $7$. The switch positions for each cell time are given in the following table.
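The diagram and its exact butterfly wiring are not reproduced here, so the sketch below swaps in a closely related switch-only network: a logarithmic barrel rotator ($\log_2 N$ stages of $N$ two-position switches, with every switch in a stage sharing one control bit), preceded by fixed wiring that maps line $a$ to line $(-a) \bmod N$. This is an assumption for illustration, not the author's design, and its switch settings differ from the table that follows. It does show the same principle: with each antenna delayed by one more cell than the previous one and the switches reset once per cell time, each output port receives one frequency from every antenna, in antenna order, skewed from port to port. Frames are treated cyclically so that a single turn can be simulated; `barrel_rotate` and `switch_corner_turn` are hypothetical names.

```python
def barrel_rotate(lines, amount):
    """Move the word on line i to line (i + amount) mod N using switches only.

    Stage s has N two-position switches that all take bit s of `amount`:
    0 = straight through, 1 = diagonal (take the word from line j - 2**s).
    N must be a power of two.
    """
    n = len(lines)
    for s in range(n.bit_length() - 1):      # log2(N) stages
        if (amount >> s) & 1:                # this stage's switches in position 1
            shift = 1 << s
            lines = [lines[(j - shift) % n] for j in range(n)]
    return lines


def switch_corner_turn(frames):
    """frames[a][f] is the word for antenna a, frequency f.

    Returns ports[p], the sequence of words leaving output port p.
    """
    n = len(frames)
    ports = [[] for _ in range(n)]
    for t in range(n):                       # one turn takes N cell times
        # Skewed input: antenna a runs a cells late, so at cell time t it
        # presents frequency (t - a) mod N (frames treated cyclically).
        lines = [frames[a][(t - a) % n] for a in range(n)]
        # Fixed wiring, no switches: line a -> line (-a) mod N.
        lines = [lines[(-j) % n] for j in range(n)]
        # Switched stages: rotate by the cell count t.
        lines = barrel_rotate(lines, t)
        for p in range(n):
            ports[p].append(lines[p])
    return ports


if __name__ == "__main__":
    frames = [[f"{ant}{f}" for f in range(8)] for ant in "abcdefgh"]
    for p, words in enumerate(switch_corner_turn(frames)):
        print(p, words)
```

The rotator uses $\log_2 N$ stages of $N$ switches, i.e. $N\log_2 N = 24$ switches for $N = 8$, the same count as the author's array. Port 0 receives `a0` through `h0`, and port 1 receives `h1, a1, ..., g1`; the leading `h1` is the cyclic-frame stand-in for port 1 running one cell behind port 0.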

|   | 0 |   |   | 1 |   |   | 2 |   |   | 3 |   |   | 4 |   |   | 5 |   |   | б |   |   | 7 |   |   |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| а | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| b | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| С | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
| d | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| е | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| f | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| g | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
| h | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |

The numbers across the top of the table are the cell times; each cell time spans three columns, which give the positions (0 or 1) of the three switches in each row at that time. The letters at the left identify the antenna rows.
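Since the switches are set according to the cell count, one way to picture the controller is a small lookup table indexed by the 3-bit counter. The sketch below transcribes the table above; reading each group of three digits as that row's three switches, in the order printed, is an assumption, because the text does not say which physical switch each column drives. `SWITCH_TABLE` and `switch_settings` are hypothetical names.

```python
# Rows a-d transcribed from the table; each group of three digits is one
# cell time (0 = straight, 1 = diagonal).  Rows e-h repeat rows a-d.
SWITCH_TABLE = {
    "a": "000 111 110 101 100 011 010 001",
    "b": "000 001 110 111 100 101 010 011",
    "c": "000 011 010 101 100 111 110 001",
    "d": "000 001 010 011 100 101 110 111",
}
for upper, lower in zip("abcd", "efgh"):
    SWITCH_TABLE[lower] = SWITCH_TABLE[upper]


def switch_settings(cell_count):
    """Return {row: (state, state, state)} for the given cell count.

    Only the low 3 bits are used, mimicking a 3-bit counter that wraps
    after eight cell times.
    """
    t = cell_count & 0b111
    return {row: tuple(int(bit) for bit in groups.split()[t])
            for row, groups in SWITCH_TABLE.items()}


if __name__ == "__main__":
    print(sum(len(states) for states in switch_settings(0).values()))  # 24
    print(switch_settings(9)["a"])  # counter wraps, same as t = 1: (1, 1, 1)
```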

Compare this corner turner with the previous example. The switch array has no storage, so the 2560 bits of storage required by the shift register version are eliminated. The eight-by-eight switch array requires 24 switches ($N\log_2 N = 8 \times 3$). Both versions require a 3-bit counter to count the cell position: the shift register array must transfer the data at the end of every 8 counts, and the switch array must set the switches according to the cell count. The word length from each antenna may change with no change to the switch array design; all that is required is an adjustment of the cell switching time. The shift register array, by contrast, would require a complete redesign. A 512-antenna system would require only 4608 switches ($512 \times 9$), compared with over 10 million bits of storage in the shift register array.
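The counts in this comparison scale as $N\log_2 N$ switches and a $\log_2 N$-bit cell counter, against $2N^2$ cells of storage. A small sketch, assuming $N$ is a power of two and 20-bit cells as in the earlier example; the helper names are hypothetical:

```python
def switch_count(n):
    """N two-position switches in each of log2(N) columns; N a power of two."""
    return n * (n.bit_length() - 1)


def counter_bits(n):
    """Width of the cell counter that steps through one N-cell turn."""
    return n.bit_length() - 1


def shift_register_bits(n, bits_per_cell=20):
    """2 * N^2 cells (horizontal plus vertical arrays), one word each."""
    return 2 * n * n * bits_per_cell


for n in (8, 512):
    print(n, switch_count(n), counter_bits(n), shift_register_bits(n))
# 8 24 3 2560
# 512 4608 9 10485760
```

The switch count grows only slightly faster than $N$, while the shift-register storage grows as $N^2$, so the advantage widens as the array gets larger.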

Lattice Semiconductor currently makes a 160-I/O field programmable cross-point switch that operates at 250 MHz. Fifty-two of these devices, in 208-ball fine-pitch ball grid array packages, could fit on a VME-sized card and provide all of the corner turning required for a 512-antenna system with 512 channels and 12.5 MHz bandwidth. By the end of the year, Lattice promises a 240-I/O device in a 388-ball fine-pitch ball grid array package.



Above is a diagram of a simplified, modest 512-antenna imaging system. Considerable savings are realized by using modern high-speed logic to implement a relatively low-bandwidth system. A bandwidth of 12.5 MHz is used with a 512-channel FFT, giving a bandwidth of 24.4 kHz per channel. Each channel has a word length of 8 bits real and 8 bits imaginary. The data emerges from the FFT machine as a serial stream of 16-bit cells at a bit rate of 200 Mbit/s over a single coax or fiber. The corner turner converts this stream from a sequence of channels for each antenna into a sequence of antennas for each channel. The 16-bit cells enter the processors at a rate of 24.4 kHz per antenna, where they are correlated. A 512-antenna array has 130816 ($512 \times 511 / 2$) distinct antenna pairs, or baselines. If 200 MHz field programmable gate arrays are used for the processors, a single multiplier can perform 8192 multiplies per sample interval (200 MHz / 24.4 kHz). Only 16 multipliers are therefore needed to perform the 130816 multiplies required to correlate all antenna pairs. Each processor may consist of one reasonably sized field programmable gate array and some memory chips, and all 512 processors will be identical.
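The figures in this paragraph can be re-derived in a few lines. The sketch assumes complex sampling at 12.5 MHz (so the per-antenna serial rate is the bandwidth times the 16-bit cell) and one multiply per clock cycle per multiplier; the constant names are illustrative.

```python
import math

# Parameters quoted in the text for the 512-antenna example.
N_ANTENNAS = 512
FFT_CHANNELS = 512
BANDWIDTH_HZ = 12_500_000        # 12.5 MHz
BITS_PER_CELL = 16               # 8 bits real + 8 bits imaginary
FPGA_CLOCK_HZ = 200_000_000      # 200 MHz multiplier clock

channel_bw_hz = BANDWIDTH_HZ / FFT_CHANNELS        # also each channel's sample rate
serial_rate = BANDWIDTH_HZ * BITS_PER_CELL         # bits/s per antenna link
baselines = N_ANTENNAS * (N_ANTENNAS - 1) // 2     # distinct antenna pairs
# Clock cycles (multiplies) available per channel sample interval.
mults_per_multiplier = FPGA_CLOCK_HZ * FFT_CHANNELS // BANDWIDTH_HZ
multipliers = math.ceil(baselines / mults_per_multiplier)

print(f"channel bandwidth: {channel_bw_hz / 1e3:.1f} kHz")        # 24.4 kHz
print(f"serial bit rate:   {serial_rate / 1e6:.0f} Mbit/s")        # 200 Mbit/s
print(f"baselines:         {baselines}")                           # 130816
print(f"multiplies per multiplier per interval: {mults_per_multiplier}")  # 8192
print(f"multipliers per processor:              {multipliers}")           # 16
```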

An alternative arrangement could use a bandwidth of 25 MHz with 4-bit word lengths for the real and imaginary terms, for a cell size of 8 bits. The corner turner design would remain exactly the same, requiring only a change in cell counter timing. The processor would have to be reprogrammed to have 32 multipliers, but they would be 4 by 4 multiplies rather than 8 by 8. The bandwidth could be increased to 50 MHz if 2-bit word lengths could be tolerated. Once again, no design change would be required for the corner turner; only the cell counter timing would change. The processors would need reprogramming for 64 multipliers, but they would be simpler 2 by 2 multiplies.

If a system with finer resolution is required, the FFT for each antenna could be increased to 1024 channels. Since each channel is now half as wide, its sample rate is halved. The same corner turner can be used, but this time the switches must change position every two cells, so two frequency channels emerge from each cable of the corner turner and enter each processor. With two channels per processor, twice as many multiplies must be performed, but since the sample rate of each channel is only half of what it was, the processor has twice as long to perform them. The net result is that no additional hardware is required except for data memory. Increased resolution is therefore a matter of reprogramming the system, as long as there is sufficient processor memory and capacity in the FFT chip.
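These trade-offs amount to holding the 200 Mbit/s link rate fixed while moving between bandwidth, word length and channel count. A sketch assuming complex sampling, a 200 MHz multiplier clock with one multiply per cycle, and 512 corner turner outputs feeding 512 processors; `Config` and `budget` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Config:
    bandwidth_hz: int          # total analysed bandwidth per antenna
    bits_per_component: int    # bits in each of the real and imaginary parts
    fft_channels: int          # FFT length per antenna


def budget(cfg, n_antennas=512, n_processors=512, clock_hz=200_000_000):
    """Return (bits/s on each antenna link, multipliers per processor)."""
    link_rate = cfg.bandwidth_hz * 2 * cfg.bits_per_component
    baselines = n_antennas * (n_antennas - 1) // 2
    channels_per_processor = cfg.fft_channels // n_processors
    # Each channel is sampled once every fft_channels / bandwidth_hz seconds.
    cycles_per_interval = clock_hz * cfg.fft_channels // cfg.bandwidth_hz
    multiplies = baselines * channels_per_processor
    return link_rate, math.ceil(multiplies / cycles_per_interval)


CONFIGS = {
    "12.5 MHz, 8+8 bits, 512 ch": Config(12_500_000, 8, 512),
    "25 MHz, 4+4 bits, 512 ch": Config(25_000_000, 4, 512),
    "50 MHz, 2+2 bits, 512 ch": Config(50_000_000, 2, 512),
    "12.5 MHz, 8+8 bits, 1024 ch": Config(12_500_000, 8, 1024),
}

for name, cfg in CONFIGS.items():
    rate, mults = budget(cfg)
    print(f"{name:28s} link {rate / 1e6:.0f} Mbit/s, {mults} multipliers")
```

Every configuration keeps the link at 200 Mbit/s, and the multiplier count comes out at 16, 32, 64 and 16 respectively, matching the text; only the multiplier width and the processor memory change.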

<sup>1</sup> Chikada, Y.; Ishiguro, M.; Hirabayashi, H.; Morimoto, M.; Morita, K.-I.; Kanzawa, T.; Iwashita, H.; Nakazima, K.; Ishikawa, S.-I.; Takahashi, T.; Handa, K.; Kasuga, T.; Okumura, S.; Miyazawa, T.; Nakazuru, T.; Miura, K.; Nagasawa, S. "A 6×320-MHz 1024-channel FFT cross-spectrum analyzer for radio astronomy," Proceedings of the IEEE, vol. 75, no. 9, pp. 1203-1210, Sept. 1987.

<sup>2</sup> Escoffier, R. "The MMA Correlator," NRAO Millimeter Array Memo Series, No. 166.