There has been much discussion lately in the wireless telecommunication industry about multiple-input/multiple-output (MIMO) systems of massive scale (referred to as massive MIMO systems). The principle behind massive MIMO systems is that once the number of antennas passes a critical threshold (around 60), the system’s energy and spectral efficiency increase significantly. For details about the theoretical principles behind massive MIMO systems, see Nutaq’s six-part blog series:
1. Massive MIMO – Part 1. Introduction: From theory to implementation
2. Massive MIMO – Part 2: A few lessons learned from asymptotic information theoretic analysis
3. Massive MIMO – Part 3: Capacity, coherence time and hardware capability
4. Massive MIMO – Part 4: Massive MIMO and small cells, the next generation network
5. Massive MIMO – Part 5: The need for practical prototyping and implementation
6. Massive MIMO – Part 6: Estimation and capacity limits due to transceiver impairments
Systems containing around 100 transceivers achieve significant gains in efficiency. As energy and spectral efficiency are top priorities in the development of next-generation wireless networks, the research community is actively developing this technology. The hardware industry now needs to provide systems that enable its physical implementation.
This blog post describes some of the hardware challenges behind massive MIMO systems and explains how Nutaq is responding to them.
The development of future wireless technologies faces several challenges, one of which is providing a very efficient, low-latency, high-throughput data interface between the central processing unit and the multiple transceivers. To achieve such scale in a MIMO system, the carrier frequency must be raised to very high values in order to reduce the size of the antennas. Increasing the data throughput over the RF link also requires a wide bandwidth. The frequency coverage is defined by the RF front-end, and this challenge can be addressed by designing front-ends with the required band-pass frequency coverage. A wide bandwidth, however, implies a very high data throughput within the system, which represents a second challenge. The following section explains this issue in more detail, as well as how Nutaq addresses it.
Rapid data throughput
A challenging issue within a massive MIMO system is that the data must be accessible to every processing unit in order to compute inverse matrices and other such algorithmic functions. If the system is equipped with a central processing unit, then all the data must be routed to it. If the configuration involves distributed processing, every processor must have access to all of the data at all times, which imposes the same data throughput requirements as for a central processor.
To understand the throughput required, let’s calculate the minimal data throughput to a central processing node for a 100×100 MIMO system with a 28 MHz bandwidth, using the standard LTE sampling clock of 30.72 MHz and 12-bit samples.
The calculation is as follows: to cover 28 MHz, the Nyquist theorem requires a sampling rate of at least twice the bandwidth; based on the LTE standard clock of 30.72 MHz, a rate of 2 × 30.72 MHz = 61.44 MSPS is used. Each sample has 12 bits, so the data rate for one radio is 737.28 Mbps. Multiplying this by 100 to cover all the radios, the throughput to the central processor reaches 73.7 Gbps for a 28 MHz bandwidth alone. A 100 MHz bandwidth target would require at least 320 Gbps of throughput.
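The arithmetic above can be sketched in a few lines of Python (a minimal sketch; the function name and structure are ours, the numbers come from the text):

```python
def aggregate_throughput_bps(num_radios, sample_rate_sps, bits_per_sample):
    """Minimum aggregate data throughput to the central processor."""
    per_radio_bps = sample_rate_sps * bits_per_sample  # bits/s for one radio
    return num_radios * per_radio_bps

# 100x100 MIMO, 2 x 30.72 MHz = 61.44 MSPS LTE-derived clock, 12-bit samples
one_radio = aggregate_throughput_bps(1, 61.44e6, 12)
total = aggregate_throughput_bps(100, 61.44e6, 12)
print(one_radio / 1e6)  # 737.28 Mbps per radio
print(total / 1e9)      # 73.728 Gbps to the central processor
```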
Most telecommunication systems are based on FPGAs. With the exception of the Virtex-7, Xilinx’s FPGAs only support PCIe Gen 1. Nutaq’s RTDEx IP core supports PCIe Gen 1 x4 on the Virtex-6-based Perseus AMC, an interface with a tested sustained throughput of around 6 Gbps. If we built a system using a serial architecture like the one shown below, a bottleneck would appear.
Using Nutaq’s PicoSDR, with PCIe support and 4 transceivers, Figure 1 shows that even with only 16 radio channels covering a 20 MHz bandwidth, a bottleneck arises: the required 7.7 Gbps throughput exceeds what the PCIe interface implemented by the Virtex-6 FPGA can sustain. You can imagine what happens with 100 channels covering a 100 MHz bandwidth!
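The quoted 7.7 Gbps figure is consistent with Nyquist-rate sampling at twice the bandwidth (2 × 20 MHz = 40 MSPS per channel), which is an assumption on our part; the 6 Gbps value is the sustained PCIe throughput mentioned above.

```python
# 16 radio channels, 20 MHz bandwidth each, 12-bit samples
nyquist_rate = 2 * 20e6                  # samples/s per channel (assumed)
required = 16 * nyquist_rate * 12        # aggregate bits/s for all channels
pcie_sustained = 6e9                     # tested sustained PCIe Gen 1 x4 throughput

print(required / 1e9)              # 7.68 -> the ~7.7 Gbps quoted in the text
print(required > pcie_sustained)   # True: the PCIe link is the bottleneck
```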
Figure 1. Arrangement of PicoSDRs to cover 16×16
One possible solution is to parallelize the data routing. In other words, one could either aim for a mesh architecture between the distributed processing nodes or have multiple links routing data from subgroups of radios within the system to the processing unit in parallel. Figure 2 shows such a solution for the 16×16 system studied in the previous section.
Figure 2. Parallelizing the data links
Parallelizing the data links requires changing the interface from PCIe to another, better-adapted one. In the case of Nutaq’s hardware, Aurora is used over miniSAS disk connectors, allowing up to 7 × 16 Gbps per subgroup of 4 radio channels. The miniSAS disk connectors sit on a Rear Transition Module (RTM) installed in an MTCA.4 chassis. To raise the number of channels, Nutaq creates subgroups of radio channels resembling the one in Figure 2.
Figure 3. Many subgroups for a 96×96 system
The following table compares the throughput requirements for 28 MHz bandwidth coverage against the capacity of a system based on Figure 3.
| Link | 28 MHz req. | Proposed system |
|------|-------------|-----------------|
| Blue link necessary throughput | 2950 Mbps | 16000 Mbps |
| Red link necessary throughput | 14745 Mbps | 16000 Mbps |
| Full central Perseus input throughput | 73728 Mbps | 112000 Mbps |
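As a sanity check, the "28 MHz req." column can be derived from the 737.28 Mbps per-radio rate computed earlier. The grouping assumed here (4 channels per blue link, 20 channels aggregated on each red link, 100 channels total at the central node) is our reading of Figure 3:

```python
per_radio_mbps = 61.44 * 12   # 737.28 Mbps per radio for 28 MHz coverage

blue = 4 * per_radio_mbps      # one subgroup of 4 radio channels
red = 20 * per_radio_mbps      # five subgroups aggregated per red link (assumed)
central = 100 * per_radio_mbps # all 100 channels at the central Perseus

# Close to the table's 2950, 14745 and 73728 Mbps entries
print(blue, red, central)
```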