
The challenges of W-CDMA and TD-SCDMA basestation design

Author: Doug Pulley | Date: 2005-09-16 13:06 | Source: original article

Doug Pulley, co-founder and Chief Technology Officer at picoChip

3G technologies are critical for the evolution of the mobile industry and are rapidly changing the face of wireless. However, they present the mobile industry with significant technical, financial and business challenges. Only by reducing the cost per Erlang while simultaneously and dramatically increasing the efficiency of data delivery will they continue this transformation.
There are already tens of millions of 3G users, with Asia leading the way in high-speed data usage, and China clearly recognised as the largest and most important wireless market in the world. It is there that the use of multiple standards is encouraging new IP (intellectual property) and research is driving innovation. However, carriers must manage cost-effective basestation development and future upgrades to cope with these ever-increasing demands.
The business models that proved so successful in 2G are increasingly under threat, forcing operators to look to new solutions for future success. W-CDMA and other 3G technologies, such as TD-SCDMA, are extremely complex, presenting significant problems in design, development and deployment. They demand a design that does not just pass conformance but will work under a variety of real-world conditions.
The outlook for the traditional approach to design, development and deployment is not good. Costs are rising, time to market is lengthening and the prospects of early revenues and a rapid return on investment (RoI) are slim. The technical issues are exacerbated by the fact that they come at a time when money for capital and operational expenditure is extremely tight. Figure 1 shows the architecture of a typical basestation and its implementation method.
If 3G is to succeed, this mould must be broken: the design and development cycle must be dramatically shortened and the cost of the process reduced significantly. Enter the new philosophy of the "software-defined basestation" - new thinking and a new approach in which new features or improved performance can be delivered simply through code downloads. Devices now exist with a new processor architecture that makes this possible by delivering an order-of-magnitude improvement in price-performance.
This article describes many of the techniques used in 3G basestations, and discusses the trade-offs and implementations involved. It then discusses in more detail the architectural issues for W-CDMA and TD-SCDMA, and concludes with thoughts on the "multi-lingual" or "software-defined" basestation, looking at how a new basestation philosophy could radically improve the economics for carriers, and for the aggressive, nimble original equipment manufacturers (OEMs) that take advantage of such a shift.
A large number of variables affect the amount of processing power needed to accommodate the expected number of simultaneously active channels in a W-CDMA or a TD-SCDMA basestation. They range from the type of basestation itself - whether a picocell, microcell or a long-range macrocell - through to the implementation choices made by the engineering team over the design's ability to deal with adverse conditions and traffic patterns. Such considerations greatly affect the cost of carrying each channel.
Estimates of processing power needed have changed substantially as experience with W-CDMA design and deployment has shown where compute resources need to be deployed. Continued R&D work will alter this picture again, particularly for the younger and more advanced TD-SCDMA standard, and again as implementers learn the trade-offs that provide the most cost-effective solution in the real-world environment.
A discussion of key parts of the basestation will demonstrate the challenges that face designers. In a Release 99 W-CDMA or a Release 4 TD-SCDMA basestation, the receive path represents the most complex part of the design, as the subsystems within it must handle a complex set of interactions between them to decode incoming signals effectively. However, later releases of the 3GPP standards bring further complexity: Release 5 provides significantly higher download speeds and increases the complexity of the transmit path, while the increased uplink datarate included in the follow-on Release 6 will put further strain on the receive path. This indicates how large a role flexibility will play in allowing equipment vendors to track the evolving standards.
Design flexibility is important even with a fixed standard, as implementation choices can greatly affect network performance. In the receive path of every basestation, one of the most important parts is the rake receiver, and there are many choices over its implementation. The rake receiver algorithm is comparatively simple in structure, but it is complex to implement and can be computationally intensive in a multi-antenna system.
The rake receiver is used to improve reception performance. In W-CDMA or TD-SCDMA, the spreading code given to each mobile terminal's transmitted signal makes it possible to combine the information contained in the echoes, which have a distinct code signature, to augment the overall signal for each active channel, improving performance.
How many echoes the basestation uses to boost the signal is an implementation decision, based on the number of fingers employed by the rake receiver algorithm. A rake receiver for a basestation can contain just three rake fingers or as many as eight.
In practice, 90% of the signal energy from echoes can be picked up using just three or four fingers. Much will depend on the target environment. A picocell, where mobile terminals will be so close that there is little or no resolvable multipath, may not need more than two rake fingers. A basestation destined for a larger cell will see a number of resolvable echoes that can be combined usefully into a better aggregate signal, so the use of more rake fingers will be beneficial.
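To make the arithmetic concrete, here is a minimal numpy sketch of despreading and maximal-ratio combining across fingers; the spreading factor of 16, the two-path channel and the function names are toy values chosen for illustration, not taken from any real design.

import numpy as np

def rake_combine(rx, code, delays, channel_gains):
    # Despread the received samples at each finger's delay, then
    # maximal-ratio combine by weighting each finger with the
    # conjugate of its channel estimate.
    sf = len(code)
    symbol = 0j
    for d, h in zip(delays, channel_gains):
        finger = np.dot(rx[d:d + sf], code) / sf   # despread one echo
        symbol += np.conj(h) * finger              # MRC weighting
    return symbol

# Toy two-path channel carrying a single BPSK symbol (+1).
sf = 16
code = np.random.choice([-1.0, 1.0], sf)
delays, gains = [0, 3], [1.0 + 0j, 0.5 * np.exp(0.7j)]
rx = np.zeros(sf + 8, dtype=complex)
for d, h in zip(delays, gains):
    rx[d:d + sf] += h * code
print(rake_combine(rx, code, delays, gains).real)  # > 0: detects +1

Each extra finger adds one despread-and-weight pass per symbol per antenna, which is where the computational cost of a many-finger, multi-antenna design comes from.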

Figure 1: The architecture of a typical basestation and its implementation method

Figure 2: The advantages of the picoArray compared with a traditional DSP

Figure 3: A real HSDPA implementation in picoArray devices

However, other factors in basestation receive-path design can negate the benefit of using a large number of rake fingers. The tracker, which is controlled by the rake finger manager, plays a pivotal role in determining the efficiency of the rake receiver. The tracker is designed to watch how existing signal paths change over time. It works out the delay of each echo that is to be processed by each rake finger. Some ASIC-based designs use brute-force algorithms that attempt to correlate signals within a range of delays. Others employ a combination of heuristic algorithms and correlation to find the paths.
A related function is the searcher, which identifies paths that have vanished and new paths that have appeared. The searcher and the tracker are managed by the rake finger manager but operate independently. However, it is possible to design a basestation that dispenses with the tracker altogether in favor of a more sophisticated searcher. Implementation details and results will govern those choices.
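As a rough illustration of the brute-force style, the sketch below correlates the incoming samples against the spreading code at every candidate delay and reports those whose energy clears a threshold; the interface and the threshold handling are assumptions made for the example.

import numpy as np

def brute_force_search(rx, code, max_delay, threshold):
    # rx must hold at least max_delay + len(code) samples. Correlate
    # against the spreading code at every candidate delay; delays whose
    # correlation energy clears the threshold are reported to the rake
    # finger manager as live paths.
    sf = len(code)
    energy = np.array([abs(np.dot(rx[d:d + sf], code)) ** 2
                       for d in range(max_delay)])
    return list(np.flatnonzero(energy > threshold))

The heuristic alternatives mentioned above trade this exhaustive scan for cheaper tracking of delays near previously known paths.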
Inner-loop power control presents a potentially difficult problem for the basestation design team to resolve. The faster the power control subsystem can respond, the better the quality of the received signal in each channel. However, faster processing implies greater processing power, so there is a clear trade-off. But how that trade-off affects real-world performance may only be seen in field trials.
Power control is desirable because it helps to reduce the interference between mobile terminals operating in the cell. A mobile terminal that is close to the basestation will, without some form of power control, provide a much stronger signal than a user situated further away, possibly producing an unacceptable level of interference. However, other considerations come into play, such as data rate, service type and quality of service parameters, which complicate the power-control algorithm.
The power control scheme is based on a closed loop that operates on every slot, giving the update loop a potential maximum frequency of 1.5kHz. At first sight, this seems a low update frequency that does not demand high-speed processing. However, there is a serious latency constraint: the maximum delay from receiving the information at the basestation to providing power control signals back to each of the mobiles is only around 600µs. If this deadline is missed, then information in subsequent slots may be lost. Potentially, mobiles may move too quickly to be adequately monitored by the slot-based power-control system if the response from the basestation is not fast enough. Although the loop could be implemented on a DSP, the latency constraints mean that dedicated support may be essential.
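The per-slot decision itself is trivial, as this hypothetical sketch shows; the difficulty lies in guaranteeing that the surrounding measurement and signalling complete within the latency budget on every single slot.

def tpc_command(measured_sir_db, target_sir_db, step_db=1.0):
    # One inner-loop iteration, executed every slot: command the mobile
    # to step its power up if the signal arrived below the SIR target,
    # down otherwise. The measure-compare-signal path wrapped around
    # this comparison must finish within the ~600us budget noted above.
    return +step_db if measured_sir_db < target_sir_db else -step_db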
The large number of feedback loops and interfaces in 3G systems implies the widespread use of buffers to even out the differences in the time it takes to process each algorithm. This buffering can grow to be quite large in parts of the system where the scheduling latency is comparatively high. For example, the buffer in the interface between an ASIC and a DSP may need to be sized for worst-case scheduling delays of several milliseconds and can easily grow to megabytes in size. If scheduling latency can be guaranteed to be lower, the amount of memory in the buffer can be reduced.
If the buffer needs to be large, it will demand off-chip memory, which brings with it additional bandwidth and latency penalties when the DSP tries to access the data. The overhead may increase the time it takes to fetch each piece of data or demand additional hardware support in the form of direct memory access (DMA) controllers to bring data on-chip.
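The relationship between guaranteed latency and buffer size is simple arithmetic, as this illustrative calculation shows; the rates, sample widths and delays are assumptions chosen only to show the scaling.

# Sizing a DSP<->ASIC buffer carrying chip-rate samples. Illustrative
# figures: 3.84 Mchip/s, 2 antennas, 2x oversampling, 4 bytes per
# complex sample.
bytes_per_sec = 3.84e6 * 2 * 2 * 4            # about 61 MB/s
for delay_s in (0.5e-3, 5e-3, 20e-3):         # guaranteed worst case
    buf = bytes_per_sec * delay_s
    print(f"{delay_s * 1e3:4.1f} ms slack -> {buf / 1e6:5.2f} MB")
# Tightening the guarantee from 20 ms to 0.5 ms shrinks this one buffer
# from about 1.2 MB to about 31 KB - and the saving repeats for every
# such interface in the design.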
The later development of the TD-SCDMA standard has introduced a number of additional technological improvements that complicate basestation design but which overcome effects such as intra-cell interference – a problem encountered by all CDMA technologies – and inter-cell interference. The techniques include the use of smart antennas, joint detection and dynamic channel allocation.
Smart antennas are beam-steering antennas that track mobile usage through the cell and distribute the power to cell areas that have active mobile subscribers. Smart antennas reduce multiuser interference and minimize intra-cell interference. However, the technique demands high compute power, as there are typically eight antenna channels that need to be supported in the receive path, together with support for high-precision fixed-point mathematics functions.
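A minimal sketch of the underlying receive-side arithmetic, assuming a uniform linear array with half-wavelength spacing; the eight-element default mirrors the typical array size mentioned above, and the helper names are invented for the example.

import numpy as np

def ula_steering(theta, n=8):
    # Steering vector for an n-element uniform linear array with
    # half-wavelength spacing, pointed at angle theta (radians).
    return np.exp(1j * np.pi * np.arange(n) * np.sin(theta))

def beamform(rx_antennas, theta):
    # rx_antennas: (num_antennas, num_samples) complex baseband.
    # Coherent summation towards the wanted user boosts that user
    # while attenuating energy arriving from other directions.
    w = ula_steering(theta, rx_antennas.shape[0])
    return (w.conj() @ rx_antennas) / len(w)

Running this weight-and-sum for every user on every sample across eight antenna channels is what drives the compute demand noted above.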
Joint detection allows the receiver to estimate the radio channel for all signals simultaneously. By processing individual traffic streams in parallel, joint detection can significantly reduce multiple-access interference and minimize intra-cell interference. Dynamic channel allocation builds on these techniques to allow the basestation to allocate radio resources based on the interference scenario, minimizing intercell interference.
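Joint detection is commonly formulated as a block linear equaliser: stack each user's channel-convolved spreading code as a column of a system matrix A and solve for all symbols at once. The MMSE variant below is a textbook sketch under that formulation, not a deployed implementation.

import numpy as np

def joint_detect(rx, A, noise_var):
    # rx: received chips for one data block. A: system matrix whose
    # columns are each user's spreading code convolved with that
    # user's channel estimate. Solving for all users jointly is what
    # suppresses the multiple-access interference.
    AhA = A.conj().T @ A
    rhs = A.conj().T @ rx
    return np.linalg.solve(AhA + noise_var * np.eye(A.shape[1]), rhs)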
For network operators, TD-SCDMA can bring capital expenditure savings. For example, one of the benefits of the technology is its reduction of the cell-breathing effect, caused by intra-cell interference. With conventional CDMA technologies, as the number of active users grows, the effective area of the cell shrinks, demanding that more basestations be deployed to cover peak usage. By reducing the effects of intra-cell interference, TD-SCDMA avoids much of the impact of the cell-breathing effect encountered in technologies such as W-CDMA.
As operators have the freedom to deploy W-CDMA and TD-SCDMA basestations within the same network, taking advantage of their different characteristics in each location, equipment builders are likely to have to support both types of design. This will demand great flexibility in the hardware architecture, which will be difficult to achieve using conventional architectures based on a mixture of DSPs and FPGAs.
In a Release 99 W-CDMA basestation design or a Release 4 TD-SCDMA implementation, the transmit path is somewhat more straightforward than the receive path. Most of the processing needed is in the form of coding and shaping algorithms without the need for the sophisticated detection and decoding algorithms needed in the receive path. However, the evolution of the 3G standards will bring greater degrees of complexity to basestation design.
As with the receive path, the various types of traffic that the basestation's transmit-path symbol-rate section has to deal with will demand different buffering, framing and types of error-correction code. Although the processing needed is more straightforward than that of the receive path, good budgeting is needed to ensure that the system can deal with a wide range of traffic types, not just those that flatter the strengths of a particular DSP instruction set.
The situation becomes more complex with the introduction of High Speed Downlink Packet Access (HSDPA). This protocol will support the evolution of 3G into a medium that will allow the cost-effective deployment of high-speed Internet access services, which will have a more asymmetric profile than voice and data communications.
Under HSDPA, there are a number of key changes needed to optimise the radio interface and packet structure for the higher-speed IP traffic. The high user data rates are achieved by applying higher-level modulation schemes, based on Quadrature Amplitude Modulation (QAM), and by including an adaptive coding scheme based on Turbo Codes. Further, the Node B basestation becomes responsible for scheduling decisions that would previously have been implemented in the Radio Network Controller (RNC).
The Release 99 system and its equivalent in the TD-SCDMA system normally carry user data over dedicated transport channels, designed to carry continuous user data. HSDPA introduces a new transport channel type, the High-Speed Downlink Shared Channel (HS-DSCH). This shared channel can be used more efficiently where a number of users receive bursts of data, typical of Internet access. The coding rate of this channel can vary, on a per-user basis. Under poor reception conditions, the modulation can vary as well, possibly reverting to QPSK from the higher-order modulation of 16QAM. Link adaptation ensures the highest possible data rate is achieved both for users with good signal quality, who are typically close to the base station, and for more distant users at the cell edge, who may receive data with a lower coding rate.
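In effect, per-user link adaptation reduces to a lookup from the reported channel quality to a modulation and coding pair. The thresholds below are invented for illustration; real Node B tables are vendor-tuned and defined per terminal category.

def select_mcs(cqi):
    # Map a channel-quality report to (modulation, coding rate).
    if cqi >= 22:
        return ("16QAM", 0.75)   # strong signal near the basestation
    if cqi >= 15:
        return ("16QAM", 0.50)
    if cqi >= 7:
        return ("QPSK", 0.50)    # degraded: revert to QPSK
    return ("QPSK", 0.25)        # cell edge: lowest coding rate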
A more important and fundamental change is the movement of the scheduling function from the RNC into the Node B basestation itself. This is to allow the use of fast-scheduling algorithms where users are served data under constructive fading conditions, based on the channel-quality estimates, rather than risking high error rates that would be experienced by users in poor reception conditions using a conventional user-priority or round-robin scheme. This scheduling works hand-in-hand with the algorithms used to select modulation and coding schemes. The use of a short frame length also helps improve the responsiveness of the fast-scheduling approach.
This greatly increases the responsiveness of the Node B. The move to 16QAM modulation increases the peak speed, in the same way that a high-powered engine can boost the performance of a car; but it is the MAC change that makes HSDPA deliver a real-world speed increase, much like replacing a learner driver with a Formula One racing driver. It demonstrates how a shift in the 3G architecture from a traditional "dumb pipe with intelligent centre" towards a more datacom-like "smart edge" can yield better results.
There are many high-speed feedback loops needed to implement HSDPA efficiently and provide users with the best data rates possible. For example, the latency needed for modulation and coding selection for individual frames in the shared channel is just 2ms compared with a typical time of 10ms (and up to 80ms) for the interval used for power control in the Release 99 shared-channel specification. Further, the algorithms needed to make good use of the possibilities provided by fast scheduling will be more complex than those implemented by existing RNC software, but those decisions have to be made within a millisecond.
When link errors occur, data packets can be retransmitted quickly at the request of the mobile terminal. In existing WCDMA networks, these requests are processed by the RNC. As with fast scheduling, better responsiveness is provided by HSDPA by processing the request in the Node B. The Hybrid Automatic Repeat Request (HARQ) protocol developed for HSDPA allows efficient retransmission of dropped or corrupted packets. In addition to fast retransmissions, a number of techniques are used to provide the mobile terminal with a better chance of receiving the data correctly. For users with a high coding rate, simple chase combining may be used, which simply repeats the packet. For users with a low coding rate, incremental redundancy can be used. In this scheme, parity bits are sent to allow the mobile terminal to combine the information from the first transmission with subsequent retransmissions.
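The two combining schemes differ in what the receiver accumulates. Here is a minimal sketch of chase combining, assuming soft values are stored per coded bit:

import numpy as np

def chase_combine(soft_buffers):
    # Every retransmission repeats the same coded bits, so the receiver
    # simply accumulates the soft values before decoding; each doubling
    # of transmissions buys roughly 3 dB of effective SNR.
    return np.sum(soft_buffers, axis=0)

Incremental redundancy would instead concatenate the newly received parity bits, lowering the effective code rate with each retransmission rather than repeating the same bits.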
The consequence of these design decisions is that the scheduler and retransmission manager require large buffers to hold all the packets that might need to be resent. This function was not present in earlier releases, and the hardware to support it must have been designed in from the outset if existing implementations are to support HSDPA at sufficiently high datarates.
A number of factors will control how well scheduling works in the field. It is relatively simple to devise a scheduling algorithm that works well in the laboratory with artificially generated constructive fading conditions, but many circumstances will affect real-world systems. Not least of those are the evolving capabilities of the terminals themselves, whether they are handsets or data cards inserted into PCs. The latency demands of HSDPA mean that designs will react differently to changing fading conditions and packet delivery speeds. Similar problems were seen in the early days of the Internet, where interactions between the different layers of the protocol stack led to less efficient bandwidth utilisation than expected; numerous techniques were developed and inserted into terminal equipment and infrastructure to bring performance back up to its expected level.
If a scheduler is not designed to react to problems, operators may see some users with terminals that are able to handle high-speed transfers starved of bandwidth while other users with less capable systems use up too much of the HS-DSCH bandwidth. Such a situation will see much lower data utilisation than expected, reducing the ARPU that can be derived from that basestation. A more intelligent scheduler that watches for changes to channel and terminal conditions – and schedules packets for terminals that are able to receive at higher datarates – will improve the overall revenue that can be derived.
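One widely used policy with exactly this character is proportional-fair scheduling; the sketch below is illustrative (the data layout is an assumption made for the example), not a description of any vendor's scheduler.

def schedule(users):
    # One TTI of a proportional-fair-style policy: serve the user whose
    # instantaneous achievable rate (from the latest channel-quality
    # estimate) is highest relative to the average rate recently served.
    # Good channel instants get exploited without permanently starving
    # cell-edge terminals.
    return max(users, key=lambda u: u['inst_rate'] / max(u['avg_rate'], 1e-9))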
As well as allowing for evolution in scheduler design, in many cases, it will be desirable to have different scheduling policies in action at different times of the day or tuned for certain types of location, such as an airport waiting lounge. To test this requires multiple scenarios to be evaluated under different loading conditions. As a result, architectures that maximise flexibility will be key to efficient HSDPA implementation.
Processing granularity will be a major consideration for the efficient implementation of a HSDPA-compliant basestation. Systems based on a small number of high-performance DSPs tend to demand large buffers and, to reduce the overhead of switching between tasks, will tend to work on large groups of data at any one time. However, such a coarse-grained approach to task scheduling is a poor fit for algorithms such as scheduling that need low latency to work effectively.
Most basestations today use a combination of DSP software and FPGA hardware in the baseband. DSP software might be the preferred option (as was the case in earlier generations), but with the processing budget being some 100 times that of the previous generation, peripheral hardware accelerators for the DSP host are essential to make up for the lack of processing power in the DSP.
Although the trend was initially towards ASICs to provide dedicated processing power, the rapid evolution of 3G standards coupled with the broad scope of implementation options is driving hardware development back towards flexibility. This is largely due to spiralling development costs, the need for differentiating features and risk of specification change.
To develop a complex ASIC for a base station in contemporary silicon technology requires a large team of engineers, and $100m plus development cost and a 36-month design gestation are not untypical. If there is a design problem, the re-spin of the device will add cost and verification time, increasing time-to-market and reducing the expected return on investment (ROI). The long development time of a custom device does not matter in some markets, but it clearly is a concern in a fast-paced environment such as W-CDMA or TD-SCDMA. Indeed, it is not unrealistic to say that the development cycle of an ASIC may be longer than the release life of the product.
FPGAs, on the other hand, have used the increasing performance of silicon technology to reach million-gate densities and gigabit-per-second interface speeds, so high-end devices can now be used for chip-rate processing. FPGAs are attractive in that they can be re-programmed at the engineer's workbench within a design flow similar to that of an ASIC, using hardware-description languages such as Verilog and synthesis tools. Although FPGAs have improved in density more rapidly than ASICs, this is largely due to the increasing use of dedicated blocks (multipliers, processors and so on). FPGAs have traditionally been extremely versatile, which implies a trade-off with optimisation for a particular function; the gate-level granularity they offer is ill-suited to implementing complex tasks efficiently and quickly.
General-purpose DSP software delivers design flexibility, but has traditionally not been able to deliver the sheer processing power required in these systems, with processing capability only rising with Moore’s Law, not with system requirements. Historically, this came primarily through increased clock rate, and consequently increased power. This is now being supplemented in high-end devices with exploitation of instruction level parallelism. This is implemented as very long instruction word (VLIW), such as the TI C6xxx or the ADI TigerSHARC. In order to achieve such performance, complex pipelines and multiple instruction execution units are employed – requiring a sophisticated compiler to exploit them. Although contemporary DSP compilers are more efficient than previous generations, and claim to allow a significant proportion of the programming task to be written in C, there is a performance impact. Some critical tasks will still be written in assembler and producing code on such a complex device is hard. Even at 1GHz-plus clock rates the performance is insufficient to replace the FPGA or ASIC, and they burn considerable power.
In a 3G basestation, it is important to ensure that the granularity of the compute elements is well aligned to the tasks within a communications system, striking a balance between the very fine granularity of a universal FPGA and the 'big chunks' of a powerful DSP. In general, there are two distinct classes of operations. The first is dataflow - where operations are regular and predictable (whether stream or block) and may be fast (such as chip-rate processing in the receive or transmit paths). This will typically require many elements 'clumped' together, and it is important that interconnect arrangements are both fast and deterministic. There is a large degree of parallelism both within algorithms and across multiple instances, which suits the FPGA architecture more.
Then there are control tasks, which are 'diffused' across the entire system and must interact with many individual blocks. Typically these tasks are individually quite simple, but can be aggregated together. This code will be serial, and will need to support many different options or switches for specific cases or modes. It will often suit a DSP-oriented approach.
An underappreciated problem is that the interaction between the DSP and FPGA in a 3G basestation is complex as processing resources are distributed across multiple channels. Arbitration is particularly difficult as it is important that the system does not stall or lose data when contention for processing resource arises. As a consequence, verifying that data integrity is maintained under every interrupt or contention scenario is extremely difficult, and typically consumes many months of exhaustive testing.
Baseband processing algorithms do not map well onto the typical heterogeneous combination of DSP and FPGA. At the root of the problem lies an imbalance between the data processing and control schemes that each class of device supports, and restrictive communication between each device. Marrying the orthogonal requirements of device control, communication and data processing within the cluster is complex, and requires significant software overhead to manage the scheduling of processing resource, and arbitrate effectively when contention for resource arises.
The challenge for the system developer is to guarantee that when contention for resource arises as a result of, say, a new subscriber entering the cell and demanding bandwidth for a high speed data call, information is not lost or corrupted whilst accommodating the additional demand. Given that the availability of resources at any given time cannot be guaranteed, the system verification process requires exhaustive testing of every loading scenario.
Exhaustive testing is prohibitively complex and time-consuming; however, more practical performance testing cannot guarantee that the specified performance will be achieved in every scenario. So, while the use of accelerators and DSP coprocessors improves the performance of the individual devices, the inherent problem of guaranteeing that processing resource will be available for every conceivable loading scenario remains. With all the different design environments, it is very difficult to verify that the system works until it is actually built - and that is a multi-million dollar bet. The design objective, therefore, is to check, as far as possible, the proper functioning of hardware and software before the system is built. This reduces the time spent in testing and trials, and increases the quality of the end system.
A better approach is to reflect the inherently parallel and flexible nature of the task. The picoArray architecture takes the straightforward approach of integrating these two approaches into a single, unified toolset. Each element can be programmed in either C or assembly code, while a hardware-description language is used to describe the inherently parallel, inter-processor interconnect and timing. This approach allows the algorithms to be efficiently partitioned and mapped onto specific processing elements at a relatively high level. It also allows the use of new or existing C code to add functions, optimising code re-use and exploiting existing programming skills for rapid prototyping.
The picoArray approach uses the same development process for the data path and control functions within the system. Thus, the homogeneous picoArray development environment avoids significant product integration risks; the danger is that products developed using the older, heterogeneous design methods require significant re-design during the integration of the FPGA and DSP product elements. This concern intensifies when considering design across generations. While migrating one version of a product is a manageable effort, the need to re-architect, re-integrate and co-develop across the different architectures and devices does not scale, increasing the effort and risk of moving to new generations. In contrast, when using a unified design environment like the picoArray, code and architectures port from one generation to the next without significant effort or repeated co-development. Figure 2 shows the advantage of the picoArray compared with a traditional DSP.

Secondly, the granularity/scalability of the architecture means that tasks are decomposed to manageable “chunks” which are statically mapped to discrete elements. Not only are these elements small enough to test and validate, but because they are static and only interact in controlled ways, that validation is trustworthy.
Fine-grained control will be necessary to implement features such as fast scheduling and per-user coding and modulation adaptation in the HSDPA upgrade to 3G. With a large number of processing elements, it becomes possible to dedicate processing and buffer resources almost on a per-user or per-function basis. For example, one processor may collate information for a processor that runs nothing but an advanced scheduling algorithm, allowing scheduling decisions to be made continuously. This yields much lower latencies than a system where scheduling is shared with other tasks on a general-purpose processor or DSP. Figure 3 shows a real HSDPA implementation in picoArray devices.
The deterministic architecture of the picoArray eliminates scheduling and arbitration in the underlying architecture, so system loading is entirely fixed and does not have to rely on statistical multiplexing of the processing resources. This is true within processors (no interrupts or complex pipelines with interlocks or bubbles) and between them (a deterministic interconnect). The factors outlined above combine to simplify verification and testing before the system is built; performance is deterministic and fixed at compile time, unlike a conventionally complex DSP whose performance is determined at run-time. Consequently, a designer can accurately predict the final performance from cycle-accurate, deterministic (rather than statistical) simulations.

A flexible, software-based design will also be vital for future improvements to the 3G service offering. HSDPA is an unbalanced system, with a maximum of 14Mbit/s on the downlink and 2Mbit/s on the uplink, from the terminal to the network. That can be a concern, as TCP can easily be "uplink choked" if acknowledgments are slow, reducing the downlink rate. Release 6 of the 3GPP specification will change that by introducing High-Speed Uplink Packet Access (HSUPA). This allows users to take advantage of faster uplinks with lower latency when sending large files or emails. That in turn improves the efficiency of the link, increasing effective throughput, even though the modulation has not changed.
HSUPA puts even more strenuous demands on the Node B design and will mean that the processing electronics will have to deal with a much more complex decode environment in the same way that HSDPA demands much more of the terminals in terms of decoding. HSUPA means moving further control functions from the RNC to the Node B.
Higher datarate services allow operators to achieve significantly higher ARPU than will be possible using a WCDMA network based on Release 99-compliant equipment. But the changes that come in Releases 5 and 6 call for attention to implementation issues that will break many existing designs. Operators who choose an architecture that allows them to tune Node B designs for different areas and maximise datarates will achieve faster payback and see the benefits of higher ARPU that will come with the increasing use of high-speed, wireless data access. The presence of a unified, high-performance software-based fabric will make it easier to support protocol additions such as these in both the W-CDMA and TD-SCDMA environments.
Furthermore, the availability of multilingual Node Bs would offer operational and financial advantages. The technology enables improvements in performance and new services, all of which combine to radically improve RoI for carriers and aggressive manufacturers.
This new software-defined basestation technology is attractive because it delivers significantly lower cost of ownership through:
•reduced development costs and accelerated time to market
•reduced manufacturing costs through a lower bill of materials
•reduced field operating costs through the elimination of costly upgrades and obsolescence.
