
Integrate high-quality audio into mobile design (Part 1)

Author:  Date: 2009-02-19 16:40  Source: 52RD手机研发

Multimedia functionality is becoming more important in handheld products, and consumers are demanding higher fidelity audio in their wireless devices, PCs, games, software applications, and music synthesizers.

In order to meet this demand and enable high-quality music playback on a wide range of consumer devices, product designers must trade-off performance, memory, and power consumption to optimize for user preferences and expectations. As a result, there are some key considerations when integrating audio synthesis into an embedded platform, including the functional components of the systems, related standards, design options, memory issues, and performance trade-offs.

Differentiating to Compete
As consumers continually demand more features and functionality from their mobile handsets, mobile operators continue to look for new ways to differentiate product offerings and expand revenue streams. In turn, these operators drive handset/terminal manufacturers to provide them with products that offer consumers more capabilities, features, and options. At the same time, operators look to source handsets from manufacturers that can give their network some differentiation over their wireless competition.

MIDI audio synthesis is one such feature: it has provided an opportunity both to differentiate products and to increase revenue by way of polyphonic ringtone capabilities. It is therefore becoming more important for embedded designers to understand the major issues and design techniques involved in integrating a high-quality audio synthesis solution into a mobile platform.

MIDI — The Format Standard
There are dozens of different file formats that have been developed over the years for storing data for audio synthesis. The most widely accepted standards-based format is the Standard MIDI File (SMF), a standard jointly overseen by the MIDI Manufacturers Association (MMA) and the Association of Musical Electronics Industry (AMEI).

The MIDI standard originated as a protocol to transmit a musical performance over a 31.25Kbits/s serial cable. With limited bandwidth available, the protocol comprises primarily control data such as "Note-On," signifying the moment when a performer presses a note on a musical keyboard, and "Note-Off," signifying when the note is released.
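
For concreteness, these channel messages are only three bytes on the wire: a status byte identifying the message type and channel, followed by a note number and a velocity. The sketch below decodes the two messages mentioned above; the struct and function names are illustrative, not part of any particular MIDI stack.

#include <stdint.h>
#include <stdio.h>

/* One three-byte MIDI channel voice message. */
typedef struct {
    uint8_t status;   /* 0x90|ch = Note-On, 0x80|ch = Note-Off */
    uint8_t data1;    /* note number, 0-127 (60 = middle C)    */
    uint8_t data2;    /* velocity, 0-127                       */
} midi_event;

static void describe(const midi_event *e)
{
    unsigned type    = e->status & 0xF0;
    unsigned channel = e->status & 0x0F;

    /* A Note-On with velocity 0 is conventionally treated as a Note-Off. */
    if (type == 0x90 && e->data2 > 0)
        printf("ch %u: Note-On  note %u vel %u\n", channel, e->data1, e->data2);
    else if (type == 0x80 || (type == 0x90 && e->data2 == 0))
        printf("ch %u: Note-Off note %u\n", channel, e->data1);
}

int main(void)
{
    midi_event on  = { 0x90, 60, 100 };  /* press middle C on channel 0 */
    midi_event off = { 0x80, 60, 0   };  /* release it                  */
    describe(&on);
    describe(&off);
    return 0;
}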

The SMF format was created in the 1980s as a means of capturing the MIDI data stream into a file for editing on a computer. Events are stored in stream order, with delta timestamps marking the amount of time elapsed since the previous event. The file format is very compact, with a typical file size between 10 and 100KB; a similar piece stored in a perceptual audio coder format such as MP3 would be around 4MB. While MIDI behaves similarly to a codec at the receiving end, an encoder that could convert an arbitrary audio stream into MIDI remains impractical today.
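
Within an SMF, those delta timestamps are stored as variable-length quantities: seven bits per byte, with the high bit set on every byte except the last. A minimal decoder sketch follows; the helper name is illustrative, but the encoding is the one defined by the SMF specification.

#include <stdint.h>
#include <stddef.h>

/* Decode one SMF variable-length quantity (at most 4 bytes / 28 bits).
 * Each byte contributes 7 bits; the high bit is set on all but the last.
 * Returns the number of bytes consumed, or 0 if the field is malformed. */
static size_t read_var_len(const uint8_t *buf, size_t len, uint32_t *value)
{
    uint32_t v = 0;
    for (size_t i = 0; i < len && i < 4; i++) {
        v = (v << 7) | (buf[i] & 0x7F);
        if ((buf[i] & 0x80) == 0) {   /* last byte of the quantity */
            *value = v;
            return i + 1;
        }
    }
    return 0;
}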

Synthesizing Audio
MIDI is able to achieve these levels of "compression" because the audio data itself is not stored in the file, only the actions of the performer, much like a player piano roll with its coded instructions for which notes to play, when, how hard, and for how long. The audio synthesis engine then translates this data into audio-producing actions using a synthesis algorithm. Two forms of synthesis are common in mobile platforms today: frequency modulation (FM) and sample-based synthesis, more simply known as a sampling synthesizer.

FM synthesis uses a purely algorithmic technique of modulating a carrier signal with a modulator. The resulting output is a rich spectrum of sound created by the sums and differences of the two frequencies. By varying the amount of modulation applied to the carrier over time, the spectrum can be manipulated to imitate real instruments, or create new synthetic sounds. This is the synthesis technique popularized by Yamaha, and which is incorporated into their MA series of MIDI synthesis ICs.
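
A minimal two-operator FM sketch is shown below, assuming a single carrier/modulator pair with fixed frequencies and a fixed modulation index; a real patch sweeps the index over time with envelopes, which is what shapes the spectrum into an instrument-like sound. The frequencies and index are illustrative values, not a real patch.

#include <math.h>
#include <stdio.h>

#define SAMPLE_RATE 44100.0
#define TWO_PI      6.283185307179586

/* Two-operator FM: a modulator at f_mod varies the phase of a carrier at
 * f_carrier. The modulation index controls how much energy appears in the
 * sidebands around the carrier. */
static float fm_sample(double t, double f_carrier, double f_mod, double index)
{
    double mod = index * sin(TWO_PI * f_mod * t);
    return (float)sin(TWO_PI * f_carrier * t + mod);
}

int main(void)
{
    for (int n = 0; n < 8; n++) {
        double t = n / SAMPLE_RATE;
        printf("%f\n", fm_sample(t, 440.0, 220.0, 3.0));
    }
    return 0;
}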

In contrast, a sampling synthesizer utilizes recordings of actual instruments as well as synthetic sounds. By varying the playback speed through interpolation, a single recording can be used to synthesize a range of frequencies. The sound is often further manipulated using filters to dynamically vary the output spectrum.
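
The core operation can be sketched as below, assuming a looping 16-bit wavetable and a 16.16 fixed-point phase accumulator; the fractional part of the phase drives the linear interpolation between adjacent stored samples, and the size of the increment sets the playback pitch. Fixed-point phase is typical on embedded cores without a floating-point unit.

#include <stdint.h>
#include <stddef.h>

/* Pitch-shift a stored sample by stepping through it with a fractional
 * phase increment. An increment of 1.0 (0x10000) plays at the recorded
 * pitch; 2.0 (0x20000) plays an octave higher. */
static int16_t sample_step(const int16_t *table, size_t len,
                           uint32_t *phase, uint32_t increment)
{
    uint32_t idx  = *phase >> 16;        /* integer part of the phase     */
    uint32_t frac = *phase & 0xFFFF;     /* fractional part, 0..65535     */
    int32_t  s0   = table[idx % len];
    int32_t  s1   = table[(idx + 1) % len];
    int32_t  out  = s0 + (int32_t)(((int64_t)(s1 - s0) * frac) >> 16);

    *phase += increment;                 /* advance for the next sample   */
    return (int16_t)out;
}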

Generally, sampling synthesizers produce more realistic sounding instruments but the realism comes at the cost of additional read-only memory (ROM) for the sample library. FM synthesizers require less memory to store the algorithm parameters for their sounds, but signal processing requirements tend to be much higher.

Audio Synthesis Components
The process of reading a MIDI file and synthesizing an audio output stream from it can be broken into three distinct components: the file parser, the MIDI interpreter, and the synthesis engine.

The file parser reads MIDI data from a file or input stream and reconstructs the timeline from the delta timestamps stored in the file. Timestamps are generally specified relative to the tempo of the musical piece, although they can also be specified relative to the Society of Motion Picture and Television Engineers (SMPTE) time code. The file parser converts the relative timestamps in the file to absolute time so that events can be fed to the MIDI interpreter at the appropriate time.
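
As a sketch of that conversion, assuming the common ticks-per-quarter-note form of the SMF division field (the SMPTE form uses a different formula): the delta time in microseconds is the delta in ticks multiplied by the current tempo (microseconds per quarter note) and divided by the division value, and the parser accumulates these into absolute time.

#include <stdint.h>

/* Convert an SMF delta time (in ticks) to microseconds.
 * 'division' is the ticks-per-quarter-note value from the file header;
 * 'tempo_us' is the current microseconds-per-quarter-note from the most
 * recent Set Tempo meta-event (500000, i.e. 120 BPM, if none was seen). */
static uint64_t ticks_to_us(uint32_t delta_ticks,
                            uint32_t tempo_us,
                            uint16_t division)
{
    return ((uint64_t)delta_ticks * tempo_us) / division;
}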

The MIDI interpreter acts on the performance data in the MIDI stream. For example, when a "Note-On" event is received, the MIDI interpreter must locate the algorithm parameters that characterize the musical instrument to be synthesized, allocate resources (a "voice"), and start the process of synthesizing the note.

The performance data may occasionally request more voices than are available, in which case the MIDI interpreter must determine which notes have priority. "Voice stealing" occurs when an active voice is reallocated to synthesize a new note.
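
A minimal voice-allocation sketch, assuming a fixed voice pool and an age-based stealing policy; production interpreters typically also weigh note priority, velocity, and release state when choosing which voice to steal.

#include <stdint.h>
#include <stddef.h>

#define MAX_VOICES 16   /* polyphony of this hypothetical synthesizer */

typedef struct {
    uint8_t  active;
    uint8_t  note;
    uint32_t age;       /* samples rendered since the note started */
} voice;

static voice voices[MAX_VOICES];

/* Find a free voice for a new Note-On; if none is free, steal the voice
 * that has been sounding the longest. */
static voice *allocate_voice(uint8_t note)
{
    voice *victim = &voices[0];
    for (size_t i = 0; i < MAX_VOICES; i++) {
        if (!voices[i].active) { victim = &voices[i]; break; }
        if (voices[i].age > victim->age) victim = &voices[i];
    }
    victim->active = 1;
    victim->note   = note;
    victim->age    = 0;
    return victim;
}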

The synthesis engine receives control data from the MIDI interpreter and synthesizes the audio based on the supplied parameters ("program") and, in the case of a sampling synthesizer, the sample data. The output of all the voices is mixed together based on the MIDI controls to render the final audio output.
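
A sketch of that mixing stage, assuming 16 voices, a per-voice volume derived from the MIDI channel controls, and 16-bit output with saturation; the per-voice renderer below is a placeholder for whichever synthesis algorithm the engine actually runs.

#include <stdint.h>
#include <stddef.h>

#define NUM_VOICES 16

/* Placeholder per-voice renderer: a real engine would run its FM or
 * wavetable algorithm here and return the next sample for voice v. */
static int16_t render_voice(int v) { (void)v; return 0; }

/* Accumulate every voice into a wider intermediate, apply the per-voice
 * volume (a 0..127 MIDI control), then saturate to 16 bits so loud mixes
 * clip cleanly instead of wrapping. */
static void mix_block(int16_t *out, size_t frames, const uint8_t *volume)
{
    for (size_t n = 0; n < frames; n++) {
        int32_t acc = 0;
        for (int v = 0; v < NUM_VOICES; v++)
            acc += (render_voice(v) * volume[v]) >> 7;
        if (acc >  32767) acc =  32767;
        if (acc < -32768) acc = -32768;
        out[n] = (int16_t)acc;
    }
}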

Sweetening the Audio Output
In addition to the basic algorithmic processing required to synthesize a note, other signal processing may take place, including audio filters, chorus, and reverb (typically called "side-chain effects"), as well as an audio exciter, compressor/limiter, and equalization (EQ) (typically called "post-processing effects"). These effects are often referred to as "audio sweeteners," and they can greatly enhance the quality of the audio. This is another opportunity for manufacturers to differentiate and add more value to their audio offerings.

Audio filters are used to vary the spectrum of the synthesizer to simulate changes in brightness, such as the natural decay of a piano string. A chorus is a delay line with a variable tap used to simulate multiple voices, providing a richer tone to brass and string section sounds. A reverb is a combination of delay lines and all-pass filters used to simulate the reverberation of different environments such as a concert hall or stadium. All of these effects are normally controlled on an individual instrument level. For example, the brass section can have chorus effects applied without affecting the piano.
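
A chorus can be sketched as a short delay line whose read tap is swept by a low-frequency oscillator, as below for one mono sample at a time; the buffer length, sweep depth, and dry/wet mix are illustrative values rather than a tuned effect.

#include <math.h>

#define DELAY_LEN 1024          /* roughly 23 ms at 44.1 kHz */

static float delay_buf[DELAY_LEN];
static int   write_pos;

/* Write the dry input into the delay line, read it back from a tap whose
 * position is slowly swept by a sine LFO, and mix the two. The swept tap
 * is what creates the "several players" impression described above. */
static float chorus(float in, double lfo_phase)
{
    delay_buf[write_pos] = in;

    double depth  = 200.0;                        /* tap sweep, in samples */
    double offset = 300.0 + depth * sin(lfo_phase);
    int    tap    = (write_pos - (int)offset + DELAY_LEN) % DELAY_LEN;

    write_pos = (write_pos + 1) % DELAY_LEN;
    return 0.7f * in + 0.3f * delay_buf[tap];     /* dry/wet mix */
}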

An audio exciter brightens the audio by adding harmonics to fill in the upper frequency range, an effect that can help make up for harmonics that may be lacking in the original samples. A compressor/limiter maximizes the output signal level by increasing the output gain when the overall volume of the synthesizer drops, which is a useful effect for a ringtone that needs to be heard in a noisy environment. EQ can be used to compensate for characteristics of the transducer and acoustics of the mobile device itself.
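
A compressor/limiter of the kind described can be sketched as an envelope follower driving a make-up gain, with a cap on the gain and a hard limit on the output. All coefficients here are illustrative, not tuned values; a shipping effect would use separate attack and release times and work on blocks rather than single samples.

#include <math.h>

/* Track a smoothed estimate of the signal level and raise the gain when
 * the level drops below a target, so quiet passages stay audible; cap the
 * gain so silence is not amplified into noise, and hard-limit the output. */
static float compress(float in)
{
    static float envelope = 0.0f;
    const float  smoothing = 0.01f;  /* envelope follower coefficient  */
    const float  target    = 0.5f;   /* desired peak level (0..1)      */
    const float  max_gain  = 4.0f;   /* never boost by more than ~12 dB */

    float level = fabsf(in);
    envelope += smoothing * (level - envelope);

    float gain = (envelope > 1e-4f) ? target / envelope : max_gain;
    if (gain > max_gain) gain = max_gain;

    float out = in * gain;
    if (out >  1.0f) out =  1.0f;
    if (out < -1.0f) out = -1.0f;
    return out;
}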

Performance Optimization
Software-based audio synthesis requires a considerable amount of processor bandwidth. The actual bandwidth required is highly influenced by the polyphony of the synthesizer (the number of simultaneous notes that can be synthesized), specifics of the algorithm (such as the sample rate), the need for additional signal processing stages (including individual voice filters, chorus or reverb effects), and post-processing enhancements (such as an audio exciter or compressor/limiter).

The specifications of the processor architecture are very important. The number of registers, availability of zero wait-state memory such as cache or tightly-coupled memory (TCM), and signal processing capability (such as multiply-accumulate operation [MAC] pipelines and saturating arithmetic) can all significantly influence performance.
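
For reference, the operation those features accelerate is the saturating multiply-accumulate that the engine's inner loops repeat for every voice and every sample. In portable C it looks like the sketch below; a core with a MAC pipeline and saturating arithmetic collapses it into one or two instructions.

#include <stdint.h>

/* Multiply two 16-bit samples, accumulate into a 32-bit sum, and saturate
 * instead of wrapping on overflow. */
static int32_t mac_sat(int32_t acc, int16_t a, int16_t b)
{
    int64_t r = (int64_t)acc + (int32_t)a * b;
    if (r > INT32_MAX) r = INT32_MAX;
    if (r < INT32_MIN) r = INT32_MIN;
    return (int32_t)r;
}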

The bulk of the code in a software-based synthesizer is the control logic in the file parser and MIDI interpreter. This code represents 5 to 20% of the overall execution time of the synthesizer, runs well on a 32-bit general purpose processor, and benefits from both instruction and data cache.

The code for the synthesizer engine is usually much smaller than the control code, but represents 80 to 95% of the overall execution time of the synthesizer. An engine designed specifically for embedded applications consists of small loops of a few hundred bytes, each executing for tens to hundreds of cycles at a time. Due to its small size, it neither suffers significantly from nor contributes much to cache pollution. If no cache is available, locating the synthesizer engine code in TCM will likely double the performance of the synthesizer engine.

Due to the nature of the signal processing in the engine code, it will also benefit from a MAC pipeline and saturating arithmetic. If DSP bandwidth is available, it may make sense to offload this code to a DSP, which is usually more efficient at executing signal processing algorithms. If the control code is to run on a separate general-purpose processor, some consideration must be given to moving the processed control data down to the DSP to control the synthesis engine.

Sampling synthesizers also access a large amount of sample data, which is typically stored in ROM, from inside the synthesizer engine inner loop. Access tends to occur in periodic sequential reads. Making the sample data cacheable can result in a significant performance increase, as a typical 32-byte cache line holds enough data to keep the inner loop running at zero wait-states for many iterations. Assuming that instruction and read-write data are already cached, enabling cache for sample data may nearly double the performance. While sample data is not very susceptible to cache pollution, it does contribute to it, as sample data is typically used once or twice in a loop and then it may be many more iterations before it is used again.

