How to Add No-Sweat, Low-Power Audio to Your Next SOC Design
By Steve Leibson
Technology Evangelist
Tensilica, Inc.
If you're facing the design
of an SOC with on-chip digital audio, this article will help you make
the engineering tradeoffs that lead to a successful project
tape-out. To do that, we'll look at your alternatives, the key decision
factors you should evaluate, and the consequences of your decisions.
The Audio Codec
The core element in all digital-audio
applications is the codec. Short for coder/decoder, the codec defines
how analog audio is digitized and compressed into a bit stream and how
the bit stream is later decompressed to reproduce the analog audio channels.
The first compression algorithm to see widespread use in consumer products
was MP3, first developed in 1991. Since then, many other audio standards
have been introduced for better quality sound.
There are four alternatives
for implementing audio codecs:
1. You can run the codec as firmware on a general-purpose processor. A PC
running an MP3 player is an example of this alternative.
2. You can implement a codec as a piece of hardware. The simple, early
portable MP3 players used this approach.
3. You can implement the codec as firmware running on a DSP.
4. Or, you can use an audio-specific processor, which is a general-purpose
processor that's been adapted to be especially efficient at running
digital-audio codecs.
Alternative 1, using one processor
to perform all system functions including the user interface, I/O, and
running the digital-audio codec, has several advantages. First, there's
most likely a general-purpose processor available, so the digital-audio
codec is just another task running on that processor. The only incremental
cost is perhaps a bit of additional instruction memory. Second, the
processor can implement multiple audio codecs using firmware, so you
can create a multifunction product. Finally, this design approach accommodates
new codecs as they're invented.
There are some disadvantages
to this approach, however. Digital-audio quality is extremely sensitive
to glitches because the ear picks up every audio imperfection. Processor
multitasking, as employed for this design alternative, increases the
probability of audio glitches because the processor's bandwidth is not
fully devoted to audio playback.
In addition, most general-purpose
processors lack audio-specific features, so they execute audio codecs
inefficiently. The consequence is a higher clock rate: general-purpose
processors must execute more instructions per second to compensate
for the inefficiency of their general-purpose instructions in audio
applications.
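A back-of-the-envelope sketch shows how codec inefficiency turns into
clock rate. A decoder must finish each frame within the time that frame
takes to play, so the required clock rate is the cycles spent per frame
multiplied by the frame rate. The MP3 frame length of 1,152 samples is
standard; the cycle counts below are hypothetical placeholders, not
measurements of any processor, and the function names are illustrative.

```python
MP3_SAMPLES_PER_FRAME = 1152  # samples per channel in one MPEG-1 Layer III frame


def frame_deadline_ms(samples_per_frame, sample_rate_hz):
    """Playback time of one frame, which is also the decode deadline."""
    return 1000.0 * samples_per_frame / sample_rate_hz


def required_mhz(cycles_per_frame, samples_per_frame, sample_rate_hz):
    """Minimum clock rate (MHz) needed to decode frames in real time."""
    frames_per_second = sample_rate_hz / samples_per_frame
    return cycles_per_frame * frames_per_second / 1e6


if __name__ == "__main__":
    deadline = frame_deadline_ms(MP3_SAMPLES_PER_FRAME, 44_100)
    print(f"decode deadline per frame: {deadline:.2f} ms")

    # Hypothetical cycle counts: the same decoder on an audio-tuned core
    # and on a core limited to general-purpose instructions.
    for label, cycles in (("audio-tuned core", 500_000),
                          ("general-purpose core", 2_000_000)):
        mhz = required_mhz(cycles, MP3_SAMPLES_PER_FRAME, 44_100)
        print(f"{label}: {mhz:.1f} MHz")
```

Any cycles the scheduler gives to other tasks come out of the same
per-frame deadline, which is why multitasking raises the risk of audible
glitches.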
Hardware Codecs
The second design alternative
pairs a relatively low-performance processor with a hardware codec and
lets the codec hardware handle all of the audio processing. The processor
can feed audio samples to the hardware audio codec over the bus or the
codec might DMA audio samples directly out of memory.
Using a hardware codec has
advantages. It is the most efficient way to implement one codec
in terms of silicon area and energy consumption. However, each new codec
requires an additional hardware block. So if your product must support
three audio-codec standards, you must add three hardware blocks to the
design as shown in Figure 1.
Figure 1.
A design with three hardware codec blocks.
Next, if there's a change
in the codec specification or a bug in the codec algorithm, you must
respin the chip to fix the problem because a hardware audio codec isn't
programmable. Also, you can't change a hardware codec to support a
new digital-audio codec standard. You must design a new block, add it
to the system design, and respin the chip.
The third alternative
is to run codec firmware on a general-purpose DSP, under
the direction of the system's host processor. Most DSPs have integral
hardware multipliers that greatly improve DSP execution efficiency on
digital-audio firmware. Also, DSPs run firmware, so they easily accommodate
multiple digital-audio codec standards with relatively modest increases
in memory size and therefore silicon area.
Using DSPs to run audio codecs
has disadvantages as well. Most DSPs are very poor targets for C compilers,
so software codecs written in C will not easily run on a DSP. Also,
16- and 32-bit DSPs are not ideal for audio processing. Although most
audio codecs today work with 16-bit audio samples, intermediate calculations
need extra bits of headroom to avoid round-off errors. As a result, 16-bit
DSPs clip and distort the sound when running complex audio algorithms
unless those algorithms use double-precision integer math, which is
inefficient and thus increases the required clock rate. Conversely, 32-bit
DSPs are overkill: audio algorithms don't need 32-bit multipliers and
can't make use of them. 24-bit DSPs are the best fit for audio algorithms.
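The round-off problem can be seen in a toy fixed-point experiment. The
sketch below is not a codec; it simply attenuates a 16-bit sample by a
factor of 256 (about 48 dB) and then restores the level, once in a 16-bit
word and once in a 24-bit word with the sample placed in the upper 16
bits. The function name and the test value are illustrative assumptions.

```python
def attenuate_then_restore(sample_16, word_bits, shift=8):
    """Pass a 16-bit sample through a gain-down/gain-up pair.

    The arithmetic is done in a fixed-point word of `word_bits` bits;
    the result is converted back to a 16-bit sample.
    """
    extra = word_bits - 16   # low-order bits available below the 16-bit sample
    x = sample_16 << extra   # place the sample at the top of the word
    x >>= shift              # attenuate: bits shifted past bit 0 are lost
    x <<= shift              # restore the original level
    return x >> extra        # back to a 16-bit sample


if __name__ == "__main__":
    sample = 12345
    for bits in (16, 24):
        out = attenuate_then_restore(sample, bits)
        print(f"{bits}-bit word: in={sample} out={out} error={sample - out}")
```

In the 16-bit word the low eight bits of the sample are discarded by the
attenuation and cannot be recovered; in the 24-bit word the eight extra
low-order bits hold them, so the sample survives intact. A 16-bit DSP can
get the same result only by spreading each value across two words.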
The Audio-Specific RISC Processor
The fourth alternative is an
audio-specific processor, which is a general-purpose processor with
audio-specific extensions that make the processor especially efficient
at executing audio-codec firmware while retaining the characteristics
that make the general-purpose processor a good compiler target. Figure
2 shows a system that uses an audio-specific processor to run the audio
codecs.
Figure 2.
An Audio-Specific Processor system block diagram
This design approach has several
advantages. The audio-specific extensions let the processor execute
audio algorithms and deliver the required general-purpose performance
while running at a much lower clock rate, which drastically cuts energy
consumption. This implementation easily supports multiple audio standards
so it's a good approach for multi-standard audio products. It also
supports new codecs as they appear, through the addition of firmware.
The one disadvantage of this approach is that the concept of an audio-specific
processor is somewhat unfamiliar, so let's remedy that situation right
now.
Tensilica's HiFi 2 Audio
Engine exploits the extensibility and configurability of Tensilica's
Xtensa 32-bit RISC processor architecture to create a general-purpose
processor that's very efficient at executing audio firmware. One of
the key extensions in the HiFi 2 Audio Engine is a pair of 24-bit hardware
multipliers, which substantially speed up audio calculations.
However, multipliers alone
won't bring down the cycle count for audio algorithms, so the HiFi 2
Audio Engine can also execute two operations in a single cycle: some
of its audio-specific extensions perform two operations simultaneously.
This feature further cuts cycle count. Wide 48- and 56-bit registers
that can store 24-bit stereo sample pairs are another important extension.
With the addition of these registers, the processor handles stereo audio
data as a native data type. In all, Tensilica added 300 audio-specific
instructions to the Xtensa RISC processor to create a more efficient
audio-algorithm execution engine.
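To illustrate what a stereo pair as a native data type means, the sketch
below packs two signed 24-bit samples into one 48-bit word and unpacks
them again. The layout (left sample in the high half) is an assumption
made for illustration, not Tensilica's documented register format. The
last function estimates how many worst-case 24 x 24-bit products an
accumulator of a given width can sum without overflow; treating the
56-bit registers as multiply-accumulate accumulators is also an
assumption that the article does not state.

```python
SAMPLE_BITS = 24
SAMPLE_MASK = (1 << SAMPLE_BITS) - 1
SAMPLE_MIN = -(1 << (SAMPLE_BITS - 1))
SAMPLE_MAX = (1 << (SAMPLE_BITS - 1)) - 1


def pack_stereo(left, right):
    """Pack two signed 24-bit samples into one 48-bit word (left in the high half)."""
    for sample in (left, right):
        if not SAMPLE_MIN <= sample <= SAMPLE_MAX:
            raise ValueError("sample does not fit in 24 bits")
    return ((left & SAMPLE_MASK) << SAMPLE_BITS) | (right & SAMPLE_MASK)


def _sign_extend(value, bits=SAMPLE_BITS):
    """Interpret an unsigned `bits`-wide field as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def unpack_stereo(word):
    """Recover the (left, right) samples from a packed 48-bit word."""
    left = _sign_extend((word >> SAMPLE_BITS) & SAMPLE_MASK)
    right = _sign_extend(word & SAMPLE_MASK)
    return left, right


def worst_case_mac_terms(acc_bits):
    """Worst-case 24x24-bit products a signed accumulator can sum without overflow."""
    worst_product = SAMPLE_MIN * SAMPLE_MIN   # (-2**23)**2 == 2**46
    acc_max = (1 << (acc_bits - 1)) - 1
    return acc_max // worst_product


if __name__ == "__main__":
    word = pack_stereo(-1, 2)
    print(hex(word), unpack_stereo(word))
    for bits in (48, 56):
        print(f"{bits}-bit accumulator: {worst_case_mac_terms(bits)} worst-case terms")
```

Under these assumptions, the extra eight bits of a 56-bit register over a
48-bit one are what let a long filter sum accumulate without clipping.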
Hardware alone is not sufficient
to make a processor into an attractive audio component for SOC design.
Your product needs audio codecs and you don't want to develop these
codecs yourself. (You don't have the time.) Although you can find
some audio codecs on the Internet, they're not optimized, so they're
not efficient. In addition, you will not find code for licensed algorithms
such as Dolby digital audio codecs on the Internet. Over 30 audio codecs
are available for Tensilica's HiFi 2 Audio Engine. Note that all of
these codecs are written in C. The processor's general-purpose RISC
instructions combined with the audio-specific instructions allow the
firmware writers to keep the codecs in C, which improves code maintainability
while keeping clock rates low.
The HiFi 2 Audio Engine is
an extension to Tensilica's configurable, extensible Xtensa LX2 processor
core. Tensilica has taken that set of extensions and predefined a processor
core called the Diamond 330HiFi Audio Engine. This processor core is
preconfigured and it runs all of the HiFi 2 audio codecs. Tensilica's
HiFi Audio Engines have already shipped in tens of millions of products
from a variety of end product and semiconductor manufacturers. The largest
current application, of course, is in cell phones. Future product applications
will include video products, consumer radios, and ultra-mobile PCs.
System-Design Considerations
The first consideration for
designing audio into portable applications is whether the product will
support just playback or playback with audio enhancements. These enhancements
include multiband spectral equalization, bass enhancements, 3D synthesized
audio, MIDI synthesis, and so on. HiFi 2 Audio Engine clock rates for
various audio codecs are well below 100 MHz. Adding audio enhancements
to the processing load can add 100 to 200 MHz to the required processor
clock rate, which can have unforeseen consequences.
For example, if you need to
synthesize the processor core for 200 MHz instead of 50 MHz operation,
the logic-synthesis tool will meet the speed constraints by using more
and bigger buffers and by adding redundant logic to speed signals along
critical logic paths. In doing so, the synthesis tool creates a processor
that consumes more energy. Running this processor at the higher clock
rate also increases energy consumption. So, even though it sounds counter-intuitive,
consider using a separate processor for the audio effects, to keep the
clock rates down. This approach also lets you power down the effects
processor when playback is all that's required.
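The two-processor argument can be sketched with a simple model in which
dynamic power scales linearly with clock rate and a core synthesized for
a higher speed costs more power per MHz. Every number below is a
hypothetical placeholder chosen only to exercise the arithmetic, not a
measurement of any Tensilica core, and the model ignores leakage and the
area cost of the second core.

```python
from dataclasses import dataclass


@dataclass
class Load:
    mw_per_mhz: float  # dynamic power per MHz; higher for a core synthesized for speed
    mhz: float         # clock rate while this load is active
    duty: float        # fraction of time the load is powered and running


def average_power_mw(loads):
    """Time-averaged dynamic power, assuming power scales linearly with clock rate."""
    return sum(load.mw_per_mhz * load.mhz * load.duty for load in loads)


def single_core(effects_duty, fast_mw_per_mhz=0.10):
    """One core synthesized for 200 MHz; it pays the higher per-MHz cost all the time."""
    return [Load(fast_mw_per_mhz, 50, 1.0 - effects_duty),   # playback only
            Load(fast_mw_per_mhz, 200, effects_duty)]        # playback plus effects


def dual_core(effects_duty, slow_mw_per_mhz=0.06, fast_mw_per_mhz=0.10):
    """Codec core synthesized for 50 MHz; effects core powered only when needed."""
    return [Load(slow_mw_per_mhz, 50, 1.0),
            Load(fast_mw_per_mhz, 150, effects_duty)]


if __name__ == "__main__":
    duty = 0.25  # hypothetical: effects enabled a quarter of the time
    print(f"single core: {average_power_mw(single_core(duty)):.2f} mW")
    print(f"dual core:   {average_power_mw(dual_core(duty)):.2f} mW")
```

With these placeholder values the split design comes out ahead, and the
gap widens as the effects processor spends more time powered down. Real
numbers would have to come from synthesis and power analysis of the
actual cores.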
You can also influence energy
consumption by carefully selecting a memory strategy for the SOC's
audio processor. Simple audio designs with just one or only a few supported
audio codecs may need only local memory. If you use memory caches, the
memories will be bigger because caches require tag arrays. Without caches,
you need not power up the processor's cache-control logic, so there's
energy to be saved by using local memory instead of cache.
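The size of the tag-array overhead is easy to estimate. The sketch below
computes tag storage for a set-associative cache, counting the address
tag plus one valid bit per line and ignoring dirty and replacement bits.
The cache parameters in the example are hypothetical and are not taken
from any Xtensa configuration.

```python
def log2_exact(n):
    """Return log2(n) for a power of two; raise ValueError otherwise."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def tag_array_bits(cache_bytes, line_bytes, ways, addr_bits=32):
    """Bits of tag storage (tag plus one valid bit per line) for a set-associative cache."""
    lines = cache_bytes // line_bytes
    sets = lines // ways
    offset_bits = log2_exact(line_bytes)   # selects a byte within a line
    index_bits = log2_exact(sets)          # selects a set
    tag_bits = addr_bits - index_bits - offset_bits
    return lines * (tag_bits + 1)


if __name__ == "__main__":
    cache_bytes = 16 * 1024
    bits = tag_array_bits(cache_bytes, line_bytes=32, ways=2)
    overhead = bits / (cache_bytes * 8)
    print(f"tag array: {bits} bits ({overhead:.1%} of the data array)")
```

For this hypothetical 16-KB, two-way cache with 32-byte lines, the tag
array adds 10,240 bits, or roughly 8 percent on top of the data array. A
local memory of the same capacity needs none of it.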
More complex audio applications
that use more sophisticated audio codecs may require so much memory
that cache is a better choice. With a configurable processor like Tensilica's
Xtensa LX2 with the HiFi 2 Audio Engine extensions, you can choose whether
to include caches, local memory, or both. Tensilica's preconfigured
Diamond 330HiFi processor core has both caches and local memory. You
can use both or you can leave the cache disabled for audio applications.
Paths to Low-Power Audio
The key paths to lowest-power
on-chip audio are low processor clock rate, low bus traffic, and optimal
use of local memories and cache. You want to keep the audio traffic
off the main bus to avoid turning the bus into an artificial bottleneck
and to minimize energy consumption. Finally, you want to make sure
you can perform 24-bit audio processing on a comprehensive set of available
audio codecs, to ensure good-quality audio and a timely product release.
Steven Leibson is the Technology Evangelist for Tensilica, Inc. He formerly served as Editor in Chief of the Microprocessor Report, EDN magazine, and Embedded Developers Journal. He holds a BSEE from Case Western Reserve University and worked as a design engineer and engineering manager for leading-edge system-design companies including Hewlett-Packard and Cadnetix before becoming a journalist. Leibson is an IEEE Senior Member.