Introduction - Basics - The RIFF Format - The WAVE Format - The Canonical WAVE Format - Compression - Recording and Playing - Writing Data - Reading Data - Pitch Extraction - More Information - StripWav
Find this useful? Tips are welcome!
The WAVE file format is a subset of Microsoft's RIFF spec, which can include lots of different kinds of data. It was originally intended for multimedia files, but the spec is open enough to allow pretty much anything to be placed in such a file, and ignored by programs that read the format correctly.
This description is not meant to be exhaustive, but to suggest simple ways of doing common tasks with waveform audio, and give some pointers to other sources of information.
First, some basics. Sound is air pressure fluctuation. Digitized sound is a graph of the change in air pressure over time. That's all there is to it.
For a good picture of this, open up Windows Sound Recorder and record a short sound, then look at the green bars it shows. When they're wide, there's a lot of air pressure, which your ear detects as a loud noise. When they're flat in the middle, there's no change in air pressure, which your ear detects as silence. The faster they go up and down, the higher the sound you hear.
When you record a sound, your microphone changes the air pressure fluctuations into electrical voltage fluctuations, which your sound card measures every so often and changes into numbers, called samples. When you play a sound back, the process is reversed, except that the voltage fluctuations go to your speakers instead of your microphone, and are converted back into air pressure by the speaker cone.
The speed with which your sound card samples the voltage is called the sample rate, and is expressed in kilohertz (kHz). One kHz is a thousand samples per second.
It's important to note that digitized audio stores nothing directly about a sound's frequency, pitch, or perceived loudness. You can run certain algorithms on the samples to determine these values approximately, but you can't just read them from the file.
RIFF is a file format for storing many kinds of data, primarily multimedia data like audio and video. It is based on chunks and sub-chunks. Each chunk has a type, represented by a four-character tag. The chunk type comes first in the chunk, followed by the size of the chunk's data (a 32-bit unsigned integer), then the data itself.
The entire RIFF file is a big chunk that contains all the other chunks. The first thing in the contents of the RIFF chunk is the "form type," which describes the overall type of the file's contents. So the structure of a RIFF file looks like this:
Offset (hex)  Contents
0000          'R', 'I', 'F', 'F'
0004          length of the entire file - 8 (32-bit unsigned integer)
0008          form type (4 characters)
000C          first chunk type (4 characters)
0010          first chunk length (32-bit unsigned integer)
0014          first chunk's data
...           ...
All integers are stored in the Intel low-high byte ordering (usually referred to as "little-endian").
A more detailed description of the RIFF format can be found in the Microsoft Win32 Multimedia API documentation, which is supplied as a Windows Help file with many Windows programming tools such as C++ compilers.
The WAVE format is a subset of RIFF used for storing digital audio. Its form type is "WAVE", and it requires two kinds of chunks: the fmt chunk, which describes the sample format (sample rate, number of channels, bits per sample, and so on), and the data chunk, which contains the actual sample data.
WAVE can also contain any other chunk type allowed by RIFF, including LIST chunks, which are used to contain optional kinds of data such as the copyright date, author's name, etc. Chunks can appear in any order.
The WAVE format is thus very flexible, but also not trivial to parse. For this reason, and also possibly because a simpler (or inaccurate?) description of the WAVE format was promulgated before the Win32 API was released, many older programs read and write only a subset of the WAVE format, which I refer to as the "canonical" WAVE format. This subset consists of just two chunks, the fmt and data chunks, in that order, with the sample data in PCM format. For a detailed description of what this format looks like, and of the contents of the fmt chunk, look here.
The WAVE specification supports a number of different compression algorithms. The format tag entry in the fmt chunk indicates the type of compression used. A value of 1 indicates Pulse Code Modulation (PCM), which is a "straight," or uncompressed encoding of the samples. Values other than 1 indicate some form of compression. For more information on the values supported and how to decode the samples, see the Microsoft Win32 Multimedia API documentation.
The simplest way to write data into WAVE files produced by your own programs is to use the canonical format. This will be compatible with any other program, even the older ones. You can also use my WAVE and RIFF C++ classes for this (see the next section).
There are four methods you could use to read the sample data from a WAVE file:
If all you want to do is play a WAVE file, use the Win32 function PlaySound(). If you're writing for 16-bit Windows, use sndPlaySound() instead. You can use either of these functions without knowing anything about the internal format of the file.
To give your users a little more control over pausing, stopping, and rewinding longer WAVE files, use the MCI functions or an MCI control provided by your programming language. You can also record audio using MCI.
For more precise control than the MCI functions provide, you'll need to either use the Microsoft Win32 Multimedia API (which should be documented with whatever programming language you're using), or find a commercial or shareware library for your language that does so. If you find one you like, please let me know so I can list it here, since I'm often asked for advice on this!
One way is to use a Fast Fourier Transform. Other ways involve doing things like examining statistical correlations between samples at different offsets, to determine the full wavelength at a given timepoint. It's hairy stuff.
For more references, try the following links: