Note: Transcribed and distributed with permission from the author, Jim Heckroth
of Crystal Semiconductor.  This text refers to some diagrams; please refer
to the original application note #AN27REV3, in the Crystal Semiconductor
Databook, 1993.  Please direct your inquiries to:

	Crystal Semiconductor Corporation
	4210 S. Industrial Dr.
	Austin, Texas  78744
	(512) 445-7222
	Fax (512) 445-7581
-------------------------------------------------------------------------------


A Tutorial on MIDI and Wavetable Music Synthesis

by Jim Heckroth

Introduction

The Musical Instrument Digital Interface (MIDI) protocol has been 
widely accepted and utilized by musicians and composers since its 
conception in the 1982/1983 time frame.  MIDI data is a very efficient 
method of representing musical performance information, and this makes 
MIDI an attractive protocol for computer applications which produce 
sound, such as multimedia presentations or computer games.  However, 
the lack of standardization of synthesizer capabilities hindered applications 
developers and presented MIDI users with a rather steep learning curve 
to overcome.  Fortunately, thanks to the publication of the General 
MIDI System specification, wide acceptance of the most common PC/MIDI 
interfaces, support for MIDI in Microsoft Windows, and the evolution 
of low-cost high-quality wavetable music synthesizers, the MIDI protocol 
is now seeing widespread use in a growing number of applications.  This 
paper gives a brief overview of the standards and terminology associated 
with the generation of sound using the MIDI protocol and wavetable 
music synthesizers.

Use of MIDI in Multimedia Applications

Originally developed to allow musicians to connect synthesizers together, 
the MIDI protocol is now finding widespread use in the generation 
of sound for games and multimedia applications.  There are several 
advantages to generating sound with a MIDI synthesizer rather than 
using sampled audio from disk or CD-ROM.  The first advantage is storage 
space.  Data files used to store digitally sampled audio in PCM format 
(such as .WAV files) tend to be quite large.  This is especially true 
for lengthy musical pieces captured in stereo using high sampling 
rates.  MIDI data files, on the other hand, are extremely small when 
compared with sampled audio files.  For instance, files containing 
high quality stereo sampled audio require about 10 MBytes of data 
per minute of sound, while a typical MIDI sequence might consume less 
than 10 KBytes of data per minute of sound.  This is because the MIDI 
file does not contain the sampled audio data; it contains only the 
instructions needed by a synthesizer to play the sounds.  These instructions 
are in the form of MIDI messages, which instruct the synthesizer which 
sounds to use, which notes to play, and how loud to play each note.  The 
actual sounds are then generated by the synthesizer.
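
The storage figures above can be checked with a little arithmetic.  The 
sketch below assumes CD-quality audio (44.1 kHz, 16 bit, stereo), a format 
which the text does not name explicitly, and the function name is purely 
illustrative.

```python
def pcm_bytes_per_minute(sample_rate_hz=44_100, bytes_per_sample=2, channels=2):
    """Size of one minute of uncompressed PCM audio.

    The defaults are an assumption (CD-quality stereo), not a figure
    taken from the text.
    """
    return sample_rate_hz * bytes_per_sample * channels * 60


pcm_size = pcm_bytes_per_minute()   # 10,584,000 bytes, roughly 10 MBytes
midi_size = 10 * 1024               # the text's upper figure of 10 KBytes per minute
size_ratio = pcm_size / midi_size   # sampled audio is roughly a thousand times larger
```

Under these assumptions one minute of sampled audio is about three orders 
of magnitude larger than a typical MIDI sequence of the same length.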

The smaller file size also means that less of the PC's bandwidth is 
utilized in spooling this data out to the peripheral which is generating 
sound.  Other advantages of utilizing MIDI to generate sounds include 
the ability to easily edit the music, and the ability to change the 
playback speed and the pitch or key of the sounds independently.  This 
last point is particularly important in synthesis applications such 
as karaoke equipment, where the musical key and tempo of a song may 
be selected by the user.

MIDI Systems

The Musical Instrument Digital Interface (MIDI) protocol provides 
a standardized and efficient means of conveying musical performance 
information as electronic data.  MIDI information is transmitted in 
"MIDI messages", which can be thought of as instructions which tell 
a music synthesizer how to play a piece of music.  The synthesizer 
receiving the MIDI data must generate the actual sounds.  The MIDI 
1.0 Detailed Specification, published by the International MIDI Association, 
provides a complete description of the MIDI protocol.

The MIDI data stream is a unidirectional asynchronous bit stream at 
31.25 kbits/sec. with 10 bits transmitted per byte (a start bit, 8 
data bits, and one stop bit).  The MIDI interface on a MIDI instrument 
will generally include three different MIDI connectors, labeled IN, 
OUT, and THRU.  The MIDI data stream is usually originated by a MIDI 
controller, such as a musical instrument keyboard, or by a MIDI sequencer. 
A MIDI controller is a device which is played as an instrument, and 
it translates the performance into a MIDI data stream in real time 
(as it is played). A MIDI sequencer is a device which allows MIDI 
data sequences to be captured, stored, edited, combined, and replayed.  The 
MIDI data output from a MIDI controller or sequencer is transmitted 
via the device's MIDI OUT connector.  

The recipient of this MIDI data stream is commonly a MIDI sound generator 
or sound module, which will receive MIDI messages at its MIDI IN connector, 
and respond to these messages by playing sounds.  Figure 1 shows a 
simple MIDI system, consisting of a MIDI keyboard controller and a 
MIDI sound module.  Note that many MIDI keyboard instruments include 
both the keyboard controller and the MIDI sound module functions within 
the same unit.  In these units, there is an internal link between 
the keyboard and the sound module which may be enabled or disabled 
by setting the "local control" function of the instrument to ON or 
OFF respectively.

The single physical MIDI channel is divided into 16 logical channels 
by the inclusion of a 4 bit channel number within many of the MIDI 
messages.  A musical instrument keyboard can generally be set to transmit 
on any one of the sixteen MIDI channels.  A MIDI sound source, or 
sound module, can be set to receive on specific MIDI channel(s).  In 
the system depicted in Figure 1, the sound module would have to be 
set to receive the channel which the keyboard controller is transmitting 
on in order to play sounds.

Information received on the MIDI IN connector of a MIDI device is 
transmitted back out (repeated) at the device's MIDI THRU connector.  Several 
MIDI sound modules can be daisy-chained by connecting the THRU output 
of one device to the IN connector of the next device downstream in 
the chain.

Figure 2 shows a more elaborate MIDI system. In this case, a MIDI 
keyboard controller is used as an input device to a MIDI sequencer, 
and there are several sound modules connected to the sequencer's MIDI 
OUT port.  A composer might utilize a system like this to write a 
piece of music consisting of several different parts, where each part 
is written for a different instrument. The composer would play the 
individual parts on the keyboard one at a time, and these individual 
parts would be captured by the sequencer.  The sequencer would then 
play the parts back together through the sound modules.  Each part 
would be played on a different MIDI channel, and the sound modules 
would be set to receive different channels.  For example, sound module 
number 1 might be set to play the part received on channel 1 using 
a piano sound, while module 2 plays the information received on channel 
5 using an acoustic bass sound, and the drum machine plays the percussion 
part received on MIDI channel 10.  

In the last example, a different sound module is used to play each 
part.  However, sound modules which are "multi-timbral" are capable 
of playing several different parts simultaneously.  A single multi-timbral 
sound module might be configured to receive the piano part on channel 
1, the bass part on channel 5, and the drum part on channel 10, and 
would play all three parts simultaneously. 

Figure 3 depicts a PC-based MIDI system.  In this system, the PC is 
equipped with an internal MIDI interface card which sends MIDI data 
to an external multi-timbral MIDI synthesizer module.  Application 
software, such as multimedia presentation packages, educational software, 
or games, sends information to the MIDI interface card over the PC 
bus.  The MIDI interface converts this information into MIDI messages 
which are sent to the sound module.  Since this is a multi-timbral 
module, it can play many different musical parts, such as piano, bass 
and drums, at the same time.   Sophisticated MIDI sequencer software 
packages are also available for the PC.  With this software running 
on the PC, a user could connect a MIDI keyboard controller to the 
MIDI IN port of the MIDI interface card, and have the same music composition 
capabilities discussed in the last paragraph.

There are a number of different configurations of PC-based MIDI systems 
possible.  For instance, the MIDI interface and the MIDI sound module 
might be combined on the PC add-in card.  In fact, the Microsoft Multimedia 
PC (MPC) Specification states that a PC add-in sound card must have 
an on-board synthesizer in order to be MPC compliant.  Until recently, 
most MPC compliant sound cards included FM synthesizers with limited 
capabilities and marginal sound quality. With these systems, an external 
wavetable synthesizer module might be added to get better sound quality.  Recently, 
more advanced sound cards have been appearing which include high quality 
wavetable music synthesizers on-board, or as daughter-card options.  With 
the increasing use of the MIDI protocol in PC applications, this trend 
is sure to continue.

MIDI Messages

A MIDI message is made up of an eight bit status byte which is generally 
followed by one or two data bytes.  There are a number of different 
types of MIDI messages.  At the highest level, MIDI messages are classified 
as being either Channel Messages or System Messages.  Channel messages 
are those which apply to a specific channel, and the channel number 
is included in the status byte for these messages.  System messages 
are not channel specific, and no channel number is indicated in their 
status bytes.  Channel Messages may be further classified as being 
either Channel Voice Messages, or Mode Messages.  Channel Voice Messages 
carry musical performance data, and these messages comprise most of 
the traffic in a typical MIDI data stream.  Channel Mode messages 
affect the way a receiving instrument will respond to the Channel 
Voice messages.  MIDI System Messages are classified as being System 
Common Messages, System Real Time Messages, or System Exclusive Messages.  System 
Common messages are intended for all receivers in the system.  System 
Real Time messages are used for synchronization between clock-based 
MIDI components.  System Exclusive messages include a Manufacturer's 
Identification (ID) code, and are used to transfer any number of data 
bytes in a format specified by the referenced manufacturer.  The various 
classes of MIDI messages are discussed in more detail in the following 
paragraphs.
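
The classification above can be sketched as a small decoder for status 
bytes.  The numeric ranges used here come from the MIDI 1.0 specification 
rather than from this text, and the function name and return format are 
illustrative assumptions.

```python
def classify_status(status):
    """Return (message class, channel) for a MIDI status byte.

    Channel is reported as 1-16 for channel messages and None otherwise.
    """
    if not 0x80 <= status <= 0xFF:
        raise ValueError("not a status byte: the top bit must be set")
    if status < 0xF0:
        # Channel message: the low four bits carry the channel number.
        # Channel Mode messages share the Control Change status (0xB0-0xBF)
        # and are distinguished only by their controller number.
        return ("channel", (status & 0x0F) + 1)
    if status == 0xF0:
        return ("system exclusive", None)
    if status < 0xF8:
        # 0xF1-0xF7, which includes End Of Exclusive (0xF7).
        return ("system common", None)
    return ("system real time", None)
```

For example, a status byte of 0x93 is a channel message on channel 4, while 
0xF8 falls in the System Real Time range.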

Channel Voice Messages

Channel Voice Messages are used to send musical performance information.  The 
messages in this category are the Note On, Note Off, Polyphonic Key 
Pressure, Channel Pressure, Pitch Bend Change, Program Change, and 
the Control Change message.  

In MIDI systems, the activation of a particular note and the release 
of the same note are considered as two separate events.  When a key 
is pressed on a MIDI keyboard instrument or MIDI keyboard controller, 
the keyboard sends a Note On message on the MIDI OUT port.  The keyboard 
may be set to transmit on any one of the sixteen logical MIDI channels, 
and the status byte for the Note On message will indicate the selected 
channel number.  The Note On status byte is followed by two data bytes, 
which specify key number (indicating which key was pressed) and velocity 
(how hard the key was pressed).  The key number is used in the receiving 
synthesizer to select which note should be played, and the velocity 
is normally used to control the amplitude of the note.  When the key 
is released, the keyboard instrument or controller will send a Note 
Off message.  The Note Off message also includes data bytes for the 
key number and for the velocity with which the key was released.  The 
Note Off velocity information is normally ignored. 
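
As a minimal sketch, the Note On and Note Off messages described above 
might be built as follows.  The status values 0x90 and 0x80 are taken from 
the MIDI 1.0 specification; the helper names and the default release 
velocity of 64 are assumptions made for illustration.

```python
NOTE_OFF = 0x80
NOTE_ON = 0x90


def _check(channel, *data):
    """Validate a 1-16 channel number and 7-bit data bytes."""
    if not 1 <= channel <= 16:
        raise ValueError("channel must be 1-16")
    if any(not 0 <= d <= 127 for d in data):
        raise ValueError("data bytes must be 0-127")


def note_on(channel, key, velocity):
    """Status byte (with channel), then key number, then velocity."""
    _check(channel, key, velocity)
    return bytes([NOTE_ON | (channel - 1), key, velocity])


def note_off(channel, key, velocity=64):
    """Same layout as Note On; the release velocity is usually ignored."""
    _check(channel, key, velocity)
    return bytes([NOTE_OFF | (channel - 1), key, velocity])
```

Pressing and releasing middle C on channel 1 would then produce 
note_on(1, 60, 100) followed by note_off(1, 60).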

Some MIDI keyboard instruments have the ability to sense the amount 
of pressure which is being applied to the keys while they are depressed.  This 
pressure information, commonly called "aftertouch", may be used to 
control some aspects of the sound produced by the synthesizer (vibrato, 
for example). If the keyboard has a pressure sensor for each key, 
then the resulting "polyphonic aftertouch" information would be sent 
in the form of Polyphonic Key Pressure messages.  These messages include 
separate data bytes for key number and pressure amount.  It is currently 
more common for keyboard instruments to sense only a single pressure 
level for the entire keyboard. This "channel aftertouch" information 
is sent using the Channel Pressure message, which needs only one data 
byte to specify the pressure value. 

The Pitch Bend Change message is normally sent from a keyboard instrument 
in response to changes in position of the pitch bend wheel.  The pitch 
bend information is used to modify the pitch of sounds being played 
on a given channel.  The Pitch Bend message includes two data bytes 
to specify the pitch bend value.  Two bytes are required to allow 
fine enough resolution to make pitch changes resulting from movement 
of the pitch bend wheel seem to occur in a continuous manner rather 
than in steps.  
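
The two data bytes each carry seven bits, giving a 14 bit pitch bend value.  
The sketch below assumes the layout in the MIDI 1.0 specification (status 
0xE0, least significant byte first, a value of 8192 meaning no bend); the 
function name is illustrative.

```python
PITCH_BEND = 0xE0
PITCH_BEND_CENTER = 8192   # wheel at rest


def pitch_bend(channel, value):
    """Encode a 14-bit pitch bend value (0-16383) as a 3-byte message."""
    if not 1 <= channel <= 16:
        raise ValueError("channel must be 1-16")
    if not 0 <= value <= 0x3FFF:
        raise ValueError("pitch bend value must be 0-16383")
    lsb = value & 0x7F          # low 7 bits are sent first
    msb = (value >> 7) & 0x7F   # high 7 bits are sent second
    return bytes([PITCH_BEND | (channel - 1), lsb, msb])
```

With 16384 steps instead of 128, the movement of the wheel is fine-grained 
enough to be heard as a continuous change in pitch.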

The Program Change message is used to specify the type of instrument 
which should be used to play sounds on a given channel.  This message 
needs only one data byte which specifies the new program number.

MIDI Control Change messages are used to control a wide variety of 
functions in a synthesizer.  Control Change messages, like other MIDI 
channel messages, should only affect the channel number indicated 
in the status byte.  The control change status byte is followed by 
one data byte indicating the "controller number", and a second byte 
which specifies the "control value".  The controller number identifies 
which function of the synthesizer is to be controlled by the message.  

Controller Numbers 0 - 31 are generally used for sending data from 
switches, wheels, faders, or pedals on a MIDI controller device such 
as a musical instrument keyboard.  Control numbers 32 - 63 are used 
to send an optional Least Significant Byte (LSB) for control numbers 
0 through 31, respectively.  Some examples of synthesizer functions 
which may be controlled are modulation (controller number 1), volume 
(controller number 7), and pan (controller number 10).  Controller 
numbers 64 through 67 are used for switched functions.  These are the sustain/damper 
pedal (controller number 64), portamento (controller number 65), sostenuto 
pedal (controller number 66), and soft pedal (controller number 67).  Controller 
numbers 16-19 and 80-83 are defined to be general purpose controllers, 
and controller numbers 48-51 may be used to send an optional LSB for 
controller numbers 16-19.  Several of the MIDI controllers merit more 
detailed descriptions, and these controllers are described in the 
following paragraphs.

Controller number zero is defined as the bank select.  The bank select 
function is used in some synthesizers in conjunction with the MIDI 
Program Change message to expand the number of different instrument 
sounds which may be specified (the Program Change message alone allows 
selection of one of 128 possible program numbers). The additional 
sounds are commonly organized as "variations" of the 128 addressed 
by the Program Change message.  Variations are selected by preceding 
the Program Change message with a Control Change message which specifies 
a new value for controller zero (see the Roland General Synthesizer 
Standard topic covered later in this paper). 
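
A sketch of the sequence just described: a Control Change for controller 
zero, followed by a Program Change.  The status values 0xB0 and 0xC0 come 
from the MIDI 1.0 specification; the function name is hypothetical, and how 
a given synthesizer interprets the bank value is device-specific.

```python
CONTROL_CHANGE = 0xB0
PROGRAM_CHANGE = 0xC0
BANK_SELECT = 0            # controller number zero


def select_variation(channel, bank, program):
    """Bank select followed by Program Change.

    channel is 1-16; bank and program are raw 0-127 data byte values.
    """
    if not 1 <= channel <= 16:
        raise ValueError("channel must be 1-16")
    if not (0 <= bank <= 127 and 0 <= program <= 127):
        raise ValueError("bank and program must be 0-127")
    ch = channel - 1
    return bytes([CONTROL_CHANGE | ch, BANK_SELECT, bank,
                  PROGRAM_CHANGE | ch, program])
```

The bank select message takes effect only when the following Program Change 
arrives, which is why the two are sent as a pair.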

Controller numbers 91 through 95 may be used to control the depth 
or level of special effects, such as reverb or chorus, in synthesizers 
which have these capabilities.  

Controller number 6 (Data Entry), in conjunction with Controller numbers 
96 (Data Increment), 97 (Data Decrement), 98 (Non-Registered Parameter 
Number LSB), 99 (Non-Registered Parameter Number MSB), 100 (Registered 
Parameter Number LSB), and 101 (Registered Parameter Number MSB), 
may be used to send parameter data to a synthesizer in order to edit 
sound patches.  Registered parameters are those which have been assigned 
some particular function by the MIDI Manufacturers Association (MMA) 
and the Japan MIDI Standards Committee (JMSC). For example, there 
are Registered Parameter numbers assigned to control pitch bend sensitivity 
and master tuning for a synthesizer. Non-Registered parameters have 
not been assigned specific functions, and may be used for different 
functions by different manufacturers.  Parameter data is transferred 
by first selecting the parameter number to be edited using controllers 
98 and 99 or 100 and 101, and then adjusting the data value for that 
parameter using controller number 6, 96, or 97.
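
As an illustration, pitch bend sensitivity might be set as sketched below.  
This assumes the assignments in the MIDI 1.0 specification, where the 
Registered Parameter Number is selected with controllers 101 (MSB) and 100 
(LSB), and where Registered Parameter 0 is pitch bend sensitivity.  The 
function names are hypothetical.

```python
def control_change(channel, controller, value):
    """Build one Control Change message (channel 1-16, 7-bit data)."""
    if not 1 <= channel <= 16:
        raise ValueError("channel must be 1-16")
    if not (0 <= controller <= 127 and 0 <= value <= 127):
        raise ValueError("controller and value must be 0-127")
    return bytes([0xB0 | (channel - 1), controller, value])


def set_pitch_bend_range(channel, semitones):
    """Select Registered Parameter 0, then write its value with Data Entry."""
    return (control_change(channel, 101, 0)            # RPN MSB = 0
            + control_change(channel, 100, 0)          # RPN LSB = 0
            + control_change(channel, 6, semitones))   # Data Entry
```

The parameter is selected first and the data value is sent afterwards, 
matching the order given in the text.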

Controller Numbers 121 through 127 are used to implement the MIDI 
"Channel Mode Messages".  These messages are covered in the next section.

Channel Mode Messages

Channel Mode messages (MIDI controller numbers 121 through 127) affect 
the way a synthesizer responds to MIDI data.  Controller number 121 
is used to reset all controllers.  Controller number 122 is used to 
enable or disable Local Control (in a MIDI synthesizer which has its 
own keyboard, the functions of the keyboard controller and the synthesizer 
can be isolated by turning Local Control off).  Controller numbers 
124 through 127 are used to select between Omni Mode On or Off, and 
to select between the Mono Mode or Poly Mode of operation.

When Omni mode is On, the synthesizer will respond to incoming MIDI 
data on all channels.  When Omni mode is Off, the synthesizer will 
only respond to MIDI messages on one channel.  When Poly mode is selected, 
incoming Note On messages are played polyphonically.  This means that 
when multiple Note On messages are received, each note is assigned 
its own voice (subject to the number of voices available in the synthesizer).  The 
result is that multiple notes are played at the same time.  When Mono 
mode is selected, a single voice is assigned per MIDI channel. This 
means that only one note can be played on a given channel at a given 
time.  Most modern MIDI synthesizers will default to Omni On/Poly 
mode of operation.  In this mode, the synthesizer will play note messages 
received on any MIDI channel, and notes received on each channel are 
played polyphonically.  In the Omni Off/Poly mode of operation, 
the synthesizer will receive on a single channel and play the notes 
received on this channel polyphonically.  This mode is useful when 
several synthesizers are daisy-chained using MIDI THRU.  In this 
case each synthesizer in the chain can be set to play one part (the 
MIDI data on one channel), and ignore the information related to the 
other parts.

Note that a MIDI instrument has one MIDI channel which is designated 
as its "Basic Channel".  The Basic Channel assignment may be hard-wired, 
or it may be selectable. Mode messages can only be received by an 
instrument on the Basic Channel. 
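
The effect of the Omni setting on a receiver can be sketched as a simple 
channel filter.  This is a deliberately reduced model which ignores Mono 
mode, and the names are illustrative assumptions.

```python
def responds_to(message_channel, basic_channel, omni_on):
    """True if a receiver should act on a channel voice message.

    Omni On: respond on every channel.
    Omni Off: respond only on the receiver's Basic Channel.
    """
    return omni_on or message_channel == basic_channel


# Three daisy-chained modules in Omni Off/Poly mode, one part each.
basic_channels = {"piano module": 1, "bass module": 5, "drum machine": 10}
players_of_channel_5 = [
    name for name, basic in basic_channels.items()
    if responds_to(5, basic, omni_on=False)
]
```

Every module in the chain sees the same data stream, but only the module 
whose Basic Channel matches plays a given part.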

System Common Messages

The System Common Messages which are currently defined include MTC 
Quarter Frame, Song Select, Song Position Pointer, Tune Request, and 
End Of Exclusive (EOX).  The MTC Quarter Frame message is part of 
the MIDI Time Code information used for synchronization of MIDI equipment 
and other equipment, such as audio or video tape machines.  

The Song Select message is used with MIDI equipment, such as sequencers 
or drum machines, which can store and recall a number of different 
songs.  The Song Position Pointer is used to set a sequencer to start 
playback of a song at some point other than at the beginning.  The 
Song Position Pointer value is related to the number of MIDI clocks 
which would have elapsed between the beginning of the song and the 
desired point in the song.  This message can only be used with equipment 
which recognizes MIDI System Real Time Messages (MIDI Sync).

The Tune Request message is generally used to request an analog synthesizer 
to retune its internal oscillators.  This message is generally not 
needed with digital synthesizers.

The EOX message is used to flag the end of a System Exclusive message, 
which can include a variable number of data bytes.

System Real Time Messages

The MIDI System Real Time messages are used to synchronize all of 
the MIDI clock-based equipment within a system, such as sequencers 
and drum machines.  Most of the System Real Time messages are normally 
ignored by keyboard instruments and synthesizers.  To help ensure 
accurate timing, System Real Time messages are given priority over 
other messages, and these single-byte messages may occur anywhere 
in the data stream (a Real Time message may appear between the status 
byte and data byte of some other MIDI message).  The System Real Time 
messages are the Timing Clock, Start, Continue, Stop, Active Sensing, 
and the System Reset message.  The Timing Clock message is the master 
clock which sets the tempo for playback of a sequence.  The Timing 
Clock message is sent 24 times per quarter note.  The Start, Continue, 
and Stop messages are used to control playback of the sequence.  
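
Since 24 Timing Clock messages are sent per quarter note, the spacing 
between clocks follows directly from the tempo.  The function name below is 
illustrative, and tempo is assumed to be given in quarter notes per minute.

```python
CLOCKS_PER_QUARTER_NOTE = 24


def clock_interval_seconds(tempo_bpm):
    """Time between successive Timing Clock messages at a given tempo."""
    if tempo_bpm <= 0:
        raise ValueError("tempo must be positive")
    return 60.0 / (tempo_bpm * CLOCKS_PER_QUARTER_NOTE)
```

At 120 beats per minute this works out to one clock roughly every 20.8 ms.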

The Active Sensing signal is used to help eliminate "stuck notes" 
which may occur if a MIDI cable is disconnected during playback of 
a MIDI sequence.  Without Active Sensing, if a cable is disconnected 
during playback, then some notes may be left playing indefinitely 
because they have been activated by a Note On message, but will never 
receive the Note Off.  In transmitters which utilize Active Sensing, 
the Active Sensing message is sent at least once every 300 ms by the 
transmitting device when this device has no other MIDI data to send.  If a 
receiver which is monitoring Active Sensing does not receive any type of MIDI 
messages for a period of time exceeding 300 ms, the receiver may assume 
that the MIDI cable has been disconnected, and it should therefore 
turn off all of its active notes.  Use of Active Sensing in MIDI 
transmitters and receivers is optional.
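
A receiver's side of this scheme might be sketched as a simple watchdog.  
The Active Sensing status value (0xFE) is from the MIDI 1.0 specification; 
the class and method names are hypothetical, and time is passed in 
explicitly (in seconds) to keep the example deterministic.

```python
ACTIVE_SENSING = 0xFE
TIMEOUT_SECONDS = 0.300


class ActiveSensingMonitor:
    """Tracks whether the incoming MIDI stream has gone silent."""

    def __init__(self):
        self.enabled = False    # armed by the first Active Sensing byte
        self.last_rx = None

    def on_message(self, status, now):
        """Record the arrival of any MIDI message at time `now`."""
        if status == ACTIVE_SENSING:
            self.enabled = True
        self.last_rx = now

    def timed_out(self, now):
        """True if monitoring is armed and nothing arrived for over 300 ms."""
        return self.enabled and (now - self.last_rx) > TIMEOUT_SECONDS
```

When timed_out() returns True, the receiver would turn off all of its 
active notes.  A receiver which never sees an Active Sensing message never 
arms the watchdog, which reflects the optional nature of the feature.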

The System Reset message, as the name implies, is used to reset and 
initialize any equipment which receives the message.  This message 
is generally not sent automatically by transmitting devices, and must 
be initiated manually by a user. 

System Exclusive Messages

System Exclusive messages may be used to send data such as patch parameters 
or sample data between MIDI devices.  Manufacturers of MIDI equipment 
may define their own formats for System Exclusive data.  Manufacturers 
are granted unique identification (ID) numbers by the MMA or the JMSC, 
and the manufacturer ID number is included as the second byte of the 
System Exclusive message.  The manufacturer's ID byte is followed by 
any number of data bytes, and the data transmission is terminated 
with the EOX message.  Manufacturers are required to publish the details 
of their System Exclusive data formats, and other manufacturers may 
freely utilize these formats, provided that they do not alter or utilize 
the format in a way which conflicts with the original manufacturer's 
specifications.
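
The framing described above can be sketched as follows.  The status values 
0xF0 and 0xF7 come from the MIDI 1.0 specification; the function name is 
illustrative, and the payload contents are whatever the manufacturer's 
format defines.  The ID 0x7D used in the usage note is, to the best of our 
knowledge, the value set aside for non-commercial use, and should be 
treated as an assumption here.

```python
SYSTEM_EXCLUSIVE = 0xF0
END_OF_EXCLUSIVE = 0xF7


def sysex(manufacturer_id, payload):
    """Frame 7-bit payload bytes as a System Exclusive message.

    Only single-byte manufacturer IDs are handled in this sketch.
    """
    if not 0 <= manufacturer_id <= 0x7F:
        raise ValueError("manufacturer ID must be a 7-bit value")
    if any(not 0 <= b <= 0x7F for b in payload):
        raise ValueError("payload bytes must have the top bit clear")
    return (bytes([SYSTEM_EXCLUSIVE, manufacturer_id])
            + bytes(payload)
            + bytes([END_OF_EXCLUSIVE]))
```

For example, sysex(0x7D, [1, 2, 3]) yields the six bytes F0 7D 01 02 03 F7.  
Data bytes must keep their top bit clear so that they cannot be mistaken 
for status bytes.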

There is also a MIDI Sample Dump Standard, which is a System Exclusive 
data format defined in the MIDI specification for the transmission 
of sample data between MIDI devices.

Running Status

MIDI data is transmitted serially.  Musical events which originally 
occurred at the same time must be sent one at a time in the MIDI data 
stream, and therefore these events will not actually be played at 
exactly the same time.  However, the resulting delays are generally 
short enough that the events are perceived as having occurred simultaneously.  The 
MIDI data transmission rate is 31.25 kbit/s with 10 bits transmitted 
per byte of MIDI data.  Thus, a 3 byte Note On or Note Off message 
takes about 1 ms to be sent.  For a person playing a MIDI instrument 
keyboard, the time skew between playback of notes when 10 keys are 
pressed simultaneously should not exceed 10 ms, and this would not 
be perceptible.  However, MIDI data being sent from a sequencer can 
include a number of different parts.  On a given beat, there may be 
a large number of musical events which should occur simultaneously, 
and the delays introduced by serialization of this information might 
be noticeable.
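
The timing figures in this discussion can be reproduced directly, using the 
31.25 kbit/s rate and 10 bits per byte given in the MIDI Systems section.  
The function name is illustrative.

```python
BIT_RATE = 31_250        # bits per second
BITS_PER_BYTE = 10       # start bit + 8 data bits + stop bit


def transmit_time_ms(num_bytes):
    """Time needed to send num_bytes over a MIDI cable, in milliseconds."""
    return num_bytes * BITS_PER_BYTE / BIT_RATE * 1000.0


one_note_ms = transmit_time_ms(3)         # one 3-byte Note On: 0.96 ms
ten_notes_ms = transmit_time_ms(3 * 10)   # ten Note On messages: 9.6 ms
```

The tenth note of a ten-note chord therefore finishes arriving a little 
under 10 ms after the chord began transmitting.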

To help reduce the amount of data transmitted in the MIDI data stream, 
a technique called "running status" may be employed. It is very common 
for a string of consecutive messages to be of the same message type.  For 
instance, when a chord is played on a keyboard, 10 successive Note 
On messages may be generated, followed by 10 Note Off messages.  When 
running status is used, a status byte is sent for a message only when 
the message is not of the same type as the last message sent on the 
same channel. The status byte for subsequent messages of the same 
type may be omitted (only the data bytes are sent for these subsequent 
messages). The effectiveness of running status can be enhanced by 
sending Note On messages with a velocity of zero in place of Note 
Off messages.  In this case, long strings of Note On messages will 
often occur.  Changes in some of the MIDI controllers or movement 
of the pitch bend wheel on a musical instrument can produce a staggering 
number of MIDI channel voice messages, and running status can also 
help a great deal in these instances.
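
A transmitter's use of running status might be sketched as follows.  Each 
input message is a sequence of bytes beginning with its status byte.  The 
rules that System Common and System Exclusive messages cancel running 
status while System Real Time messages do not are taken from the MIDI 1.0 
specification rather than from this text, and the function name is 
illustrative.

```python
def apply_running_status(messages):
    """Serialize messages, omitting repeated channel status bytes."""
    out = bytearray()
    last_status = None
    for msg in messages:
        status = msg[0]
        if 0x80 <= status <= 0xEF:
            # Channel message: send the status byte only when it changes.
            if status != last_status:
                out.append(status)
                last_status = status
            out.extend(msg[1:])
        else:
            out.extend(msg)
            if status < 0xF8:
                # System Common / Exclusive cancel running status.
                last_status = None
    return bytes(out)


# A three-note chord on channel 1, released with velocity-zero Note Ons.
chord = [(0x90, 60, 100), (0x90, 64, 100), (0x90, 67, 100),
         (0x90, 60, 0), (0x90, 64, 0), (0x90, 67, 0)]
stream = apply_running_status(chord)
```

Here the six messages shrink from 18 bytes to 13, since only the first 
carries a status byte.  Had the releases been sent as true Note Off 
messages, a second status byte would have been needed.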

MIDI Sequencers and Standard MIDI files

MIDI messages are received and processed by a MIDI synthesizer in 
real time.  When the synthesizer receives a MIDI "note on" message 
it plays the appropriate sound.  When the corresponding "note off" 
message is received, the synthesizer turns the note off.  If the source 
of the MIDI data is a musical instrument keyboard, then this data 
is being generated in real time.  When a key is pressed on the keyboard, 
a "note on" message is generated in real time.  In these real time 
applications, there is no need for timing information to be sent along 
with the MIDI messages.  However, if the MIDI data is to be stored 
as a data file, and/or edited using a sequencer, then some form of 
"time-stamping" for the MIDI messages is required.

The International MIDI Association publishes a Standard MIDI Files 
specification, which provides a standardized method for handling time-stamped 
MIDI data.  This standardized file format for time-stamped MIDI data 
allows different applications, such as sequencers, scoring packages, 
and multimedia presentation software, to share MIDI data files.

The specification for Standard MIDI Files defines three formats for 
MIDI files.  MIDI sequencers can generally manage multiple MIDI data 
streams, or "tracks".  MIDI files having Format 0 must store all of 
the MIDI sequence data on a single track.  This is generally useful 
only for simple "single track" devices.  Format 1 files, which are 
the most commonly used, store data as a collection of tracks.  Format 
2 files can store several independent patterns.  
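
The format number is stored in the header chunk at the start of a Standard 
MIDI File.  The sketch below assumes the header layout given in the 
Standard MIDI Files specification (the identifier "MThd", a 32 bit length 
of 6, then three big-endian 16 bit fields for format, number of tracks, and 
time division); none of these details appear in this text, and the function 
name is hypothetical.

```python
import struct


def parse_smf_header(data):
    """Return (format, number_of_tracks, division) from an SMF header."""
    chunk_id, length, fmt, ntrks, division = struct.unpack(">4sIHHH", data[:14])
    if chunk_id != b"MThd" or length != 6:
        raise ValueError("not a Standard MIDI File header")
    if fmt not in (0, 1, 2):
        raise ValueError("unknown Standard MIDI File format")
    if fmt == 0 and ntrks != 1:
        raise ValueError("Format 0 files hold exactly one track")
    return fmt, ntrks, division
```

A sequencer would read these 14 bytes first to learn whether it is dealing 
with a single track, a collection of tracks, or independent patterns.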

Synthesizer Polyphony and Timbres

The polyphony of a sound generator refers to its ability to play more 
than one note at a time.  Polyphony is generally measured or specified 
as a number of notes or voices.  Most of the early music synthesizers 
were monophonic, meaning that they could only play one note at a time.  If 
you pressed five keys simultaneously on the keyboard of a monophonic 
synthesizer, you would only hear one note.  Pressing five keys on 
the keyboard of a synthesizer which was polyphonic with four voices 
of polyphony would, in general, produce four notes.  If the synthesizer 
had more voices (many modern sound modules have 16, 24, or 32 note 
polyphony), then you would hear all five of the notes.

The different sounds that a synthesizer or sound generator can produce 
are often referred to as "patches", "programs", "algorithms", "sounds", 
or "timbres".  Modern synthesizers commonly use program numbers to 
represent different sounds they produce.  Sounds may then be selected 
by specifying the program numbers (or patch numbers) for the desired 
sound.  For instance, a sound module might use patch number 1 for 
its acoustic piano sound, and patch number 36 for its fretless bass 
sound.  The association of patch numbers to sounds is often referred 
to as a patch map.  A MIDI Program Change message is used to tell 
a device receiving on a given channel to change the instrument sound 
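being used.  For example, a sequencer could set up devices on channel 
4 to play fretless bass sounds by sending a Program Change message 
for channel four which selects program number 36 (this is the General 
MIDI program number for the fretless bass patch).  Because General MIDI 
numbers its programs 1-128 while the data byte carries values 0-127, 
the data byte actually transmitted would be 35.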
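
A sketch of the Program Change message for this example follows.  It 
assumes the common convention that patch lists number programs 1-128 and 
channels 1-16, while the bytes on the wire carry 0-127 and 0-15 
respectively; the status value 0xC0 is from the MIDI 1.0 specification and 
the function name is illustrative.

```python
PROGRAM_CHANGE = 0xC0


def program_change(channel, program_number):
    """Build a Program Change from 1-based channel and program numbers."""
    if not 1 <= channel <= 16:
        raise ValueError("channel must be 1-16")
    if not 1 <= program_number <= 128:
        raise ValueError("program number must be 1-128")
    # Both values are sent zero-based.
    return bytes([PROGRAM_CHANGE | (channel - 1), program_number - 1])


fretless_bass_on_4 = program_change(4, 36)   # bytes C3 23 (hex)
```

Note that this message is only two bytes long, since a single data byte is 
enough to select one of 128 programs.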

A synthesizer or sound generator is said to be multi-timbral if it 
is capable of producing two or more different instrument sounds simultaneously.  Again, 
if a synthesizer can play five notes simultaneously, then it is polyphonic.  If 
it can produce a piano sound and an acoustic bass sound at the same 
time, then it is also multi-timbral.  A synthesizer or sound module 
which has 24 notes of polyphony and which is 6 part multi-timbral 
(capable of producing 6 different timbres simultaneously) could synthesize 
the sound of a 6 piece band or orchestra.  A sequencer could send 
MIDI messages for a piano part on channel 1, bass on channel 2, saxophone 
on channel 3, drums on channel 10, etc. A 16 part multi-timbral synthesizer 
could receive a different part on each of MIDI's 16 logical channels. 


The polyphony of a multi-timbral synthesizer is usually allocated 
dynamically among the different parts (timbres) being used. In our 
example, at a given instant five voices might be used for the piano 
part, two voices for the bass, one for the saxophone, and 6 voices 
for the drums, leaving 10 voices free.  Note that some sounds utilize 
more than one voice, so the number of notes which may be produced 
simultaneously may be less than the stated polyphony of the synthesizer, 
depending on which sounds are being utilized.
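
The voice count in this example can be tallied as below.  This is only 
bookkeeping for the figures in the text, not a model of how any particular 
synthesizer allocates its voices, and the names are illustrative.

```python
def free_voices(total_polyphony, voices_in_use):
    """Voices left over after each part has taken what it needs."""
    used = sum(voices_in_use.values())
    if used > total_polyphony:
        raise ValueError("more voices requested than the synthesizer has")
    return total_polyphony - used


in_use = {"piano": 5, "bass": 2, "saxophone": 1, "drums": 6}
remaining = free_voices(24, in_use)   # 24 - 14 = 10 voices free
```

If the piano part used a sound which takes two voices per note, its five 
notes would consume ten voices and only five would remain free.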

The General MIDI (GM) System

At the beginning of a MIDI sequence, a Program Change message is usually 
sent on each channel used in the piece in order to set up the appropriate 
instrument sound for each part.  The Program Change message tells 
the synthesizer which patch number should be used for a particular 
MIDI channel.  If the synthesizer receiving the MIDI sequence uses 
the same patch map (the assignment of patch numbers to sounds) that 
was used in the composition of the sequence, then the sounds will 
be assigned as intended.  Unfortunately, prior to General MIDI, there 
was no standard for the relationship of patch numbers to specific 
sounds for synthesizers.  Thus, a MIDI sequence might produce different 
sounds when played on different synthesizers, even though the synthesizers 
had comparable types of sounds.  For example, if the composer had 
selected patch number 5 for channel 1, intending this to be an electric 
piano sound, but the synthesizer playing the MIDI data had a tuba 
sound mapped at patch number 5, then the notes intended for the piano 
would be played on the tuba when using this synthesizer (even though 
this synthesizer may have a fine electric piano sound available at 
some other patch number).  

The General MIDI (GM) Specification, published by the International 
MIDI Association, defines a set of general capabilities for General 
MIDI Instruments.  The General MIDI Specification includes the definition 
of a General MIDI Sound Set (a patch map), a General MIDI Percussion 
map (mapping of percussion sounds to note numbers), and a set of General 
MIDI Performance capabilities (number of voices, types of MIDI messages 
recognized, etc.).  A MIDI sequence which has been generated for use 
on a General MIDI Instrument should play correctly on any General 
MIDI synthesizer or sound module.

The General MIDI system utilizes MIDI channels 1-9 and 11-16 for chromatic 
instrument sounds, while channel number 10 is utilized for "key-based" 
percussion sounds. The General MIDI Sound set for channels 1-9 and 
11-16 is given in table 1. These instrument sounds are grouped into 
"sets" of related sounds. For example, program numbers 1-8 are piano 
sounds, 9-16 are chromatic percussion sounds, 17-24 are organ sounds, 
25-32 are guitar sounds, etc. 

For the instrument sounds on channels 1-9 and 11-16, the note number 
in a Note On message is used to select the pitch of the sound which 
will be played.  For example, if the Vibraphone instrument (program 
number 12) has been selected on channel 3, then playing note number 
60 on channel 3 would play the middle C note (this would be the default 
note to pitch assignment on most instruments), and note number 59 
on channel 3 would play B below middle C.  Both notes would be played 
using the Vibraphone sound.
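
As a rough illustration of the default note-to-pitch assignment, the 
following Python sketch converts a note number to a frequency.  It 
assumes twelve-tone equal temperament with note number 69 (the A above 
middle C) tuned to 440 Hz.  That tuning is a common convention and 
an assumption here, and the function name is hypothetical.

```python
def note_to_frequency(note_number, a4_hz=440.0):
    """Frequency in Hz for a note number (assumes equal temperament, note 69 = a4_hz)."""
    return a4_hz * 2.0 ** ((note_number - 69) / 12.0)

middle_c = note_to_frequency(60)   # about 261.63 Hz
b_below = note_to_frequency(59)    # about 246.94 Hz
```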

The General MIDI percussion map used for channel 10 is given in table 
2. For these "key-based" sounds, the note number data in a Note On 
message is used differently.  Note numbers on channel 10 are used 
to select which drum sound will be played.  For example, a Note On 
message on channel 10 with note number 60 will play a Hi Bongo drum 
sound.  Note number 59 on channel 10 will play the Ride Cymbal 2 sound.
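
The difference between pitched and key-based channels can be seen 
in the message bytes themselves.  The sketch below builds a Note On 
message, assuming the MIDI 1.0 status byte layout (0x90 plus the 
zero-based channel number).  The function name and the default velocity 
of 64 are illustrative choices, not part of General MIDI.

```python
def note_on(channel, note_number, velocity=64):
    """Build the three bytes of a Note On message (channel numbered 1..16)."""
    return bytes([0x90 | (channel - 1), note_number, velocity])

pitched = note_on(3, 60)     # middle C on whatever program channel 3 is using
hi_bongo = note_on(10, 60)   # same note number, but on channel 10 it selects a drum
```

The two messages differ only in the channel bits of the status byte; 
it is the receiving synthesizer that interprets note 60 as a pitch 
on channel 3 and as the Hi Bongo on channel 10.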

It should be noted that the General MIDI system specifies sounds using 
program numbers 1 through 128.  The MIDI Program Change message used 
to select these sounds carries the program number in a single data 
byte.  Only the low 7 bits of a MIDI data byte are available for data, 
so the byte holds decimal values from 0 through 127.  Thus, to select 
GM sound number 10, the Glockenspiel, the Program Change message will 
have a data byte with the decimal value 9.
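
The off-by-one relationship is easy to get wrong, so a short Python 
sketch may help.  It assumes the MIDI 1.0 status byte layout for Program 
Change (0xC0 plus the zero-based channel number); the function name 
is hypothetical.

```python
def program_change(channel, gm_program):
    """Build the two bytes of a Program Change message.

    channel: MIDI channel as numbered in the text, 1 through 16.
    gm_program: General MIDI program number, 1 through 128.
    """
    if not 1 <= channel <= 16:
        raise ValueError("channel must be 1..16")
    if not 1 <= gm_program <= 128:
        raise ValueError("GM program number must be 1..128")
    status = 0xC0 | (channel - 1)   # Program Change status, channel in the low nibble
    data = gm_program - 1           # value on the wire is zero-based, 0..127
    return bytes([status, data])

glockenspiel = program_change(1, 10)   # bytes 0xC0 0x09
```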

The General MIDI system specifies which instrument or sound corresponds 
with each program/patch number, but General MIDI does not specify 
how these sounds are produced. Thus, program number 1 should select 
the Acoustic Grand Piano sound on any General MIDI instrument. However, 
the Acoustic Grand Piano sound on two General MIDI synthesizers which 
use different synthesis techniques may sound quite different.


Table 1: General MIDI Sound Set (All Channels Except 10)

Prog#	Instrument Name
=====	========================
1	Acoustic Grand Piano    
2	Bright Acoustic Piano   
3	Electric Grand Piano    
4	Honky-tonk Piano   
5	Electric Piano 1  
6	Electric Piano 2  
7	Harpsichord       
8	Clavi             
9	Celesta           
10	Glockenspiel      
11	Music Box         
12	Vibraphone        
13	Marimba           
14	Xylophone         
15	Tubular Bells     
16	Dulcimer          
17	Drawbar Organ     
18	Percussive Organ  
19	Rock Organ        
20	Church Organ      
21	Reed Organ        
22	Accordion         
23	Harmonica         
24	Tango Accordion   
25	Acoustic Guitar (nylon) 
26	Acoustic Guitar (steel) 
27	Electric Guitar (jazz)  
28	Electric Guitar (clean) 
29	Electric Guitar (muted) 
30	Overdriven Guitar 
31	Distortion Guitar 
32	Guitar harmonics  
33	Acoustic Bass     
34	Electric Bass (finger)  
35	Electric Bass (pick)    
36	Fretless Bass     
37	Slap Bass 1       
38	Slap Bass 2       
39	Synth Bass 1      
40	Synth Bass 2
41	Violin            
42	Viola             
43	Cello             
44	Contrabass 
45	Tremolo Strings 
46	Pizzicato Strings    
47	Orchestral Harp     
48	Timpani       
49	String Ensemble 1   
50	String Ensemble 2   
51	SynthStrings 1      
52	SynthStrings 2 
53	Choir Aahs    
54	Voice Oohs    
55	Synth Voice   
56	Orchestra Hit 
57	Trumpet       
58	Trombone      
59	Tuba          
60	Muted Trumpet 
61	French Horn   
62	Brass Section 
63	SynthBrass 1  
64	SynthBrass 2  
65	Soprano Sax   
66	Alto Sax      
67	Tenor Sax     
68	Baritone Sax    
69	Oboe  
70	English Horn    
71	Bassoon    
72	Clarinet   
73	Piccolo       
74	Flute         
75	Recorder   
76	Pan Flute     
77	Blown Bottle    
78	Shakuhachi 
79	Whistle       
80	Ocarina       
81	Lead 1 (square)    
82	Lead 2 (sawtooth)   
83	Lead 3 (calliope)    
84	Lead 4 (chiff)      
85	Lead 5 (charang)    
86	Lead 6 (voice)
87	Lead 7 (fifths)
88	Lead 8 (bass + lead)
89	Pad 1 (new age)
90	Pad 2 (warm)
91	Pad 3 (polysynth)
92	Pad 4 (choir)
93	Pad 5 (bowed)
94	Pad 6 (metallic)
95	Pad 7 (halo)
96	Pad 8 (sweep)
97	FX 1 (rain)
98	FX 2 (soundtrack)
99	FX 3 (crystal)
100	FX 4 (atmosphere)
101	FX 5 (brightness)
102	FX 6 (goblins)
103	FX 7 (echoes)
104	FX 8 (sci-fi)
105	Sitar
106	Banjo
107	Shamisen
108	Koto
109	Kalimba
110	Bag pipe
111	Fiddle
112	Shanai
113	Tinkle Bell
114	Agogo
115	Steel Drums
116	Woodblock
117	Taiko Drum
118	Melodic Tom
119	Synth Drum
120	Reverse Cymbal
121	Guitar Fret Noise
122	Breath Noise
123	Seashore
124	Bird Tweet
125	Telephone Ring
126	Helicopter
127	Applause
128	Gunshot

        
Table 2: General MIDI Percussion Map (Channel 10)

Note #	Drum Sound
======	=====================
35	Acoustic Bass Drum 
36	Bass Drum 1    
37	Side Stick     
38	Acoustic Snare 
39	Hand Clap      
40	Electric Snare  
41	Low Floor Tom  
42	Closed Hi-Hat   
43	High Floor Tom 
44	Pedal Hi-Hat    
45	Low Tom        
46	Open Hi-Hat     
47	Low Mid Tom    
48	Hi Mid Tom     
49	Crash Cymbal 1 
50	High Tom       
51	Ride Cymbal 1  
52	Chinese Cymbal  
53	Ride Bell   
54	Tambourine  
55	Splash Cymbal   
56	Cowbell     
57	Crash Cymbal 2  
58	Vibraslap   
59	Ride Cymbal 2   
60	Hi Bongo    
61	Low Bongo   
62	Mute Hi Conga   
63	Open Hi Conga   
64	Low Conga   
65	High Timbale    
66	Low Timbale
67	High Agogo
68	Low Agogo
69	Cabasa
70	Maracas
71	Short Whistle
72	Long Whistle
73	Short Guiro
74	Long Guiro
75	Claves
76	Hi Wood Block
77	Low Wood Block
78	Mute Cuica
79	Open Cuica
80	Mute Triangle
81	Open Triangle


The Roland General Synthesizer (GS) Standard

The Roland General Synthesizer (GS) functions are a superset of those 
specified for General MIDI.  The GS system includes all of the GM 
sounds (which are referred to as "capital instrument" sounds), and 
adds new sounds which are organized as variations of the capital instruments.  

Variations are selected using the MIDI Control Change message in conjunction 
with the Program Change message.   The Control Change message is sent 
first, and it is used to set controller number 0 to some specified 
nonzero value indicating the desired variation (some capital sounds 
have several different variations). The Control Change message is 
followed by a MIDI Program Change message which indicates the program 
number of the related capital instrument.  For example, Capital instrument 
number 25 is the Nylon String Guitar. The Ukulele is a variation of 
this instrument.  The Ukulele is selected by sending a Control Change 
message which sets controller number 0 to a value of 8, followed by 
a program change message on the same channel which selects program 
number 25.  Sending the Program change message alone would select 
the capital instrument, the Nylon String Guitar.  Note also that a 
Control Change of controller number 0 to a value of 0 followed by 
a Program Change message would also select the capital instrument.
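
The Ukulele example can be written out as a byte sequence.  This is 
a minimal sketch assuming the MIDI 1.0 status bytes for Control Change 
(0xB0) and Program Change (0xC0); the function name is hypothetical, 
and a particular GS device may expect additional messages that are 
not shown here.

```python
def gs_select(channel, gm_program, variation=0):
    """Bytes that select a GS sound: Control Change 0, then Program Change."""
    ch = channel - 1                                 # channels 1..16 -> 0..15
    control_change = bytes([0xB0 | ch, 0, variation])
    program_change = bytes([0xC0 | ch, gm_program - 1])
    return control_change + program_change

ukulele = gs_select(1, 25, variation=8)   # hex: B0 00 08 C0 18
nylon_guitar = gs_select(1, 25)           # variation 0 selects the capital instrument
```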

The GS system also includes adjustable reverberation and chorus effects.  The 
effects depth for both reverb and chorus may be adjusted on an individual 
MIDI channel basis using Control Change messages.  The type of reverb 
and chorus sounds employed may also be selected using System Exclusive 
messages.

Synthesizer Implementations: FM vs. Wavetable

There are a number of different technologies or algorithms used to 
create sounds in music synthesizers.  Two widely used techniques are 
Frequency Modulation (FM) synthesis and Wavetable synthesis.  FM synthesis 
techniques generally use one periodic signal (the modulator) to modulate 
the frequency of another signal (the carrier).  If the modulating 
signal is in the audible range, then the result will be a significant 
change in the timbre of the carrier signal.  Each FM voice requires 
a minimum of two signal generators.  These generators are commonly 
referred to as "operators", and different FM synthesis implementations 
have varying degrees of control over the operator parameters.  Sophisticated 
FM systems may use 4 or 6 operators per voice, and the operators may 
have adjustable envelopes which allow adjustment of the attack and 
decay rates of the signal.  Although FM systems were implemented in 
the analog domain on early synthesizer keyboards, modern FM synthesis 
implementations are done digitally.
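
A two-operator voice can be sketched in a few lines of Python.  This 
sketch uses the phase-modulation form that digital implementations 
commonly use to realize FM; the modulation index, the frequencies, 
and the sample rate are illustrative assumptions, and no envelopes 
are applied.

```python
import math

def fm_sample(t, carrier_hz, modulator_hz, index):
    """One output sample of a two-operator FM voice at time t (seconds)."""
    modulator = math.sin(2.0 * math.pi * modulator_hz * t)
    # The modulator output is added to the carrier phase; index sets the depth.
    return math.sin(2.0 * math.pi * carrier_hz * t + index * modulator)

SAMPLE_RATE = 44100
tone = [fm_sample(n / SAMPLE_RATE, 440.0, 220.0, 2.0)
        for n in range(SAMPLE_RATE // 10)]   # one tenth of a second
```

With an index of zero the output is a plain sine wave at the carrier 
frequency; raising the index adds sidebands and changes the timbre.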

FM synthesis techniques are very useful for creating expressive new 
synthesized sounds.  However, if the goal of the synthesis system 
is to recreate the sound of some existing instrument, this can generally 
be done more accurately with digital sample-based techniques.  Digital 
sampling systems store high quality sound samples digitally, and then 
replay these sounds on demand. Digital sample-based synthesis systems 
may employ a variety of special techniques, such as sample looping, 
pitch shifting, mathematical interpolation, and polyphonic digital 
filtering, in order to reduce the amount of memory required to store 
the sound  samples (or to get more types of sounds from a given amount 
of memory).  These sample-based synthesis systems are often called 
"wavetable" synthesizers (the sample memory in these systems contains 
a large number of sampled sound segments, and can be thought of as 
a "table" of sound waveforms which may be looked up and utilized when 
needed).  A number of the special techniques employed in this type 
of synthesis are discussed in the following paragraphs.

Wavetable Synthesis Techniques

Looping and Envelope Generation

One of the primary techniques used in wavetable synthesizers to conserve 
sample memory space is the looping of sampled sound segments.  For 
a large number of instrument sounds, the sound can be modeled as consisting 
of two major sections, the attack section and the sustain section.  The 
attack section is the initial part of the sound, where the amplitude 
and the spectral characteristics of the sound may be changing very 
rapidly.  The sustain section of the sound is that part of the sound 
following the attack, where the characteristics of the sound are changing 
less dynamically.  Figure 4 shows a waveform with portions which could 
be considered the attack and the sustain sections indicated.  In this 
example, the spectral characteristics of the waveform remain constant 
throughout the sustain section, while the amplitude is decreasing 
at a fairly constant rate.  This is an exaggerated example; in most 
natural instrument sounds, both the spectral characteristics and the 
amplitude continue to change through the duration of the sound.  The 
sustain section, if one can be identified, is that section for which 
the characteristics of the sound are relatively constant.

A great deal of memory can be saved in wavetable synthesis systems 
by storing only a short segment of the sustain section of the waveform, 
and then looping this segment during playback.  Figure 5 shows a two 
period segment of the sustain section from the waveform in Figure 
4, which has been looped to create a steady state signal.  If the 
original sound had a fairly constant spectral content and amplitude 
during the sustained section, then the sound resulting from this looping 
operation should be a good approximation of the sustained section 
of the original.
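
The looping operation itself is simple.  The following Python sketch 
plays an attack segment once and then repeats a stored loop segment 
for as long as output is requested.  The sample values and the function 
name are made up for illustration.

```python
def play_looped(initial, loop, num_samples):
    """Return num_samples of output: the initial segment once, then the loop repeated."""
    out = []
    for n in range(num_samples):
        if n < len(initial):
            out.append(initial[n])
        else:
            out.append(loop[(n - len(initial)) % len(loop)])
    return out

attack = [0, 5, 9, 7]
sustain_loop = [3, 1, -3, -1]
rendered = play_looped(attack, sustain_loop, 10)
# rendered == [0, 5, 9, 7, 3, 1, -3, -1, 3, 1]
```

Ten output samples are produced from eight stored samples here; for 
a long note the saving is far larger.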

For many acoustic string instruments, the spectral characteristics 
of the sound remain fairly constant during the sustain section, while 
the amplitude of the signal decays.  This can be simulated with a 
looped segment by multiplying the looped samples by a decreasing gain 
factor during playback to get the desired shape or envelope.  The 
amplitude envelope of a sound is commonly modeled as consisting of 
some number of linear segments.  An example is the commonly used four 
part piecewise-linear Attack-Decay-Sustain-Release (ADSR) envelope 
model.  Figure 6 depicts a typical ADSR envelope shape, and Figure 
7 shows the result of applying this envelope to the looped waveform 
from Figure 5.
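
A piecewise-linear ADSR envelope can be sketched as a function of 
time.  In this sketch the sustain level is held flat for simplicity 
and all of the time constants are arbitrary assumptions; the function 
and parameter names are hypothetical.

```python
def adsr_gain(t, note_off, attack=0.01, decay=0.05, sustain=0.6, release=0.2):
    """Envelope gain (0..1) at time t seconds; note_off is when the key is released."""
    def held(time):
        # Gain while the key is down: attack ramp, decay ramp, then sustain.
        if time < attack:
            return time / attack
        if time < attack + decay:
            return 1.0 - (1.0 - sustain) * (time - attack) / decay
        return sustain

    if t < note_off:
        return held(t)
    # Release: ramp linearly from the gain at note-off down to zero.
    start = held(note_off)
    return max(0.0, start * (1.0 - (t - note_off) / release))
```

Multiplying each looped output sample by adsr_gain at the corresponding 
time gives the kind of shaped waveform described above.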

A typical wavetable synthesis system would store separate sample segments 
for the attack section and the looped section of an instrument.  These 
sample segments might be referred to as the initial sound and the 
loop sound.  The initial sound is played once through, and then the 
loop sound is played repetitively until the note ends.  An envelope 
generator function is used to create an envelope which is appropriate 
for the particular instrument, and this envelope is applied to the 
output samples during playback.  Playback of the initial sound (with 
the Attack portion of the envelope applied) begins when a Note 
On message is received.  The length of the initial sound segment is 
fixed by the number of samples in the segment, and the lengths of the 
Attack and Decay sections of the envelope are generally also fixed 
for a given instrument sound.  The sustain section will continue to 
repeat the loop samples while applying the Sustain envelope slope 
(which decays slowly in our examples) until a Note Off message is 
received.  The Note Off message triggers the beginning of the Release 
portion of the envelope.

Loop Length

The loop length is measured as a number of samples, and the length 
of the loop should be equal to an integral number of periods of the 
fundamental pitch of the sound being played (if this is not true, 
then an undesirable  "pitch shift" will occur during playback when 
the looping begins).  Of course, the length of the pitch period of 
a sampled instrument sound will generally not work out to be an integral 
number of sample periods.  Therefore, it is common to perform a "resampling" 
process on the original sampled sound, to get a new sound sample 
for which the pitch period is an integral number of sample periods.
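
A short worked example shows the arithmetic.  The sketch below assumes 
a 44.1 kHz sampling rate and a 440 Hz note, both chosen only for 
illustration, and rounds the pitch period to the nearest whole number 
of samples.  The function name is hypothetical.

```python
def resample_ratio_for_loop(sample_rate_hz, pitch_hz):
    """Return (period_in_samples, target_period, resample_ratio)."""
    period = sample_rate_hz / pitch_hz   # generally not a whole number
    target = round(period)               # nearest whole number of samples
    ratio = target / period              # factor by which the sample count changes
    return period, target, ratio

period, target, ratio = resample_ratio_for_loop(44100, 440.0)
# period is about 100.23 samples, target is 100, ratio is about 0.9977
```

After resampling by this ratio, one pitch period occupies exactly 
100 samples.  Played back with an increment of one sample, the stored 
data would sound at 441 Hz, so the playback rate has to account for 
the small difference.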

In practice, the length of the loop segment for an acoustic instrument 
sample may be many periods with respect to the fundamental pitch of 
the sound.  If the sound has a natural vibrato or chorus effect, then 
it is generally desirable to have the loop segment length be an integral 
multiple of the period of the vibrato or chorus.  

One-Shot Sounds

The previous paragraphs discussed dividing a sampled sound into an 
attack section and a sustain section, and then using looping techniques 
to minimize the storage requirements for the sustain portion.  However, 
some sounds, particularly sounds of short duration or sounds whose 
characteristics change dynamically throughout their duration, are 
not suitable for looped playback techniques.  Short drum sounds often 
fit this description.  These sounds are stored as a single sample 
segment which is played once through with no looping.  Sounds in this 
class are referred to as "one-shot" sounds. 

Sample Editing and Processing

There are a number of sample editing and processing steps involved 
in preparing sampled sounds for use in a wavetable synthesis system.  The 
requirements for editing the original sample data to identify and 
extract the initial and loop segments, and for resampling the data 
to get a pitch period length which is an integer multiple of the sampling 
period, have already been mentioned.  

Editing may also be required to make the endpoints of the loop segment 
compatible.  If the amplitude and the slope of the waveform at the 
beginning of the loop segment do not match those at the end of the 
loop, then a repetitive "glitch" will be heard during playback of 
the looped section.  Additional processing may be performed to "compress" 
the dynamic range of the sound to improve the signal/quantizing noise 
ratio or to conserve sample memory.  This topic is addressed next. 


When all of the sample processing has been completed, the resulting 
sampled sound segments for the various instruments are tabulated to 
form the sample memory for the synthesizer.

Sample Data Compression

The signal-to-quantizing noise ratio for a digitally sampled signal 
is limited by sample word size (the number of bits per sample), and 
by the amplitude of the digitized signal.  Most acoustic instrument 
sounds reach their peak amplitude very quickly, and the amplitude 
then slowly decays from this peak.  The ear's sensitivity dynamically 
adjusts to signal level.  Even in systems utilizing a relatively small 
sample word size, the quantizing noise level is generally not perceptible 
when the signal is near maximum amplitude.  However, as the signal 
level decays, the ear becomes more sensitive, and the noise level 
will appear to increase.   Of course, using a larger word size will 
reduce the quantizing noise, but there is a considerable price penalty 
paid if the number of samples is large.

Compression techniques may be used to improve the signal-to-quantizing 
noise ratio for some sampled sounds.  These techniques reduce the 
dynamic range of the sound samples stored in the sample memory. The 
sample data is decompressed during playback to restore the dynamic 
range of the signal.  This allows the use of sample memory with a 
smaller word size (smaller dynamic range) than is utilized in the 
rest of the system.  There are a number of different compression techniques 
which may be used to compress the dynamic range of a signal.  

For signals which begin at a high amplitude and decay in a fairly 
linear fashion, a simple compression technique can be effective.  If 
the slope of the decay envelope of the signal is estimated, then an 
envelope with the complementary slope (the negative of the decay slope) 
can be constructed and applied to the original sample data.  The resulting 
sample data, which now has a flat envelope, can be stored in the sample 
memory, utilizing the full dynamic range of the memory.   The decay 
envelope can then be applied to the stored sample data during sound 
playback to restore the envelope of the original sound.
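
The idea can be sketched in Python.  This sketch assumes the decay 
slope is expressed in decibels per sample, so that an envelope with 
the negated slope cancels it exactly; the slope value, the test signal, 
and the function names are all illustrative assumptions.

```python
def envelope_gain(slope_db_per_sample, n):
    """Linear gain of an envelope whose level changes by a fixed number of dB per sample."""
    return 10.0 ** (slope_db_per_sample * n / 20.0)

def compress(samples, decay_db_per_sample):
    """Flatten a decaying sound by applying the complementary (rising) envelope."""
    return [s * envelope_gain(-decay_db_per_sample, n) for n, s in enumerate(samples)]

def decompress(stored, decay_db_per_sample):
    """Re-apply the decay envelope during playback."""
    return [s * envelope_gain(decay_db_per_sample, n) for n, s in enumerate(stored)]

decay = -0.5   # assumed estimate: the level falls 0.5 dB per sample
original = [1000.0 * envelope_gain(decay, n) for n in range(8)]
stored = compress(original, decay)     # every value is close to 1000
played = decompress(stored, decay)     # matches the original again
```

In a real system the stored values would be quantized to the sample 
memory word size between the two steps, which is where the improvement 
in signal-to-quantizing noise ratio comes from.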

Note that there is some compression effect inherent in the looping 
techniques described earlier.  If the loop segment is stored at an 
amplitude level which makes full use of the dynamic range available 
in the sample memory, and the processor and D/A converters used for 
playback have a wider dynamic range than the sample memory, then the 
application of a decay envelope during playback will have a decompression 
effect similar to that described in the previous paragraph.

Pitch Shifting

In order to minimize sample memory requirements, wavetable synthesis 
systems utilize pitch shifting, or pitch transposition techniques, 
to generate a number of different notes from a single sound sample 
of a given instrument.  For example, if the sample memory contains 
a sample of a middle C note on the acoustic piano, then this same 
sample data could be used to generate the C# note or D note above 
middle C using pitch shifting.  

Pitch shifting is accomplished by accessing the stored sample data 
at different rates during playback.  For example, if a pointer is 
used to address the sample memory for a sound, and the pointer is 
incremented by one after each access, then the samples for this sound 
would be accessed sequentially, resulting in some particular pitch. 
If the pointer increment was two rather than one, then only every 
second sample would be played, and the resulting pitch would be shifted 
up by one octave (the frequency would be doubled).  

Frequency Accuracy 

In the previous example, the sample memory address pointer was incremented  by 
an integer number of samples.  This allows only a limited set of pitch 
shifts.  In a more general case, the memory pointer would consist 
of an integer part and a fractional part, and the increment value 
could be a fractional number of samples.  The integer part of the 
address pointer is used to address the sample memory; the fractional 
part is used to maintain frequency accuracy.  For example, if the increment 
value was equivalent to 1/2, then the pitch would be shifted down 
by one octave (the frequency would be halved).  When non-integer increment 
values are utilized, the frequency resolution for playback is determined 
by the number of bits used to represent the fractional part of the 
address pointer and the address increment parameter.
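
A fixed-point address pointer of this kind can be sketched as follows.  
The 8-bit fractional width is an arbitrary assumption, as are the 
names; the increment for a shift of a given number of semitones assumes 
equal temperament, where each semitone is a frequency ratio of the 
twelfth root of two.

```python
FRACTION_BITS = 8                 # assumed width of the fractional part
ONE = 1 << FRACTION_BITS          # fixed-point representation of 1.0

def increment_for_semitones(semitones):
    """Fixed-point address increment that shifts the pitch by the given semitones."""
    return round(ONE * 2.0 ** (semitones / 12.0))

def addresses(increment, count):
    """Yield the integer sample addresses visited by a fixed-point pointer."""
    pointer = 0
    for _ in range(count):
        yield pointer >> FRACTION_BITS   # integer part addresses the sample memory
        pointer += increment             # fractional part carries over between steps

octave_up = list(addresses(increment_for_semitones(12), 4))     # [0, 2, 4, 6]
octave_down = list(addresses(increment_for_semitones(-12), 4))  # [0, 0, 1, 1]
```

With only 8 fractional bits, a shift of one semitone is rounded to 
an increment of 271/256, which is slightly flat of the exact ratio.  
Adding fractional bits reduces this error.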

Interpolation

When the fractional part of the address pointer is non-zero, then 
the "desired value" falls between available data samples.  Figure 
8 depicts a simplified addressing scheme wherein the Address Pointer 
and the increment parameter each have a 4-bit integer part and a 4-bit 
fractional part.  In this case, the increment value is equal to 1 
1/2 samples.  Very simple systems might simply ignore the fractional 
part of the address when determining the sample value to be sent to 
the D/A converter.  The data values sent to the D/A converter when 
using this approach are indicated in Figure 8, case I.  A slightly 
better approach would be to use the nearest available sample value.  More 
sophisticated systems would perform some type of mathematical interpolation 
between available data points in order to get a value to be used for 
playback.  Values which might be sent to the D/A when interpolation 
is employed are shown as case II.  Note that the overall frequency 
accuracy would be the same for both cases indicated, but the output 
is severely distorted in the case where interpolation is not used.

There are a number of different algorithms used for interpolation 
between sample values.  The simplest is linear interpolation.  With 
linear interpolation, the interpolated value is simply the weighted 
average of the two nearest samples, with the fractional address used 
as a weighting constant.  For example, if the address pointer indicated 
an address of (n+K), where n is the integer part of the address and 
K is the fractional part, then the interpolated value can be calculated 
as s(n+K) = (1-K)s(n) + (K)s(n+1), where s(n) is the sample data value 
at address n.  More sophisticated interpolation techniques can 
be utilized to further reduce distortion, but these techniques are 
computationally expensive. 
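
The linear interpolation formula translates directly into code.  This 
Python sketch uses a floating-point address for clarity, where a 
hardware implementation would more likely use a fixed-point pointer; 
the table values and the function name are made up for illustration.

```python
def interpolate(samples, address):
    """Linearly interpolated value at a fractional address n + K."""
    n = int(address)     # integer part of the address
    k = address - n      # fractional part, 0 <= K < 1
    return (1.0 - k) * samples[n] + k * samples[n + 1]

table = [0.0, 10.0, 20.0, 10.0]
value = interpolate(table, 1.5)   # 15.0, halfway between samples 1 and 2
```

Note that the sketch reads samples[n + 1], so the address must stay 
below the last stored sample (or the table must wrap around at the 
loop point).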

Oversampling

Oversampling of the sound samples may also be used to reduce distortion 
in wavetable synthesis systems.  For example, if 4X oversampling were 
utilized for a particular instrument sound sample, then an address 
increment value of 4 would be used for playback with no pitch shift.  The 
data points chosen during playback will be closer to the "desired 
values", on the average, than they would be if no oversampling were 
utilized because of the increased number of data points used to represent 
the waveform.  Of course, oversampling has a high cost in terms of 
sample memory requirements.

In many cases, the best approach may be to utilize linear interpolation 
combined with varying degrees of oversampling where needed. The linear 
interpolation technique provides reasonable accuracy for many sounds, 
without the high penalty in terms of processing power required for 
more sophisticated interpolation methods.  For those sounds which 
need better accuracy, oversampling is employed. With this approach, 
the additional memory required for oversampling is only utilized where 
it is most needed.  The combined effect of linear interpolation and 
selective oversampling can produce excellent results.

Splits

When the pitch of a sampled sound is changed during playback, the 
timbre of the sound is changed somewhat also.  For small changes in 
pitch (up to a few semitones), the timbre change is generally not 
noticed.  However, if a large pitch shift is used, the resulting note 
will sound unnatural.  Thus, a particular sample of an instrument 
sound will be useful for recreating a limited range of notes using 
pitch shifting techniques.  To get coverage of the entire instrument 
range, a number of different samples of the instrument are used, and 
each of these samples is used to synthesize a limited range of notes.  This 
technique can be thought of as splitting a musical instrument keyboard 
into a number of ranges of notes, with a different sound sample used 
for each range. Each of these ranges is referred to as a split, or 
key split.
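
A key split lookup can be sketched as a mapping from note number to 
a stored sample and a playback ratio.  This sketch assumes evenly 
spaced splits of five notes each, with each sample recorded at the 
middle note of its range.  Real instruments typically use hand-chosen, 
uneven splits, and all names here are hypothetical.

```python
SPLIT_WIDTH = 5   # assumed: each stored sample covers five adjacent notes

def sample_for_note(note_number):
    """Return (split index, root note, playback ratio) for a note number 0..127."""
    index = note_number // SPLIT_WIDTH
    root = index * SPLIT_WIDTH + SPLIT_WIDTH // 2   # sample recorded at the middle note
    ratio = 2.0 ** ((note_number - root) / 12.0)    # address increment for playback
    return index, root, ratio

index, root, ratio = sample_for_note(60)   # split 12, root note 62, ratio below 1
```

With this layout no note is ever shifted by more than two semitones 
from its stored sample.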

Velocity splits refer to the use of different samples for different 
note velocities.  Using velocity splits, one sample might be utilized 
if a particular note is played softly, while a different sample would 
be utilized for the same note of the same instrument when played with 
a higher velocity. 

Note that the explanations above refer to the use of key splits and 
velocity splits in the sound synthesis process.  In this case, the 
different splits utilize different samples of the same instrument 
sound.  Key splitting and velocity splitting techniques are also utilized 
in a performance context.  In the performance context, different splits 
generally produce different instrument sounds.  For instance, a keyboard 
performer might want to set up a key split which would play a fretless 
bass sound from the lower octaves of his keyboard, while the upper 
octaves play the vibraphone.  Similarly, a velocity split might be 
set up to play the acoustic piano sound when keys are played with 
soft to moderate velocity, but an orchestral string sound plays when 
the keys are pressed with higher velocity.

Aliasing Noise

The previous paragraph discussed the timbre changes which result from 
pitch shifting.  The resampling techniques used to shift the pitch 
of a stored sound sample can also result in the introduction of aliasing 
noise into an instrument sound.  The generation of aliasing noise 
can also limit the amount of pitch shifting which may be effectively 
applied to a sound sample.  Sounds which are rich in upper harmonic 
content will generally have more of a problem with aliasing noise.  Low-pass 
filtering applied after interpolation can help eliminate the undesirable 
effect of aliasing noise.  The use of oversampling also helps eliminate 
aliasing noise.

LFOs for vibrato and tremolo

Vibrato and tremolo are effects which are often produced by musicians 
playing acoustic instruments.  Vibrato is basically a low-frequency 
modulation of the pitch of a note, while tremolo is modulation of 
the amplitude of the sound.  These effects are simulated in synthesizers 
by implementing low-frequency oscillators (LFOs) which are used to 
modulate the pitch or amplitude of the synthesized sound being produced.  Natural 
vibrato and tremolo effects tend to increase in strength as a note 
is sustained.  This is accomplished in synthesizers by applying an 
envelope generator to the LFO.  For example, a flute sound might have 
a tremolo effect which begins at some point after the note has sounded, 
and the tremolo effect gradually increases to some maximum level, 
where it remains until the note stops sounding.
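
A delayed tremolo of this kind can be sketched as an LFO whose depth 
is controlled by a simple envelope.  The rate, depth, delay, and ramp 
time below are arbitrary assumptions, and the function name is hypothetical.

```python
import math

def tremolo_gain(t, rate_hz=5.0, max_depth=0.3, delay=0.5, ramp=1.0):
    """Amplitude gain at time t seconds for a tremolo whose depth fades in."""
    # Envelope for the LFO: zero during the delay, then a linear rise to max_depth.
    depth = max_depth * min(1.0, max(0.0, (t - delay) / ramp))
    lfo = math.sin(2.0 * math.pi * rate_hz * t)
    return 1.0 - depth * (0.5 + 0.5 * lfo)   # gain dips by up to `depth`
```

The gain stays at 1.0 for the first half second, then begins to waver, 
reaching its full depth one second later.  Applying the same LFO to 
the address increment instead of the gain would give vibrato.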

Layering

Layering refers to a technique in which multiple sounds are utilized 
for each note played.  This technique can be used to generate very 
rich sounds, and may also be useful for increasing the number of instrument 
patches which can be created from a limited sample set.  Note that 
layered sounds generally utilize more than one voice of polyphony 
for each note played, and thus the number of voices available is effectively 
reduced when these sounds are being used.

Polyphonic Digital Filtering for Timbre Enhancement

It was mentioned earlier that low-pass filtering may be used to help 
eliminate noise which may be generated during the pitch shifting process.  There 
are also a number of ways in which digital filtering is used in the 
timbre generation process to improve the resulting instrument sound.  In 
these applications, the digital filter implementation is polyphonic, 
meaning that a separate filter is implemented for each voice being 
generated, and the filter implementation should have dynamically adjustable 
cutoff frequency and/or Q.

For many acoustic instruments, the character of the tone which is 
produced changes dramatically as a function of the amplitude level 
at which the instrument is played.  For example, the tone of an acoustic 
piano may be very bright when the instrument is played forcefully, 
but much more mellow when it is played softly.  Velocity splits, which 
utilize different sample segments for different note velocities, can 
be implemented to simulate this phenomenon.  Another very powerful 
technique is to implement a digital low-pass filter for each note 
with a cutoff frequency which varies as a function of the note velocity.  This 
polyphonic digital filter dynamically adjusts the output frequency 
spectrum of the synthesized sound as a function of note velocity, 
allowing a very effective recreation of the acoustic instrument timbre.
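
A velocity-controlled filter can be sketched with a simple one-pole 
low-pass.  The filter structure, the linear velocity-to-cutoff mapping, 
and the 500 Hz to 8 kHz range are all assumptions chosen for illustration; 
an actual synthesizer would likely use a higher-order filter with 
adjustable Q.  In a polyphonic implementation, one such filter would 
run for each sounding voice.

```python
import math

def cutoff_for_velocity(velocity, low_hz=500.0, high_hz=8000.0):
    """Map note velocity (1..127) to a cutoff frequency; the range is an assumption."""
    return low_hz + (high_hz - low_hz) * (velocity - 1) / 126.0

def low_pass(samples, cutoff_hz, sample_rate=44100.0):
    """One-pole low-pass filter: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out

step = [1.0] * 32
soft = low_pass(step, cutoff_for_velocity(20))    # rises slowly: mellow
hard = low_pass(step, cutoff_for_velocity(120))   # rises quickly: bright
```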

Another important application of polyphonic digital filtering is in 
smoothing out the transitions between samples in key-based splits.  At 
the border between two splits, there will be two adjacent notes which 
are based on different samples.  Normally, one of these samples will 
have been pitch shifted up to create the required note, while the 
other will have been shifted down in pitch.  As a result, the timbre 
of these two adjacent notes may be significantly different, making 
the split obvious.   This problem may be alleviated by employing a 
polyphonic digital filter which uses the note number to control the 
filter characteristics.  A table may be constructed containing the 
filter characteristics for each note number of a given instrument.  The 
filter characteristics are chosen to compensate for the pitch shifting 
associated with the key splits used for that instrument.

It is also common to control the characteristics of the digital filter 
using an envelope generator or an LFO.  The result is an instrument 
timbre which has a spectrum which changes as a function of time.  For 
example, it is often desirable to generate a timbre which is very 
bright at the onset, but which gradually becomes more mellow as the 
note decays.  This can easily be done using a polyphonic digital filter 
which is controlled by an envelope generator.

The PC to MIDI Interface and the MPU-401

To use MIDI with a personal computer, a PC to MIDI interface product 
is generally required (there are a few personal computers which come 
equipped with built-in MIDI interfaces).  There are a number of MIDI 
interface products for PCs.  The most common types of MIDI interfaces 
for IBM compatibles are add-in cards which plug into an expansion 
slot on the PC bus, but there are also serial port MIDI interfaces 
(which connect to a serial port on the PC) and parallel port MIDI 
interfaces (which connect to the PC printer port).  The fundamental 
function of a 
MIDI interface for the PC is to convert parallel data bytes from the 
PC data bus into the serial MIDI data format and vice versa (a UART 
function).  However, "smart" MIDI interfaces may provide a number 
of more sophisticated functions, such as generation of MIDI timing 
data, MIDI data buffering, MIDI message filtering, synchronization 
to external tape machines, and more.
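
The parallel-to-serial conversion can be illustrated with a short
Python sketch.  It assumes the serial format of the MIDI 1.0
specification (31.25 kbaud, one start bit, eight data bits sent least
significant bit first, one stop bit); the function names are
hypothetical, and a real interface performs this step in hardware.

```python
MIDI_BAUD = 31250   # bits per second

def frame_byte(value):
    """Return the 10 line bits for one byte: start, 8 data, stop."""
    if not 0 <= value <= 0xFF:
        raise ValueError("not a byte")
    data_bits = [(value >> i) & 1 for i in range(8)]   # LSB first
    return [0] + data_bits + [1]

def unframe_bits(bits):
    """Inverse of frame_byte: recover the byte from 10 line bits."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))

# Time on the wire for one byte (10 bits), in seconds.
SECONDS_PER_BYTE = 10 / MIDI_BAUD
```

At this rate each byte occupies 320 microseconds on the cable, so a
three-byte Note On message takes a little under one millisecond to
transmit.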

The de facto standard for MIDI interface add-in cards for the PC is 
the Roland MPU-401 interface.  The MPU-401 is a smart MIDI interface, 
which also supports a dumb mode of operation (often referred to as 
"pass-through mode" or "UART mode").  There are a number of MPU-401 
compatible MIDI interfaces on the market.  In addition, many add-in 
sound cards include built-in MIDI interfaces which implement the UART 
mode functions of the MPU-401.

Compatibility Considerations for MIDI Applications on the PC

There are two levels of compatibility which must be considered for 
MIDI applications running on the PC.  The first is the compatibility 
of the application with the MIDI interface being used.  The second is 
the compatibility of the application with the MIDI synthesizer.  Compatibility 
considerations under DOS and the Microsoft Windows operating system 
are discussed in the following paragraphs.

DOS Applications

DOS applications which utilize MIDI synthesizers include MIDI sequencing 
software, music scoring applications, and a variety of games. In terms 
of MIDI interface compatibility, virtually all of these applications 
support the MPU-401 interface, and most utilize only the UART mode.  These 
applications should work correctly if the PC is equipped with an MPU-401, 
a full-featured MPU-401 compatible, or a sound card with an MPU-401 
UART-mode capability.  Other MIDI interfaces, such as serial port 
or parallel port MIDI adapters, will only work if the application 
provides support for that particular model of MIDI interface.  

A particular application may provide support for a number of different 
models of synthesizers or sound modules.  Prior to the General MIDI 
standard, there was no widely accepted standard patch set for synthesizers, 
so applications generally needed to provide support for each of the 
most popular synthesizers at the time. If the application did not 
support the particular model of synthesizer or sound module that was 
attached to the PC, then the sounds produced by the application might 
not be the sounds which were intended.  Modern applications can provide 
support for a General MIDI (GM) synthesizer, and any GM-compatible 
sound source should produce the correct sounds.  Some other models 
which are commonly supported are the Roland MT-32, the Roland LAPC-1, 
and the Roland Sound Canvas.  The Roland MT-32 was an external MIDI 
sound module which utilized Roland's Linear Arithmetic (LA) synthesis, 
and the MT-32 combined with an MPU-401 interface became a popular 
MIDI synthesis platform for the PC.  The LAPC-1 was a PC add-in card 
which combined the MT-32 synthesis function with the MPU-401 MIDI 
interface.  The Sound Canvas is Roland's General Synthesizer (GS) 
sound module, and this unit has become an industry standard.

Microsoft Windows and the Multimedia PC (MPC)

The number of applications for high quality audio functions on the 
PC (including music synthesis) grew explosively after the introduction 
of Microsoft Windows 3.0 with Multimedia Extensions ("Windows with 
Multimedia") in 1991. The Multimedia PC (MPC) specification, originally 
published by Microsoft in 1991 and now published by the Multimedia 
PC Marketing Council (a subsidiary of the Software Publishers Association), 
specifies minimum requirements for multimedia-capable Personal Computers.  A 
system which meets these requirements will be able to take full advantage 
of Windows with Multimedia.  Note that many of the functions originally 
included in the Multimedia Extensions have been incorporated into 
the Windows 3.1 operating system.  

The audio capabilities utilized by Windows 3.1 or Windows with Multimedia 
include audio recording and playback (linear PCM sampling), music 
synthesis, and audio mixing. In order to support the required music 
synthesis functions, MPC-compliant audio adapter cards must have on-board 
music synthesizers. 

The MPC specification defines two types of synthesizers: a "Base Multitimbral 
Synthesizer", and an "Extended Multitimbral Synthesizer". Both the 
Base and the Extended synthesizer must support the General MIDI patch 
set.  The difference between the Base and the Extended synthesizer 
requirements is in the minimum number of notes of polyphony, and the 
minimum number of simultaneous timbres which can be produced.  Base 
Multitimbral Synthesizers must be capable of playing 6 "melodic notes" 
and 2 "percussive notes" simultaneously, using 3 "melodic timbres" 
and 2 "percussive timbres". The formal requirements for an Extended 
Multitimbral Synthesizer are only that it must have capabilities which 
exceed those specified for a Base Multitimbral Synthesizer.  However, 
the "goals" for an Extended synthesizer include the ability to play 
16 melodic notes and 8 percussive notes simultaneously, using 9 melodic 
timbres and 8 percussive timbres.
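
The figures above can be collected into a small table and checked
against a given synthesizer, as in the following Python sketch.  The
names SynthCaps, BASE_MINIMUM, EXTENDED_GOAL, and meets are
hypothetical; only the numbers come from the text.

```python
from collections import namedtuple

SynthCaps = namedtuple(
    "SynthCaps",
    "melodic_notes percussive_notes melodic_timbres percussive_timbres")

BASE_MINIMUM = SynthCaps(6, 2, 3, 2)     # Base requirements
EXTENDED_GOAL = SynthCaps(16, 8, 9, 8)   # Extended "goals"

def meets(caps, requirement):
    """True if every capability is at least the required value."""
    return all(have >= need for have, need in zip(caps, requirement))

# Example: a hypothetical synthesizer with 24 melodic voices.
example = SynthCaps(24, 8, 16, 8)
print(meets(example, BASE_MINIMUM), meets(example, EXTENDED_GOAL))
```

A synthesizer that satisfies BASE_MINIMUM but not EXTENDED_GOAL may
still qualify formally as Extended, since the formal requirement is
only that it exceed the Base capabilities.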

The MPC specification also includes an authoring standard for MIDI 
composition.  This standard requires that each MIDI file contain two 
arrangements of the same song, one for Base synthesizers and one for 
Extended synthesizers.  The MIDI data for the Base synthesizer arrangement 
is sent on MIDI channels 13 - 16 (with the percussion track on channel 
16), and the Extended synthesizer arrangement utilizes channels 1 
- 10 (percussion is on channel 10). This technique allows a single 
MIDI file to play on either type of synthesizer.
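
The channel assignments can be expressed as a simple lookup, sketched
here in Python.  The function names are hypothetical.  The sketch
relies on the fact that a MIDI channel message carries its channel in
the low four bits of the status byte, stored as 0 to 15 for channels
1 to 16.

```python
def channel_of(status):
    """MIDI channel (1-16) encoded in a channel-message status byte."""
    return (status & 0x0F) + 1

def arrangement(channel):
    """Which MPC arrangement a channel belongs to, or None."""
    if 13 <= channel <= 16:
        return "base"
    if 1 <= channel <= 10:
        return "extended"
    return None   # channels 11 and 12 are not assigned by the scheme

def is_percussion(channel):
    """True for the percussion channel of either arrangement."""
    return channel in (10, 16)
```

For example, a Note On with status byte 0x99 is on channel 10, which
is the percussion channel of the Extended arrangement.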

Windows applications generally address hardware devices such as MIDI 
interfaces or synthesizers through the use of drivers. The drivers 
provide applications software with a common interface through which 
hardware may be accessed, and this simplifies the hardware compatibility 
issue.  Before a synthesizer is used, a suitable driver must be installed 
using the Drivers applet within the Windows Control Panel. The device 
drivers supplied with Windows 3.1 include a driver for the MPU-401/LAPC-1 
MIDI interface, and a driver for the original AdLib FM synthesizer 
card. Most other MIDI interfaces and/or synthesizers are shipped with 
their own Windows drivers.

When a MIDI interface or synthesizer is installed in the PC and a 
suitable device driver has been loaded, the Windows MIDI Mapper applet 
will appear within the Control Panel. MIDI messages are sent from 
an application to the MIDI Mapper, which then routes the messages 
to the appropriate device driver. The MIDI Mapper may be set to perform 
some filtering or translations of the MIDI messages en route from 
the application to the driver. The processing to be performed by the 
MIDI Mapper is defined in the MIDI Mapper Setups, Patch Maps, and 
Key Maps.  

MIDI Mapper Setups are used to assign MIDI channels to device drivers.  For 
instance, if you have an MPU-401 interface with a General MIDI synthesizer 
and you also have a Creative Labs Sound Blaster card in your system, 
you might wish to assign channels 13 to 16 to the AdLib driver (which 
will drive the Base-level FM synthesizer on the Sound Blaster), and 
assign channels 1 - 10 to the MPU-401 driver.  In this case, MPC compatible 
MIDI files will play on both the General MIDI synthesizer and the 
FM synthesizer at the same time. The General MIDI synthesizer will 
play the Extended arrangement on MIDI channels 1 - 10, and the FM 
synthesizer will play the Base arrangement on channels 13-16.  The 
MIDI Mapper Setups can also be used to change the channel number of 
MIDI messages.  If you have MIDI files which were composed for a 
General MIDI instrument, and you are playing them on a Base Multitimbral 
Synthesizer, you would probably want to take the MIDI percussion data 
coming from your application on channel 10 and send this information 
to the device driver on channel 16.
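
The routing described above can be modeled as a table from source
channel to a driver and destination channel.  The Python sketch below
is an illustration of the idea only, not the actual MIDI Mapper
implementation; the driver labels "mpu401" and "adlib" and the
function names are hypothetical.

```python
def make_setup():
    """Channels 1-10 to the MPU-401 driver, 13-16 to the AdLib driver."""
    setup = {}
    for ch in range(1, 11):
        setup[ch] = ("mpu401", ch)
    for ch in range(13, 17):
        setup[ch] = ("adlib", ch)
    return setup

def route(message, setup):
    """Route one channel message; return (driver, bytes) or None."""
    status = message[0]
    if not 0x80 <= status < 0xF0:
        raise ValueError("not a channel message")
    entry = setup.get((status & 0x0F) + 1)
    if entry is None:
        return None                     # channel is not assigned
    driver, dst = entry
    new_status = (status & 0xF0) | (dst - 1)
    return driver, bytes([new_status]) + bytes(message[1:])

# Remapping percussion from channel 10 to channel 16:
remap = {10: ("base_synth", 16)}
print(route(bytes([0x99, 36, 100]), remap))
```

In the remapping example the status byte 0x99 (Note On, channel 10)
leaves the mapper as 0x9F (Note On, channel 16), with the key and
velocity bytes unchanged.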

The MIDI Mapper patch maps are used to translate patch numbers when 
playing MPC or General MIDI files on synthesizers which do not use 
the General MIDI patch numbers.  Patch maps can also be used to play 
MIDI files which were arranged for non-GM synthesizers on GM synthesizers.  For 
example, the Windows-supplied MT-32 patch map can be used when playing 
GM-compatible .MID files on the Roland MT-32 sound module or LAPC-1 
sound card.  
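
A patch map amounts to a 128-entry translation table applied to
Program Change messages (status byte 0xC0 to 0xCF, followed by one
patch number).  The following Python sketch shows the idea; the
function names are hypothetical, and the two table entries are
placeholders, not the real MT-32 assignments.

```python
def make_patch_map(overrides):
    """Identity map of 128 patches with the given entries replaced."""
    table = list(range(128))
    for src, dst in overrides.items():
        table[src] = dst
    return table

def map_program_change(message, patch_map):
    """Translate the patch number of a Program Change message."""
    if (message[0] & 0xF0) == 0xC0:
        return bytes([message[0], patch_map[message[1]]])
    return bytes(message)   # all other messages pass through

# Placeholder entries for illustration only.
patch_map = make_patch_map({0: 5, 40: 52})
```

Messages other than Program Change are passed through unchanged, so
the map affects only which sound is selected, not the notes played.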

The MIDI Mapper key maps perform a similar function, translating the 
key numbers contained in MIDI Note On and Note Off messages.  This 
capability is useful for translating GM-compatible percussion parts 
for playback on non-GM synthesizers or vice versa.  The Windows-supplied 
MT-32 key map changes the key-to-drum sound assignments used for General 
MIDI to those used by the MT-32 and LAPC-1. 
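
A key map can be sketched in the same way, this time rewriting the
key number in Note On (0x90 to 0x9F) and Note Off (0x80 to 0x8F)
messages.  The Python below is illustrative only; the function name
is hypothetical and the table entries are placeholders rather than
the actual MT-32 drum assignments.

```python
def map_key(message, key_map):
    """Translate the key number of a Note On or Note Off message."""
    kind = message[0] & 0xF0
    if kind in (0x80, 0x90):
        new_key = key_map.get(message[1], message[1])
        return bytes([message[0], new_key, message[2]])
    return bytes(message)   # all other messages pass through

# Placeholder entries for illustration only.
key_map = {35: 36, 54: 46}
```

Keys that do not appear in the table are left as they are, so only
the drum sounds that differ between the two instruments need entries.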

Some MIDI applications, such as MIDI sequencer software packages, 
can be set to make use of the MIDI Mapper, or to address the device 
driver directly (bypassing the MIDI Mapper).  Other Windows applications 
always utilize the MIDI Mapper.  

Summary

The MIDI protocol provides an efficient format for conveying musical 
performance data, and the Standard MIDI Files specification ensures 
that different applications can share time-stamped MIDI data.  The 
storage efficiency of the MIDI file format makes MIDI an attractive 
vehicle for generation of sounds in multimedia applications, computer 
games, or high-end karaoke equipment.  The General MIDI system provides 
a common set of capabilities and a common patch map for high polyphony, 
multi-timbral synthesizers.  General MIDI-compatible synthesizers 
employing high quality wavetable synthesis techniques provide an ideal 
MIDI sound generation facility for multimedia applications.