Python and Sound

Topics

Exploring Sound
- What is Sound?
- How is Sound Stored?

JES Code for Working with Sound
MediaTools Explore
Audacity

Changing the Volume
Splicing Sounds

Combining Two Sounds Together
Creating General Clip and Copy Functions
Making a Library

References
Exercise

Please get the code examples and sounds used in this lab by clicking here.

1. Exploring Sound

In order to understand what we are doing in this lab, it is helpful to go through some terminology and learn about sound and how it is stored. Once you understand a little about sound, we will take a look at JES code for working with sound, and an "explorer" tool, which allows you to play sound and view the values. Finally, to record sound, we will provide a brief overview of Audacity.

1.1 What is Sound?

The answer to the question "What is sound?" is really a physics answer. Have you ever struck a tuning fork and placed it in water, or dropped something in water? If you have, you would have seen waves. Sounds are waves in air, which are picked up by sensors in our ears. Because sounds are waves, let us take a look at a basic sine wave (which we would hear as a pure single tone).

Sine graph

Picture was taken from: http://www.purplemath.com/modules/grphtrig.htm

Some important definitions are:

cycle. Take a look at the graph and notice that there is one "up peak" and one "down peak". The completion of the up and down "wave" is one cycle.
amplitude. Notice how the graph starts at 0 and goes up to 1 and eventually goes down to -1. The distance from zero to the greatest (or least) pressure is the amplitude. In the diagram above, the amplitude is 1. "In general, amplitude is the most important factor in our perception of volume: if the amplitude rises, we typically perceive the sound as being louder" (page 147, Computing and Programming in Python, A Multimedia Approach by Mark Guzdial).
frequency. How often a cycle occurs is called frequency. Frequency is measured in cycles per second (cps) or Hertz (Hz). Frequency and pitch are related: the more cycles per second, the higher the pitch.

Questions:

Would the following diagram represent a sound with a higher or lower pitch than the original sine wave diagram?
Would we perceive this sound to be louder or quieter than the original sine wave diagram?

The sounds that we hear are not typically pure tones: they are composed of waves of different frequencies. When you look at sound waves in the following sections, you will see rougher edges than the smooth and regular sine waves. This is the result of several frequencies combining together.

1.2 How is Sound Stored?

If you want to capture a sine wave such as the above, you could use an array. For instance, taking a sample at every multiple of π/2 from a sine wave with an amplitude of 3, your array would look something like this:

0	3	0	-3	0	3	0

The resulting "wave" would look very triangular. Ideally, more samples could be taken. However, you can understand the idea of representing the wave.
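As a rough illustration in plain Python (outside of JES), the array above can be reproduced by sampling a sine wave at multiples of π/2. The amplitude of 3 is inferred from the array values, and the function name sampleSine and its parameters are illustrative assumptions, not JES functions. Sampling the same wave with a smaller step shows how more samples "smooth" the shape.

```python
import math

def sampleSine(amplitude, step, count):
  # Sample amplitude*sin(t) at t = 0, step, 2*step, ... (count samples in total)
  samples = []
  for n in range(count):
    value = amplitude * math.sin(n * step)
    samples.append(int(round(value)))  # store whole numbers, as a sound file would
  return samples

# Seven samples, one every pi/2: the "triangular" wave from the text
coarse = sampleSine(3, math.pi / 2, 7)
# The same stretch of the wave sampled four times as often
fine = sampleSine(3, math.pi / 8, 25)

print(coarse)
print(fine)
```

The coarse list only ever lands on the peaks and the zero crossings, which is why the resulting shape looks triangular.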

Two questions come in to play when storing sound:

What is the maximum amplitude to be stored? This will determine how many bytes in memory you will use for each sound sample or array element. For instance, if you want to capture amplitudes from 32 767 (2¹⁵-1) to -32 768 (-2¹⁵), then you will need 16 bits (2 bytes) for each element.
How many samples or array elements will you have for every second of recording? For instance, the array above could have more samples to "smooth" the wave. The rate at which samples are collected is the sample rate. Some typical sample rates are below:
- For CD-quality sounds: 44 100 samples per second. That means that one minute has 60 x 44 100=2 646 000 elements.
- Our telephone is designed to capture 8 000 samples per second. That means that one minute has 60 x 8 000=480 000 elements
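The arithmetic behind these two questions can be checked with a few lines of plain Python. This is only a sketch: the helper names sampleRange, samplesPerMinute and bytesPerMinute are made up for illustration, and the storage figure assumes a single (mono) channel.

```python
def sampleRange(bits):
  # Smallest and largest values that a signed sample of this many bits can hold
  return (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)

def samplesPerMinute(samplingRate):
  # Number of array elements needed for 60 seconds of sound
  return 60 * samplingRate

def bytesPerMinute(samplingRate, bits):
  # Storage for one minute of single-channel sound
  return samplesPerMinute(samplingRate) * (bits // 8)

print(sampleRange(16))            # (-32768, 32767)
print(samplesPerMinute(44100))    # 2646000
print(samplesPerMinute(8000))     # 480000
print(bytesPerMinute(44100, 16))  # 5292000
```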

Now that you understand a little more about sound, let us dive in with JES.

1.3 JES Code for Working with Sound

You will find that some functions that we use for sound are not part of Python; they only work inside of JES. Examples of JES specific functions are:

pickAFile, which pops up a file choosing dialog and returns a string for the file chosen
makeSound, which creates a Sound object from a file
getSampleValueAt, which returns a sample value (of the sound) at a specified index
getLength, which returns the number of samples in the sound
getSamplingRate, which returns the number of samples per second for the sound
play, which plays the sound provided as input
blockingPlay, which plays the sound provided as input and makes sure that no other sound plays at the same time
playAtRate, which plays a sound at a given rate (2.0 is twice as fast, 0.5 is half as fast)

If you are not sure which functions are part of Python and which are part of JES, you can look in the JES menu under Help > Understanding Sound. Click on "Sound Functions in JES". You might find some cool stuff in there. (As an aside, playAtRate is not in the Help documentation, but it is mentioned in the book Computing and Programming in Python, a Multimedia Approach.)

The following is a compilation of all of the JES functions from above:

def soundExplore():
  fileName=pickAFile()
  aSound=makeSound(fileName)
  print "Information about aSound", aSound
  print "Getting First Sample:", getSampleValueAt(aSound, 0)
  print "Getting the Length:", getLength(aSound)
  print "Getting Sampling Rate:", getSamplingRate(aSound)
  play(aSound)
  #blockingPlay(aSound)
  playAtRate(aSound,2.0)
  explore(aSound)
  return aSound

Notice that when you run the code and select "always.wav" the following is output:

>>> soundExplore()
Information about aSound Sound file: always.wav number of samples: 15394
Getting First Sample: 513
Getting the Length: 15394
Getting Sampling Rate: 22050.0
Sound file: always.wav number of samples: 15394

You might hear two versions of "always" being played. The one version is twice as fast and sounds like "Mickey Mouse". Try commenting out the play (put a # in front of it) and uncomment the blockingPlay to see what happens.

The other two sounds included in this lab do not work with 2.0 sent as an argument to playAtRate, but you can try them with 0.5 as an argument instead.

What difference do you notice in the sampling rate between "CS325.wav" and "always.wav"?

1.4 MediaTools Explore

After you run the code above, you will notice that an additional window has appeared. This is the "Explorer" tool for sound, which is bundled together with JES as part of the MediaTools application. This is what it should look like for you (with "always.wav"):

Sound Explorer

Notice in the above diagram that the current index and sample value are encircled in red. At index 0, the sample value is 513.

If you change the index to 2496 (by typing in the box), what sample value do you get?
If you click anywhere in the black, what happens to the index and sample value?
What happens when you click on the arrow buttons?
Can you play "ways"?

You can listen to excerpts out of a recording such as this. For instance, if you want "way" from "always", you can click and drag between approximately 4296 and 9000. Click on "Play Selection" to hear that range. See below for an example of selecting "way" from the recording.

Explorer Ranges

You might be asking, "how do you know where to choose your selections?". Mostly it comes through experimentation. Look for places where the waveform changes. Small jags in the graph are either background noise (from silence between words) or sounds like "s".
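When experimenting, it can also help to translate between an index and a time in the recording. The following is a minimal sketch in plain Python, assuming the sampling rate of 22050 samples per second that was reported for "always.wav"; indexToSeconds and secondsToIndex are illustrative helper names, not JES functions.

```python
def indexToSeconds(index, samplingRate):
  # Each second of sound holds samplingRate samples
  return index / float(samplingRate)

def secondsToIndex(seconds, samplingRate):
  # Inverse conversion, truncated to a whole index
  return int(seconds * samplingRate)

rate = 22050.0
# Approximate range used for "way" in the text
startTime = indexToSeconds(4296, rate)
endTime = indexToSeconds(9000, rate)

print(startTime)
print(endTime)
print(secondsToIndex(0.5, rate))
```

With this rate, the selection for "way" starts a little under 0.2 seconds into the recording and ends a little over 0.4 seconds in.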

1.5 Audacity

In the exercise for this lab, you will be asked to make your own sound recordings. One way of doing this is by using Audacity. Audacity is a free, easy-to-use audio editor and recorder that works on Windows, Mac OS X and other operating systems. If you would like to try it at home, you can get it here: http://audacity.sourceforge.net/download/

This section is a crash course in Audacity. The focus is on recording, trimming, and exporting.

1.5.1 Recording

The interface for Audacity looks like the following:

Audacity Recording

Before we record, we should adjust the two settings encircled in red above:

Change the sampling rate to 22050
Change to Mono Input Channel

Now, all we have to do is hit the "record" button:

When we are done recording, we can click on the "stop" button:

To listen to our recording, we can click on the "play" button:

1.5.2 Trimming

Audacity probably caught you unaware! Before you realized that it was recording, it had captured a few seconds of background noise. If you would like to trim your selection:

Click and drag on the waveform to select part of your recording.
Play the selection to make sure that you have everything that you need.
Click on the "Trim" button shown below:

1.5.3 Exporting

To "Save" the file in a format that you can use for JES, you will "Export" your file as a "wav". The steps are:

Under the main menu, select File > Export...
In the dialog box that appears:
- Type a name
- Choose a directory
- Ensure that the format is: WAV(Microsoft) signed 16 bit PCM
Click on the Save button
On the next dialog box, click on OK

1.5.4 Removing Tracks

Once you have recorded something in Audacity, it keeps that recording as a "track". Any additional recordings are added to your sound. This is not ideal if you would like to record something new. To delete old tracks, click on the X in the upper left hand corner of the track as highlighted in red below:

Remove Track

2. Changing the Volume

As discussed in the above sections, sound is a wave of air pressure and it can be sampled and stored in an array. If you change the values stored in the array, the sound will change. One change we can make is to multiply all the elements by some ratio or value. Effectively, this will change the amplitude of the wave. If we change the amplitude, we change the...what?

The question becomes: "how do we travel through all of the elements?". There are two ways; both involve a for loop. The first way is to generate a list of all of the samples, using a function called getSamples(). The second way is to use the for loop to travel through the indices of the sound sample. The following sub-sections show the two ways of travelling through the sound samples as well as different ways to modify the volume and a side-effect of modifying volume.

2.1 Looping through Sound Samples

One way you can travel through the sound is by using getSamples, which returns a list of all the samples (Sample objects) in the sound. The code to travel through this list is below:

#Program 56, page 161
def increaseVolume(sound):
  for sample in getSamples(sound):
    value =getSampleValue(sample)
    setSampleValue(sample,value*2)

Other new functions used in this code are:

getSampleValue(), which returns an integer value from the Sample object
setSampleValue(), which sets the value of the Sample object to the value provided (here, twice what it was before: value*2)

To try out this code and listen to the results, type the following:

>>> mySound=returnSound()
>>> explore(mySound)
>>> increaseVolume(mySound)
>>> explore(mySound)

Here, returnSound() is a helper function included in this lab's sample code file. The helper function calls pickAFile() and makeSound() and returns the sound. We call the explore() function twice so that you can see the waveform and hear the change in volume for the original versus the modification.

2.2 Looping through a Range

The other way of travelling through the sound is by accessing the sample values using an index. Notice that the code below uses the range() function to generate a sequence (or list), which goes from 0 to (getLength(sound)-1).

#program 61, modified from page 175
def increaseVolume2(sound):
  for index in range(0, getLength(sound)):
    value =getSampleValueAt(sound,index)
    setSampleValueAt(sound,index,value*2)

To access/modify the samples at an index, two additional functions are used in the above code:

getSampleValueAt(), which gets the sound's sample value at a specific index
setSampleValueAt(), which sets the sound's sample value at a specific index (here, to twice as much)

This increaseVolume2() function does the exact same thing as the increaseVolume() function in section 2.1. Why would we choose to use this version of for loop instead?

2.3 Creating a Generic Volume Modifier

Instead of having a function that only doubles the values of the sound samples, we can create a more generic volume function with a factor sent as an argument:

#program 58, page 167
def changeVolume(sound, factor):
  for sample in getSamples(sound):
    value =getSampleValue(sample)
    setSampleValue(sample, value*factor)

Notice that our sample values are multiplied by factor. What would the following calls do to the volume?

changeVolume(mySound, 2.0)
changeVolume(mySound, 0.5)
changeVolume(mySound, 5.0)
changeVolume(mySound, 0.2)

Be aware that "mySound" will be modified after each call to changeVolume().
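The fact that each call changes the sound in place, so that successive calls compound, can be seen with an ordinary Python list standing in for the sound. This is only a sketch outside of JES: changeVolumeList and the sample values are made up for illustration.

```python
def changeVolumeList(samples, factor):
  # Scale every sample in place; the list passed in is itself modified
  for index in range(len(samples)):
    samples[index] = int(samples[index] * factor)

mySamples = [100, -200, 300]
changeVolumeList(mySamples, 2.0)   # now [200, -400, 600]
changeVolumeList(mySamples, 0.5)   # back to [100, -200, 300]
changeVolumeList(mySamples, 5.0)   # now [500, -1000, 1500]
print(mySamples)
```

If you want to compare several factors against the same original, load the sound again (or work on a copy) before each call.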

2.4 Normalizing Sound

Playing with volume is pretty rewarding! What if you want to make the sound as loud as possible? You could try to find a "multiplier" through experimentation. That would, however, be tedious. What if you had some code that would calculate that "multiplier"? The following code does just that. It finds the multiplier that gives the loudest volume we can get, based on the largest sound sample, and then boosts the sound values by that amount.

#program 59, page 168
def normalize(sound):
  largest=0
  for s in getSamples(sound):
    largest=max(largest,getSampleValue(s))
  multiplier=32767.0/largest
  print "Largest sample value in original sound was", largest
  print "Multiplier is", multiplier
  
  for s in getSamples(sound):
    louder = multiplier * getSampleValue(s)
    setSampleValue(s, louder)

The algorithm is as follows:

Find the maximum sample value by travelling through all of the samples and using the max() function to return the largest of two integers
Because the maximum amplitude we can get with 16 bits is 32767, our formula to boost the largest sample value to 32767 would be:
largest * multiplier=32767
To solve for what the multiplier is, we get the formula:
multiplier=32767/largest
Now that we have multiplier, we can travel through all the sound samples and multiply them by that amount
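The steps above can be traced on a small list of numbers in plain Python. This is a sketch, not the book's program: normalizeList is an illustrative name, and it makes two assumptions of its own that differ from the code above. It compares absolute values, so that a large negative sample is also taken into account, and it leaves a completely silent sound alone instead of dividing by zero.

```python
MAX_SAMPLE = 32767  # largest positive value of a signed 16-bit sample

def normalizeList(samples):
  # Step 1: find the largest magnitude (assumption: negative peaks count too)
  largest = 0
  for value in samples:
    largest = max(largest, abs(value))
  if largest == 0:
    return 1.0  # silent sound: nothing to scale
  # Step 2: solve largest * multiplier = 32767 for multiplier
  multiplier = MAX_SAMPLE / float(largest)
  # Step 3: multiply every sample by that amount
  for index in range(len(samples)):
    samples[index] = int(samples[index] * multiplier)
  return multiplier

wave = [1024, -16384, 8192]
m = normalizeList(wave)
print(m)
print(wave)
```

Here the largest magnitude is 16384, so the multiplier is just under 2 and the biggest sample ends up at -32767.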

Could we have used our changeVolume() function in this code?

2.5 Note on Clipping

You might have noticed that when you increase the volume of "always.wav", some strange sounds result: it might sound like your speakers are breaking. If you run the normalize() function on this sound sample, you will notice that the multiplier is approximately 1.4 (less than double). Because the increaseVolume() function multiplies everything by 2.0, the largest samples would exceed 32 767, the largest value that 16 bits can hold. This is referred to as clipping. In other words, "the normal curves of the sound are broken by the limitations of the sample size" (page 169, Computing and Programming in Python, A Multimedia Approach by Mark Guzdial). If you look at the wave in signal view, it looks like someone has taken scissors and clipped off the peaks of the waves. Watch out for this effect in your recordings! You will see many wave peaks extending out to the edges.
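One way to picture clipping in plain Python is to cap every value at the 16-bit limits. This sketch assumes that out-of-range values are held at the limit, which is what the flattened peaks suggest; how JES handles them internally is not stated here. The names clampSample and countClipped and the sample values are illustrative.

```python
MAX_SAMPLE = 32767
MIN_SAMPLE = -32768

def clampSample(value):
  # Hold a value inside the range that a signed 16-bit sample can store
  return max(MIN_SAMPLE, min(MAX_SAMPLE, int(value)))

def countClipped(samples, factor):
  # How many samples would fall outside the 16-bit range after scaling?
  clipped = 0
  for value in samples:
    scaled = value * factor
    if scaled > MAX_SAMPLE or scaled < MIN_SAMPLE:
      clipped = clipped + 1
  return clipped

peaks = [10000, 20000, -23000, 5000]
doubled = [clampSample(value * 2.0) for value in peaks]
print(doubled)
print(countClipped(peaks, 2.0))
print(countClipped(peaks, 1.4))
```

Counting the samples that would overflow before applying a factor is a simple way to check whether a volume change is safe.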

3. Splicing Sounds

And now for the section that you all have been anticipating!! How do you put pieces of sound together? For instance, you want to insert words that were not in an original recording. This section answers that question.

By definition, this is referred to as splicing sounds, "a term that dates back to when sounds were recorded on tape--juggling the order of things on the tape involved literally cutting the tape into segments and then gluing it back together in the right order." (page 177 of Computing and Programming in Python, A Multimedia Approach by Mark Guzdial)

Bundled together with the Python code for this lab are three sound samples. We will splice these sound samples in the following subsections. First, we will look at putting two sounds together by copying the values at specific ranges. Then, we will create some functions that will help us extract pieces of sound and copy them together.

3.1 Combining Two Sounds Together

The following code creates a new sound that splices "cs325" with "is fun" from "what_is_fun.wav".

#program 63, modified from page 177
#call setMediaPath() before calling this function
def merge():
  cs325Sound = makeSound(getMediaPath("CS325.wav"))
  isFunSound = makeSound(getMediaPath("what_is_fun.wav"))
  #target = makeSound(getMediaPath("sec3silence.wav"))
  samplingRate = int(getSamplingRate(cs325Sound))
  cs325Len = getLength(cs325Sound)
  silenceLen = int (0.1*samplingRate)
  isFunLen = getLength(isFunSound)
 
  target=makeEmptySound(cs325Len + silenceLen + isFunLen)
  #target=makeEmptySound(cs325Len + silenceLen + isFunLen, samplingRate)
  print "CS325 Sampling Rate is ", getSamplingRate(cs325Sound)
  print "Target Sampling Rate is ", getSamplingRate(target)
  index=0
  #Copy in "CS325"
  for source in range(0, getLength(cs325Sound)):
    value=getSampleValueAt(cs325Sound, source)
    setSampleValueAt(target, index, value)
    index = index + 1
  #Copy in 0.1 second pause (silence) (0)
  for source in range (0, int(0.1*getSamplingRate(target))):
    setSampleValueAt(target, index, 0)
    index = index + 1
  #Copy in "is fun"
  for source in range (24703, getLength(isFunSound)):
    value = getSampleValueAt(isFunSound,source)
    setSampleValueAt(target, index, value)
    index = index + 1
  normalize(target)
  play(target)
  return target

To figure out what range we had to copy from, we used the Explorer tool to find the index (approximately 24703) where "is fun" started in the "what_is_fun.wav" file.

An overview of the code is as follows:

use the makeSound() functions to create Sound objects from the "CS325.wav" and "what_is_fun.wav" files
find the lengths of: "cs325", silence for 0.1 of a second, and "what is fun"
create a sound big enough (called target) to hold those three pieces (yes, it will be a little bigger than necessary). Notice how we are using the makeEmptySound() function to create target.
use an incrementing index to copy into target:
1. "cs325" from 0 to the length of that sound
2. 0.1 seconds of silence (sample value of 0)
3. "is fun" starting from index 24703 in the "what is fun" sound to the end of that sound sample
normalize the sound so that it is as loud as it can be
play the sound
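The copying pattern in this overview (one target, one index that keeps incrementing across all three loops) can be tried on plain Python lists. This is a sketch outside of JES with made-up names and data; unlike merge(), it sizes the target to exactly what is copied.

```python
def spliceWithPause(first, second, secondStart, pauseLen):
  # Target holds: all of first, a pause, and second from secondStart onward
  target = [0] * (len(first) + pauseLen + (len(second) - secondStart))
  index = 0
  # Copy in the first sound
  for source in range(0, len(first)):
    target[index] = first[source]
    index = index + 1
  # Copy in the pause (silence is a sample value of 0)
  for source in range(0, pauseLen):
    target[index] = 0
    index = index + 1
  # Copy in the tail of the second sound
  for source in range(secondStart, len(second)):
    target[index] = second[source]
    index = index + 1
  return target

result = spliceWithPause([1, 2], [9, 9, 3, 4], 2, 3)
print(result)
```

Because index is never reset, each loop continues writing where the previous one stopped.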

Before we run this code, we have to call:

setMediaPath(), which displays a file picker dialog box. You can select the directory (or folder) where your sounds are stored. As an aside, getMediaPath() uses the directory that you have selected with setMediaPath() and prepends it to the filename.

When we run this code, something strange happens. What is happening? How can we fix it?

3.2 Creating General Clip and Copy Functions

This sub-section will combine all three sounds so that we get "cs325 is always fun". In order to do that, we need to extract the "is" and the "fun" from the "what_is_fun.wav" file. We used the explorer tool to find the approximate beginning and ending of the words:

word	start	end
what	929	14109
is	24703	36955
fun	37169	60705

To make splicing easier, two functions were created:

clip(), which extracts (and returns as a new sound) the samples from a starting index (start) up to, but not including, an ending index (end) in the source sound. The code is below:

#program 65, page 183  
def clip(source, start, end):
  target = makeEmptySound(end - start)
  targetIndex = 0
  for sourceIndex in range(start, end):
    sourceValue = getSampleValueAt(source, sourceIndex)
    setSampleValueAt(target, targetIndex, sourceValue)
    targetIndex = targetIndex + 1
  return target

Notice that we create a target sound that is big enough to hold the sound between start and end.
We cycle through the source from start to end and copy all the samples into target

copy(), which will copy the source sound into target using start as the index into target. The code is below.

#program 66, page 184  
def copy(source, target, start):
  targetIndex = start
  for sourceIndex in range(0, getLength(source)):
    sourceValue = getSampleValueAt(source, sourceIndex)
    setSampleValueAt(target, targetIndex, sourceValue)
    targetIndex = targetIndex + 1

Notice that we copy the entire contents of source into target
The index used for copying into target begins at start
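To see exactly which indices clip() and copy() touch, the same loops can be run on plain Python lists. This is a sketch outside of JES; clipList and copyList are illustrative stand-ins, not the lab's functions.

```python
def clipList(source, start, end):
  # New list holding source[start] up to, but not including, source[end]
  target = [0] * (end - start)
  targetIndex = 0
  for sourceIndex in range(start, end):
    target[targetIndex] = source[sourceIndex]
    targetIndex = targetIndex + 1
  return target

def copyList(source, target, start):
  # Write all of source into target, beginning at index start of target
  targetIndex = start
  for sourceIndex in range(0, len(source)):
    target[targetIndex] = source[sourceIndex]
    targetIndex = targetIndex + 1

word = clipList([5, 6, 7, 8, 9], 1, 4)   # [6, 7, 8]
canvas = [0] * 6
copyList(word, canvas, 2)                # [0, 0, 6, 7, 8, 0]
print(word)
print(canvas)
```

Note that the target passed to copyList must already be long enough to hold the source starting at start; otherwise the index runs past the end.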

Our final function that makes use of both clip and copy is below:

def merge2():
  cs325Sound = makeSound(getMediaPath("CS325.wav"))
  isSound = makeSound(getMediaPath("what_is_fun.wav"))
  alwaysSound = makeSound(getMediaPath("always.wav"))
  isClip=clip(isSound,24703,36955)
  funClip=clip(isSound,37169,60705)
  len=getLength(cs325Sound)+getLength(isClip)+getLength(alwaysSound)+ getLength(funClip)
  samplingRate= int(getSamplingRate(cs325Sound))
  newSound=makeEmptySound(len,samplingRate)
  copy(cs325Sound,newSound,0)
  copy(isClip,newSound,getLength(cs325Sound))
  copy(alwaysSound,newSound,getLength(cs325Sound)+getLength(isClip))
  copy(funClip,newSound,getLength(cs325Sound)+getLength(isClip)+getLength(alwaysSound))
  play(newSound)
  return newSound

The idea of this code is:

make Sound objects out of the three ".wav" files
clip the "is" and "fun" out of the "what_is_fun.wav" file
calculate the length (len) of all the pieces that will be put together
use the makeEmptySound() function to create the sound (newSound) that will be used as the final combined sound with a length (len) of all the pieces. We have also added a samplingRate argument so that newSound will have a sampling rate consistent with "CS325.wav" (a solution to the problem that we had in the previous subsection)
copy all of the pieces into the newSound:
- first "cs325"
- then "is"
- then "always"
- then "fun"
play the sound
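The starting positions passed to copy() are running totals of the lengths of the pieces that come before. A small plain-Python sketch of that bookkeeping follows. The lengths of "is" and "fun" come from the table above and the length of "always" from the earlier output, but the length used for "cs325" is a hypothetical value chosen only for illustration, and startOffsets is an illustrative name.

```python
def startOffsets(lengths):
  # For each piece, the index in the combined sound where it begins,
  # plus the total length needed for the combined sound
  offsets = []
  total = 0
  for length in lengths:
    offsets.append(total)
    total = total + length
  return offsets, total

isLen = 36955 - 24703    # 12252 samples for "is"
funLen = 60705 - 37169   # 23536 samples for "fun"
alwaysLen = 15394        # from the soundExplore() output
cs325Len = 20000         # hypothetical length, for illustration only

offsets, total = startOffsets([cs325Len, isLen, alwaysLen, funLen])
print(offsets)
print(total)
```

Computing the offsets in a loop like this avoids repeating the growing sums of getLength() calls by hand.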

There is still a problem with how the result sounds. How would you fix it?

3.3 Making a Library

Let us say that you like the clip and copy functions and want to use them again in other projects. To reuse them, you can create a library. The steps are below:

Cut and paste the clip and copy functions into a separate file (let us call it soundLib.py)
As the first line of soundLib.py, add the following:
from media import *
This will allow you to use the JES provided functions like getMediaPath, makeSound, etc
In the second file, where you would like to call the clip and copy functions, add the following as the first line:
from soundLib import *
This is like copying the clip and copy (or all the) functions from soundLib.py into your code
Find the directory where soundLib.py is stored, and use the following to set the library path:
setLibPath("/Users/you/yourDirectory/PythonSound/")
Of course, you will want to use your own directory as an argument. This function tells Python where to look for the files that you are importing.
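The same idea can be tried in standard Python outside of JES, where the usual way to tell Python about an extra directory is to add it to sys.path. Whether setLibPath does exactly this internally is an assumption here. The sketch below writes a tiny stand-in library (demoSoundLib.py, a made-up name) into a temporary directory and then imports from it.

```python
import os
import sys
import tempfile

# Create a throwaway directory containing a one-function library file
libDir = tempfile.mkdtemp()
libFile = open(os.path.join(libDir, "demoSoundLib.py"), "w")
libFile.write("def clipList(source, start, end):\n")
libFile.write("  return source[start:end]\n")
libFile.close()

# Tell Python where to look for files being imported
sys.path.append(libDir)

# Bring the library function into this file
from demoSoundLib import clipList

print(clipList([5, 6, 7, 8, 9], 1, 4))
```

If the import fails with "No module named ...", the directory that was added is usually not the one that actually contains the library file.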

4. References

Picture of Sine wave from: Stapel, Elizabeth. "Graphing Trigonometric Functions." Purplemath. Available from
http://www.purplemath.com/modules/grphtrig.htm. Accessed 09 July 2012
Python Code and Concepts from: Computing and Programming in Python, a Multimedia Approach, by Mark J. Guzdial and Barbara Ericson (Chapters 6 and 7)

5. Exercise

This exercise is taken from problem 7.8 on page 191 of Computing and Programming in Python, a Multimedia Approach:

Make an audio collage. Make it at least 5 seconds long, and include at least two different sounds (i.e., they come from different files). Make a copy of one of those different sounds and modify it [by changing the volume or some other creative approach (maybe not in the lab)]. Splice together the original two sounds and the modified sound to make the complete collage.

The sounds that you use should be your own recordings.

Use the clip and copy functions from a library.