Thursday, August 18, 2011

A11 - Playing notes by image processing

Listening to music is an integral part of my everyday life. It relaxes me and gives me my private moment.

Artists and musicians perform the music we hear from a score sheet, a sheet of musical notes that serves as a guide for performing a piece. So how do they play the music written on a score sheet? The usual answer would be with an instrument or their voice.

But did you know that we can actually play music using image processing? Cool!

In this blog post, we will extract musical notes from a digitized score sheet and play these in Scilab with the proper frequency and duration.




The first thing to do is to find a musical score sheet image on the web. The simpler the score is, the better. A sample score is shown below.

 
Figure 1. Twinkle Twinkle Little Star score sheet.
(image from http://bestteethwhiteninggels.files.wordpress.com/2010/07/twinkle_twinkle_little_star_score1.png)


We first divide the sheet into three images, as shown below, removing the clefs and the words.

 
Figure 2. Divided Twinkle Twinkle Little Star score sheet.

Next, we binarize the images using an appropriate threshold value and invert the colors (black to white, and vice versa).

 
Figure 3. Binarized version of figure 2. 
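
As a rough illustration of the binarize-and-invert step, here is a minimal sketch in plain Python (the activity itself was done in Scilab). The function name, the tiny grayscale patch and the threshold of 128 are all made up for the example; a real score image would need its own threshold.

```python
def binarize_inverted(gray, threshold):
    """Return 1 where a pixel is darker than the threshold (ink), else 0.

    Doing both steps at once: dark ink becomes white (1) and
    bright paper becomes black (0).
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


# Hypothetical grayscale patch: 0 = black ink, 255 = white paper.
gray = [
    [255, 255, 255, 255],
    [20, 30, 25, 240],
    [255, 200, 10, 255],
]

binary = binarize_inverted(gray, threshold=128)  # threshold is an assumed value
for row in binary:
    print(row)
```

After this step the notes, stems and staff lines are the white (1) pixels, which is the form the later steps work on.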

After the easy part comes the biggest obstacle in this activity: we need a procedure that removes every feature of figure 3 except the notes. There are two possible ways to do this: template matching using correlation, or morphological operations.

Template matching requires a pattern (template) to match against the images. In this case, we can use a quarter note cropped from the same score sheet and binarize it.
 
Figure 4. Quarter note template (a) from score sheet,
(b) binarized and inverted.

We then use the imcorrcoef function of Scilab and obtain the image shown below.

 
Figure 5. Resulting image using imcorrcoef.

It can be seen that bright spots appear at locations where good matches are found. We can then binarize figure 5 to remove the unnecessary features.

 
Figure 6. Binarized version of figure 5.
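
To make the idea behind figures 5 and 6 concrete, here is a minimal Python sketch of template matching by correlation coefficient followed by a threshold. It is not the code behind imcorrcoef, only the general technique: the function name, the plus-shaped template, the toy image and the cutoff of 0.9 are assumptions for illustration.

```python
import math


def corrcoef_map(image, template):
    """Correlation coefficient between the template and every same-sized window."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_dev = [v - t_mean for v in t_vals]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    out = []
    for r in range(ih - th + 1):
        out_row = []
        for c in range(iw - tw + 1):
            w_vals = [image[r + i][c + j] for i in range(th) for j in range(tw)]
            w_mean = sum(w_vals) / len(w_vals)
            w_dev = [v - w_mean for v in w_vals]
            w_norm = math.sqrt(sum(d * d for d in w_dev))
            if t_norm == 0 or w_norm == 0:
                out_row.append(0.0)  # flat window or template: treat as no match
            else:
                dot = sum(a * b for a, b in zip(t_dev, w_dev))
                out_row.append(dot / (t_norm * w_norm))
        out.append(out_row)
    return out


# Hypothetical template and image; the template appears once, at row 1, column 2.
template = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

scores = corrcoef_map(image, template)
# Binarize the correlation map: keep only strong matches (0.9 is an assumed cutoff).
matches = [(r, c) for r, row in enumerate(scores) for c, s in enumerate(row) if s > 0.9]
print(matches)
```

The coordinates are the top-left corners of the matching windows. The bright spots of figure 5 correspond to high values in `scores`, and thresholding them plays the role of the binarization in figure 6.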

And voilà, the notes (in particular the circular structures) are separated from the rest of the score sheet.

As mentioned above, there is another way to obtain a result similar to figure 6: performing morphological operations. The most important consideration in a morphological operation is the structuring element. The one I chose is a 2x3 rectangle with an extra white pixel at the upper rightmost and lower leftmost parts.
 
Figure 7. Structuring element used.

I chose this structuring element because the extra pixels above and below the 2x3 rectangle effectively remove horizontal and vertical lines. The blobs survive because of their square-like form. We can then perform an erosion using this structuring element, the result of which is shown in figure 8.

 
Figure 8. Extraction of notes using erosion.
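
The erosion step can be sketched in plain Python as follows. The exact pixel layout of the structuring element in figure 7 is not recoverable from the text, so the 4x3 array below (a two-row, three-column block with one extra pixel at the upper right and one at the lower left) is only a guess at its shape. The origin and the toy image of a staff line, a stem and a note head are also assumptions.

```python
def erode(image, selem, origin=(0, 0)):
    """Binary erosion: a pixel stays 1 only if every 1 of selem lands on a 1."""
    ih, iw = len(image), len(image[0])
    offsets = [
        (i - origin[0], j - origin[1])
        for i, row in enumerate(selem)
        for j, v in enumerate(row)
        if v
    ]
    out = [[0] * iw for _ in range(ih)]
    for r in range(ih):
        for c in range(iw):
            fits = True
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                # Anything falling outside the image counts as background.
                if not (0 <= rr < ih and 0 <= cc < iw and image[rr][cc]):
                    fits = False
                    break
            out[r][c] = 1 if fits else 0
    return out


# Assumed layout of the structuring element (see the lead-in).
selem = [
    [0, 0, 1],
    [1, 1, 1],
    [1, 1, 1],
    [1, 0, 0],
]

# Toy image: staff line on row 2, stem in column 7, 4x4 note head below it.
rows = [
    "000000010",
    "000000010",
    "111111111",
    "000000010",
    "000011110",
    "000011110",
    "000011110",
    "000011110",
    "000000000",
]
image = [[int(ch) for ch in row] for row in rows]

eroded = erode(image, selem, origin=(1, 1))
survivors = [(r, c) for r, row in enumerate(eroded) for c, v in enumerate(row) if v]
print(survivors)
```

In this toy case the one-pixel-thick staff line and stem are too thin to contain the element and disappear, while a few pixels inside the note head remain.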

The next goal is to obtain the approximate locations of the blobs and match each one with the appropriate note frequency. It is also important to distinguish quarter notes from half notes in order to determine how long each note should be played.

These issues can be addressed using the bwlabel function of Scilab, which automatically locates groups of connected pixels (the blobs). We then use the concept of center of mass to reduce each blob to a single pixel. The equation for the center of mass is

$$\mathbf{R} = \frac{\sum_i m_i \mathbf{r}_i}{\sum_i m_i}$$

Equation 1. Center of mass equation

where $\mathbf{r}_i$ is the location of pixel $i$, $m_i$ is its mass, and $\mathbf{R}$ is the location of the center of mass. Since we are dealing with pixels, we can take the mass of each pixel to be equal to 1.
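
A minimal Python sketch of the labeling and center-of-mass steps is shown below. It stands in for bwlabel rather than reproducing it: the use of 8-connectivity, the function names and the toy image are assumptions. With unit masses, the center of mass is simply the mean row and mean column of a blob's pixels.

```python
from collections import deque


def label_blobs(binary):
    """Group 8-connected foreground pixels; return one pixel list per blob."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for r in range(h):
        for c in range(w):
            if binary[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:  # breadth-first flood fill of one blob
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and binary[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                blobs.append(pixels)
    return blobs


def centroid(pixels):
    """Center of mass with every pixel given mass 1: (mean row, mean column)."""
    n = len(pixels)
    return (sum(y for y, _ in pixels) / n, sum(x for _, x in pixels) / n)


# Hypothetical binary image with two separate blobs.
binary = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]

blobs = label_blobs(binary)
centroids = [centroid(b) for b in blobs]
areas = [len(b) for b in blobs]
print(centroids, areas)
```

The pixel count of each blob is kept alongside its centroid because the area is what separates quarter notes from half notes later on.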

We can then compare the center-of-mass locations with the locations of the horizontal staff lines to obtain the appropriate note frequency. The note frequencies I used are based on http://www.angelfire.com/in2/yala/t4scales.htm, where the starting C note is in the 3rd octave.
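
One way to turn a centroid row into a frequency is sketched below in Python. It does not reproduce the table from the page cited above. Instead it assumes a treble staff in C major, equal temperament, and a starting C of 261.63 Hz, and the row of the bottom staff line and the line spacing are hypothetical pixel values.

```python
MAJOR_SCALE_SEMITONES = [0, 2, 4, 5, 7, 9, 11]  # C D E F G A B
NOTE_NAMES = "CDEFGAB"
C_REFERENCE_HZ = 261.63  # assumed frequency of the starting C


def row_to_note(row, bottom_line_row, line_spacing):
    """Map a centroid row to (note name, frequency in Hz) on a treble staff."""
    # Image rows grow downward; each half line-spacing upward is one staff step.
    steps_above_e = round((bottom_line_row - row) / (line_spacing / 2))
    steps_above_c = steps_above_e + 2  # the bottom line (E) is two steps above C
    octave, degree = divmod(steps_above_c, 7)
    semitones = 12 * octave + MAJOR_SCALE_SEMITONES[degree]
    return NOTE_NAMES[degree], C_REFERENCE_HZ * 2 ** (semitones / 12)


# Hypothetical centroid rows for the opening of Twinkle Twinkle Little Star,
# with the bottom staff line at row 60 and 10 pixels between staff lines.
centroid_rows = [70, 70, 50, 50, 45, 45, 50]
notes = [row_to_note(r, bottom_line_row=60, line_spacing=10) for r in centroid_rows]
names = [name for name, _ in notes]
print(names)
```

Rounding to the nearest half line-spacing gives some tolerance for centroids that do not sit exactly on a line or in a space.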

Distinguishing a quarter note from a half note can be done using the areas of the blobs. It can be observed in figure 6 or 8 that the area occupied by a quarter note is greater than that of a half note. Thus, we can set a threshold area such that any blob with an area smaller than the threshold is considered a half note.

The result of the process using template matching is:


And the result using the morphological operation is:
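
The area test and the final tone generation can be sketched together in Python. Everything numeric here is an assumption made for the example: the sample rate, the half-second beat, the area threshold of 20 pixels and the three sample notes.

```python
import math

SAMPLE_RATE = 8192  # samples per second (assumed)


def classify_duration(area, area_threshold, beat_seconds=0.5):
    """Blobs smaller than the threshold are treated as half notes (2 beats)."""
    beats = 2 if area < area_threshold else 1
    return beats * beat_seconds


def tone(frequency, seconds):
    """Sine wave samples for one note."""
    n = int(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * frequency * k / SAMPLE_RATE) for k in range(n)]


# Hypothetical (frequency in Hz, blob area in pixels) pairs.
notes = [(261.63, 30), (392.00, 31), (440.00, 12)]

durations = [classify_duration(area, area_threshold=20) for _, area in notes]
samples = []
for (freq, _), seconds in zip(notes, durations):
    samples.extend(tone(freq, seconds))

print(durations, len(samples))
```

The resulting list of samples is what would be handed to a sound output routine to play the melody.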



Listening to the audio of the videos above, the results obtained using template matching and erosion sound exactly the same.

To verify the effectiveness of the method, I applied the same template matching process to a Happy Birthday song, with the score sheet shown below.

 
Figure 9. Happy Birthday to You song score sheet. 
 (image from http://www.zimbio.com/Oblong+Shaped+Face/articles/uJ-Fz2HqhIk/happy+birthday+song+in+korean)

And the resulting played notes...

(Note to listener: You might wonder about the unusual ending of the result.
The score sheet is indeed different from our usual Happy Birthday song!
This is the Korean version.)

Finally, this is done! This activity took me three meetings, the longest of all the activities so far. Overall, the idea of playing music using image processing is interesting and impressive, but the process is tedious and grueling. I think I deserve a grade of 10.0 for being able to extract musical notes from a digitized score sheet and play them with the proper frequency and duration.

References:
[1] Maricor Soriano, "Playing notes by image processing", Applied Physics 186 manual, 2010.
