Thursday, September 1, 2011

A13 - Image Compression

Remember the old school floppy disk? The unattractive bulky square-like storage device. Imagine this, the device only stores up to a maximum of 1.5 MB only! But it was ubiquitous many years ago that everyone was so contented with it until the arrival of new storage devices with storage size ranging from GB to TB.

With such small storage space, what can we do to maximize its usage??
--> Compression is the answer!

Compressing a file such as a document or an image means reducing its storage size for convenience purposes.

In this blog post, we will use the Principal Component Analysis to represent an image as a superposition of weighted basis images and minimize the number of features to be used to compress the image. A possible useful discussion of Principal Component Analysis method can be found in this wiki page.


We will use the Principal Component Analysis method to compress the grayscale image below

 
Figure 1. Grayscale image to be used.

We cut the 500x334 image into 10x10 blocks as shown

 
Figure 2. Chopped version of figure 1 into 10x10 blocks.
(Note: Since the number of rows is not divisible by 10, we will
 exclude the last 4 rows)

We then convert each sub-block into a column vector using the Scilab pseudo-code subblock(:) where subblock is the subblock image. We take the transpose of the column vectors to convert them to row vectors and stack them.

 
Figure 3. Stacked version of the subblocks in figure 2.
(Note: The stack order is made by scanning the subblocks of figure 2
from left to right and top to bottom converted to row vectors.)

The result of figure 3 is an nxp matrix where n is the number of subblocks and p the number of elements in that block (here, the matrix size is 1650x100).

We apply PCA on figure 3 using the Scilab function [lambda,facpr,comprinc] = pca(x) where x is the nxp matrix, lambda contains the eigenvalues and their normalized version, facpr are the eigenvectors, and comprinc are the principal components. The produced plots of using the pca function are

 
Figure 4. (a) Correlations circle. (b) Eigenvalues in percent form.

It is interesting to note that the value of the first eigenvalue is ~93 which gives us a hint that the eigenvector corresponding to the first eigenvalue contains majority of the image's information. We can also get the principal factors (eigenimages) as shown

Figure 5. Eigenimages (100x100 matrix).

We then use the eigenimages in figure 5 together with principal components to compress figure 1. The resulting compressed images depending on the number of eigenvectors used are shown below

 
Figure 6. Compressed images
(The number of eigenvectors used from left to right and top to bottom are
1, 2, 3, 5, 8, 10, 13, 18, 21)

Now we verify if our goal of compressing the file size of the original image is achieved. First, note that initially we have 100 available eigenvectors, if we plot the achieved compressed image file size with the number of eigenvectors used, the result is as follows

 
Figure 7. File size versus number of eigenvectors.
We can see that as the number of eigenvectors used increases, the produced image file size increases as well. Thus, the idea of choosing substantial amount of eigenvectors to limit file size depending on how accurate the needed image is important.

Comparing the filesizes of the compressed images shown in figure 6 with the file size of the original grayscale image, we will get

Table 1. Comparison of filesizes of compressed images with original image.
 

 Table 1 shows that if we use at least a single eigenvector to 21 eigenvectors, we can reduce the filesize from 33% to 19%, respectively. We further note that it can be observed that using more than 10 eigenvectors, no significant change can be seen (just minor details as shown in figure 6). Thus, we can say that 10 eigenvectors is the optimal number of eigenvectors to be used which can reduce the filesize by 30.28%.

The most important concept to be remembered in this blog post is that the method of PCA can be used to compress images. The amount of eigenvectors to be used highly depends on your primary objective. If your objective is to reduce filesize only, then choose very minimal number of eigenvectors. If you want to reduce filesize without compromising accuracy, then it is best to find the optimal number of eigenvectors (at least less than the total number of available eigenvectors).


I pretty much enjoyed this activity and I am starting to imagine applications of the PCA method and how I can use it on my special project. For evaluation purposes, I would give myself a grade of 10.0 for being able to produce the needed results and for being able to compress my chosen image significantly.

No comments:

Post a Comment