Monday, November 7, 2016

Activity 7 - Color Segmentation

For today's blog, I'd like to tackle an interesting topic: image segmentation. More often than not, within every picture taken there is a subject of interest, and this subject, of course, varies for each viewer. Our eyes and brains can pick up these regions of interest ("ROIs") and focus our attention on specific details within them. The question now is: "Is there any way for a computer to simulate and adopt this object-segmenting function?"

Fortunately enough, there are easy and relatively direct techniques to do so. I'll begin with the most direct approach.

GRAY SCALE IMAGE SEGMENTATION

To "segment" a desired object within a scene, we basically just need some way to differentiate the pixel values of the region of interest from those of the surrounding regions. In a few cases, this is easily done using thresholding. As an example, consider the test image below.

Figure 1. Gray scale test image
Suppose that we want to extract the text from the image. At first glance, we see that the majority of the pixel values within the image belong to the gray-ish background of the check. To verify this observation, let's take a look at the histogram of the test image.
Figure 2. Histogram of gray scale test image. 

The built-in Scilab function imhist() was used to generate a histogram (Figure 2) of the test image with 256 bins, so each bin corresponds to a single integer pixel value. As shown in Figure 2, there are two dominant peaks (the two individual peaks appear to overlap into one large peak) at the pixel values of 193 and 195. These peaks, along with the surrounding pixel values that make up their widths, lie closer to the white end (pixel value of 255) of the gray scale range and thus probably correspond to a light shade of gray. Along with the point made earlier (the majority of the pixel values belong to the gray-ish background of the check), it is highly plausible that these two peaks correspond to pixels making up the background of the check. Thus, by retaining only the pixel values below some threshold (all pixels at or above this threshold are set to 0, while all pixels below it are set to 1), we can isolate the text of the image. This is done using the code in Figure 3.

Figure 3. Code used to apply thresholding to the test image in Figure 1. 
The resulting outputs of the thresholding procedure performed by the code in Figure 3 are shown in Figure 4 for increasing threshold values.

Figure 4. Test image after various thresholds are applied
We see that applying the threshold in effect binarizes the image: the regions of interest are represented by the maximum pixel value of 255 while the rest of the pixels are represented by the minimum value of 0. We see in Figure 4 that the text of the image is "sufficiently extracted" from the original test image using a threshold value of approximately 100-140. This is consistent with the observations from the histogram in Figure 2. Looking at the extremes, we see that too small a threshold value "throws out" too much information, while too large a threshold value includes irrelevant portions in the desired region of interest. Based on the results, this simple threshold procedure seems to work!
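Since the Scilab code itself lives in Figure 3, here is a minimal sketch of the same thresholding idea, written in Python/NumPy purely for illustration; the image array and threshold value below are hypothetical stand-ins, not the actual check image.

```python
import numpy as np

def threshold_segment(img, T):
    """Binarize a grayscale image: pixels below the threshold T
    (e.g. dark text on a light background) become 1, the rest 0."""
    return (img < T).astype(np.uint8)

# Hypothetical 8-bit grayscale image: dark "text" (value 30) on a
# light gray background (value 194, near the histogram peaks above)
img = np.full((4, 4), 194, dtype=np.uint8)
img[1:3, 1:3] = 30  # the "text" pixels

mask = threshold_segment(img, 120)
# mask is 1 exactly on the four dark text pixels
```

Displayed with the usual convention (1 mapped to white, 0 to black), this mask is the binarized image described above.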

Of course, there doesn't need to just be a single threshold value. Consider the test image (left) and its histogram (right) in Figure 5. 

Figure 5. Shown are the test image (left) and its histogram

Just as expected, there are two dominant peaks at the 255 and 0 pixel values due to the large completely white and completely black areas in the test image. Let us consider the region of interest to be the circle containing various shades of gray. Correspondingly, this region is represented by the two intermediate peaks in the histogram. Thus, setting the minimum threshold to 100 and the maximum threshold to 200, we obtain the segmented image in Figure 6.
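The two-sided threshold can be expressed the same way: keep only the pixels whose values fall between the lower and upper cutoffs. Again a Python/NumPy sketch for illustration, with a hypothetical toy image rather than the actual test image:

```python
import numpy as np

def band_threshold(img, lo, hi):
    """Keep pixels with lo <= value <= hi (the mid-gray circle),
    discarding both the black and white extremes."""
    return ((img >= lo) & (img <= hi)).astype(np.uint8)

# Hypothetical pixel values: black (0), white (255), and mid-grays
img = np.array([[0, 150, 255],
                [120, 180, 30]], dtype=np.uint8)

mask = band_threshold(img, 100, 200)  # isolates the gray shades
```

The single-threshold case is just this with `lo = 0`, so the same function covers both procedures.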

Figure 6. Segmented image of the test image in Figure 5. 
Well, that's all good and done, but (if not readily apparent) the simplicity of the technique employed comes with a major limitation. Consider the test image in Figure 7, its gray scaled form, and the resulting histogram.

Figure 7. Test image (left), grayscaled version of the test image (right) and the resulting histogram.
Now suppose our region of interest is the food on the plate and we want to apply the threshold process implemented earlier to segment such a region. Because the image is in color and the threshold procedure implemented previously is applicable only to gray scale images, we take its gray scale version and plot the histogram. Instantly, we note that the pixel value counts are spread throughout almost the entire range. There is no immediate boundary in the histogram where we can hopefully "cut" out values in an attempt to segment out our region of interest. In other words, in the gray scale world, there is significant similarity between the pixel values of our region of interest and those of the rest of the image. In this case, the threshold procedure fails miserably.

The limitation of the threshold process lies in the fact that it works off of differences solely in gray scale values. In other words, the degrees of freedom (whether that phrase can even be applied in this setting, I don't really know) available to each pixel are limited. If not yet readily apparent, the solution therefore lies in finding a technique that can differentiate the information contained in pixels based on other available characteristics. To this end, we look at the colors contained in our region of interest as a means to isolate it from the rest of the image.

COLOR SEGMENTATION (PARAMETRIC and NONPARAMETRIC)

Inspired by the use of histograms in the previous approach, a plausible first step in the direction of color segmentation would be to consider the color histogram of an image. In other words, we'd like to plot the distribution of color pixel values within an image. In a standard RGB image, each pixel is represented by three numbers. Thus, in RGB space, our histogram would be three dimensional (one axis for each channel). This is not only ungainly to work with (due to having three dimensions), but also does not benefit the color segmentation methods that are to be implemented. Why is this so? The answer lies in the fact that in RGB space, a point is defined by the intensities of red, green, and blue. As such, brightness information (intensity) and chromaticity (proportions of each channel) are encoded simultaneously. We can, however, transform our space to eliminate intensity information and retain only chromaticity information. This has the advantage of removing the effects of nonuniform lighting on an object to be color segmented within a scene. The transformation from RGB space to this "new" space involves dividing each channel by the sum of all three to obtain new r, g, and b channels. Correspondingly, this process imposes that r+g+b=1, and thus it is sufficient to describe the space with only the r and g channels (as b is not an independent variable). We call this new space the rg normalized chromaticity space [1].
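The RGB-to-rg conversion is just a per-pixel normalization. A minimal Python/NumPy sketch of it (for illustration only; the guard against division by zero mirrors the trick used in the Scilab code later on):

```python
import numpy as np

def rgb_to_rg(img):
    """Convert an RGB image (H x W x 3 array) to normalized
    chromaticity coordinates r and g. b = 1 - r - g is redundant."""
    img = img.astype(float)
    I = img.sum(axis=2)     # per-pixel intensity R + G + B
    I[I == 0] = 1e6         # avoid division by zero on pure-black pixels
    r = img[:, :, 0] / I
    g = img[:, :, 1] / I
    return r, g

# A pure-red pixel maps to r = 1, g = 0 regardless of its brightness,
# which is exactly the lighting invariance described above
img = np.array([[[200, 0, 0]],
                [[50, 0, 0]]])
r, g = rgb_to_rg(img)
```

Note how the bright red (200) and dark red (50) pixels land on the same point in rg space.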

Figure 8. Normalized rg chromaticity space. x-axis is for red while y-axis is for green. Image taken from [1].
Shown in Figure 8 are the colors that correspond to the proportions of r and g (and indirectly, b) values stored within a pixel. By using this space, we not only reduce the dimensionality of the histogram by 1, but also restrict our attention to comparing chromaticity alone.

Now that we're able to generate a 2D histogram of the color values contained within an image, we can begin to talk about color segmenting an image. In the most basic sense, color segmenting an image involves selecting a subregion containing colors that are representative of the region that is to be segmented, finding the probability that each pixel within the image belongs to this subregion (using the probability distribution of this subregion), and plotting the resulting probability map (the image containing the segmented ROI). There are two relatively simple ways that we can go about this: parametric and nonparametric segmentation. The difference between the two approaches mainly lies in the interpretation of the probability distribution of the subregion.

PARAMETRIC COLOR SEGMENTATION

Parametric color segmentation, as its name suggests, involves parametrizing the probability distribution of the subregion with its mean and standard deviation. In this way, we then assume that the probability distribution is Gaussian in nature. The fact that we have two channels to consider in rg space hints that we may assume a gaussian probability distribution individually for each, with the combined probability (and therefore the probability of a pixel having a color within the range specified by the subregion) being the product of the two gaussians. The code to implement such a procedure is shown in Figure 9. For comparison purposes, the test image, subregion, and outputs of the code will be shown after the second procedure is discussed.

 1  mono_roi=imread('C:\Users\Harold Co\Desktop\Applied Physics 186\Activity 7\color_seg\cropped_green_balloon_swatch.jpg')
 2
 3  image=imread('C:\Users\Harold Co\Desktop\Applied Physics 186\Activity 7\color_seg\balloons.jpg')
 4
 5  mono_roi=double(mono_roi)
 6  image=double(image)
 7
 8  //Convert to normalized chromaticity coordinates
 9  I=mono_roi(:,:,1)+mono_roi(:,:,2)+mono_roi(:,:,3)
10  I(find(I==0))=1000000
11  r=mono_roi(:,:,1)./I
12  g=mono_roi(:,:,2)./I
13
14  I_image=image(:,:,1)+image(:,:,2)+image(:,:,3)
15  I_image(find(I_image==0))=1000000
16  r_image=image(:,:,1)./I_image
17  g_image=image(:,:,2)./I_image
18
19  //////PARAMETRIC//////////////
20  //Mean and std dev of pixel samples
21
22  p_r=(1 ./(stdev(r)*sqrt(2*%pi)))*exp(-((r_image-mean(r)).^2) ./ (2*stdev(r)^2))
23
24  p_g=(1 ./(stdev(g)*sqrt(2*%pi)))*exp(-((g_image-mean(g)).^2) ./ (2*stdev(g)^2))
26
26  jointrg=p_r.*p_g
27  jointrg=jointrg/max(jointrg)
28  jointrg=mat2gray(jointrg)
29  imshow(jointrg)
Figure 9. Code used to implement parametric color segmentation

The variables mono_roi and image are the arrays that contain the RGB values of the pixels within the subregion and the entire image, respectively. Lines 8-12 and 14-17 are to facilitate the conversion of these values from RGB space to rg normalized chromaticity space. Lines 22 and 24 are the corresponding gaussian probability distributions of the r and g channels (of the subregion) evaluated at the r and g values of each pixel within the test image. Lastly, jointrg is the array containing the joint probability map and thus the segmented image. 
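Conceptually, each channel's probability is the Gaussian density p(x) = (1/(σ√(2π))) exp(−(x−μ)²/(2σ²)), with μ and σ taken from the subregion's r (or g) values, and the joint map is the product of the two. As a hedged illustration, the same computation in Python/NumPy (the toy arrays below are hypothetical stand-ins for the r and g maps of a real image and subregion):

```python
import numpy as np

def gaussian(x, mu, sigma):
    """1D Gaussian probability density evaluated elementwise on x."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def parametric_map(r_img, g_img, r_roi, g_roi):
    """Joint probability that each pixel's (r, g) belongs to the ROI,
    assuming independent Gaussians fitted to the ROI's r and g values."""
    p_r = gaussian(r_img, r_roi.mean(), r_roi.std())
    p_g = gaussian(g_img, g_roi.mean(), g_roi.std())
    joint = p_r * p_g
    return joint / joint.max()   # normalize to [0, 1] for display

# Hypothetical ROI chromaticities clustered near (r, g) = (0.5, 0.3)
r_roi = np.array([0.49, 0.50, 0.51])
g_roi = np.array([0.29, 0.30, 0.31])

# Two-pixel "image": one pixel matching the ROI, one far from it
r_img = np.array([[0.50, 0.10]])
g_img = np.array([[0.30, 0.80]])
pmap = parametric_map(r_img, g_img, r_roi, g_roi)
```

The matching pixel comes out with probability near 1 and the mismatched pixel near 0, which is the probability map that gets displayed as the segmented image.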

NONPARAMETRIC COLOR SEGMENTATION

The parametric color segmentation technique described above assumes that the probability distribution of the subregion chosen is gaussian in nature. To be accurate, however, this is not the case as the "true" probability distribution is that which is described by its color histogram. Thus, an alternative approach would be to use the histogram itself as the probability distribution. Again, a subregion of the ROI is chosen containing a range of colors representative of the ROI. The 2D color histogram of this subregion (in rg space) is then taken. Given a specific pixel within the test image, this is then allocated into one of the "bins" of the subregion's histogram, with the height of the chosen bin being the probability value of that specific pixel in the output probability map. The procedure outlined is thus called histogram backprojection[1]. The code used to implement this procedure is shown in Figure 10.

 1  mono_roi=imread('C:\Users\Harold Co\Desktop\Applied Physics 186\Activity 7\color_seg\cropped_green_balloon_swatch.jpg')
 2
 3  image=imread('C:\Users\Harold Co\Desktop\Applied Physics 186\Activity 7\color_seg\balloons.jpg')
 4
 5  mono_roi=double(mono_roi)
 6  image=double(image)
 7
 8  //Convert to normalized chromaticity coordinates
 9  I=mono_roi(:,:,1)+mono_roi(:,:,2)+mono_roi(:,:,3)
10  I(find(I==0))=1000000
11  r=mono_roi(:,:,1)./I
12  g=mono_roi(:,:,2)./I
13
14  I_image=image(:,:,1)+image(:,:,2)+image(:,:,3)
15  I_image(find(I_image==0))=1000000
16  r_image=image(:,:,1)./I_image
17  g_image=image(:,:,2)./I_image
18
19  //Histogram generation
20  BINS=32;
21  rint=round(r*(BINS-1)+1);
22  gint=round(g*(BINS-1)+1);
23
24  colors=gint(:)+(rint(:)-1)*BINS;
25  hist=zeros(BINS,BINS);
26  for row = 1:BINS
27      for col = 1:(BINS-row+1)
28          hist(row,col)=length(find(colors==(((col+(row-1)*BINS)))));
29      end
30  end
31
32  hist=mat2gray(hist)
33
34  rint_image=round(r_image*(BINS-1)+1);
35  gint_image=round(g_image*(BINS-1)+1);
36
37  //n is the matrix containing the indices (within the ROI histogram) for each r and g combination in the test image
38  n=rint_image+(gint_image-1).*size(hist,1)
39
40  //hist_list is a list containing the histogram values for each pixel
41  hist_list=hist(n)
42  //reshape back into 2d array
43  hist_list=matrix(hist_list,size(r_image,1),size(r_image,2))
44  imshow(mat2gray(hist_list))
Figure 10. Code used to implement nonparametric color segmentation

Lines 1-17 are identical to the code shown in Figure 9. Lines 19-32 are a code snippet taken from [1] and are used to generate a 2D histogram of the subregion. Note that because the origin of a digital image is in the upper left corner instead of the lower left corner, the resulting histograms must be rotated counterclockwise by 90 degrees in order to compare them with the histogram shown in Figure 8. The number of bins can be changed by varying the value of BINS. Lines 34-41 display a neat method of implementing histogram backprojection that takes advantage of the way Scilab works with arrays and matrices. First off, lines 34 and 35 associate each of the r and g values of each pixel within the image with a specific bin number within the subregion's histogram. When taken as a pair, these two values per pixel can be considered the coordinates of a point (or in terms of arrays, the indices) in the subregion's 2D histogram. In line 38, we then remap these two indices to a single index whose location within the histogram array holds the appropriate histogram value. This can be done due to the way elements within an array can be accessed in Scilab (for a 2D array, either with a pair of values or a single value). We then plug these index positions into our subregion's histogram to yield the probability list, as done in line 41. Lastly, because the output of line 41 is a single-row list, we reshape the array to the size of our original image. Line 44 then displays the segmented image.
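The same backprojection trick translates directly to NumPy, where 2D fancy indexing plays the role of Scilab's single-index lookup. A hedged sketch with a hypothetical toy histogram (not the actual balloon data):

```python
import numpy as np

def backproject(r_img, g_img, hist, bins):
    """Look up each pixel's (r, g) bin in the ROI histogram; the bin
    height becomes that pixel's value in the probability map."""
    # Map r, g in [0, 1] to integer bin indices in [0, bins-1]
    ri = np.round(r_img * (bins - 1)).astype(int)
    gi = np.round(g_img * (bins - 1)).astype(int)
    # Fancy indexing performs the per-pixel histogram lookup in one step
    return hist[ri, gi]

bins = 4
hist = np.zeros((bins, bins))
hist[2, 1] = 1.0                  # ROI colors concentrated in one bin

# Toy "image": first pixel falls in the ROI's bin, second does not
r_img = np.array([[0.66, 0.0]])   # 0.66*3 rounds to bin 2; 0.0 to bin 0
g_img = np.array([[0.33, 0.0]])   # 0.33*3 rounds to bin 1; 0.0 to bin 0
pmap = backproject(r_img, g_img, hist, bins)
```

Because the indexing already returns an array shaped like the image, no explicit reshape step is needed here, unlike in the Scilab version.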

RESULTS

So finally, we can head to the results. To begin, let's consider a test image in which most of the regions are of a solid monochromatic color. This is shown in Figure 11.

Figure 11. Test image to be used. Taken from [2]

The corresponding results for the nonparametric and parametric techniques (along with the colored subregions taken) are shown in Figure 12.

Figure 12. Nonparametric (left) and parametric (right) segmented images using the test image in Figure 11 and the subregions (enlarged for viewing purposes) used.
First things first, we see that both techniques are generally successful in color segmenting specific regions within the image. Upon closer inspection, however, we see that for some cases the outputs of the parametric and nonparametric approaches are virtually indistinguishable (rows 1, 4, and 5), while for others there are noticeable differences (rows 2 and 3). Specifically, for the results in rows 2 and 3, the boundaries of the swirly regions are sharper, more distinct, and slightly larger in the nonparametric case compared to the parametric case. This is due to the fact that the parametric case assumes a smooth Gaussian probability distribution, which leads to a gradual decrease in probability values as the pixel values deviate from the mean. Conversely, the nonparametric case pulls its probability values straight from the histogram and thus allows for the possibility of sharp boundaries within the probability map. Of course, if the changes in bin heights in the subregion's histogram are gradual, then gradual changes in probability values can also occur in the nonparametric case. This explains the occurrence of the indistinguishable nonparametric and parametric outputs in rows 1, 4, and 5. (A subregion with a very narrow range of chromaticities will result in two nearly identical cases: a histogram with a single dominant peak and a Gaussian distribution with an almost zero standard deviation.)

The last row in Figure 12 demonstrates the defining limitation of the color segmentation algorithm: a region of interest can only be fully color segmented if the range of chromaticities contained within this region are not present anywhere else. From the results in the last row, it seems as if the chromaticities contained in this particular subregion were also present in the swirly regions segmented in the first row. This is further supported by the fact that the two colors (maroon and purple-ish) are similar in terms of chromaticity.

For added intuition, we can also verify the subregion 2D histograms used in the generation of the nonparametric outputs in Figure 12 with the color map shown in Figure 8. We can overlay the histograms of each subregion with this color map as shown in Figure 13.

Figure 13. Color map overlain with the 2D histogram (left) of the subregion taken (right). 
Note that for display purposes, I had to rotate the color map in Figure 8 before overlaying the generated histogram. This is to account for the fact that an image has its origin in the upper left corner instead of the conventional lower left. As shown in Figure 13, the moving white point represents the point in rg space at which the histogram is at its highest value. Comparing this point with the subregion on the right, we see that they are consistent!

Now the test image used in Figure 11 features regions containing a narrow range of chromaticities. Thus, this results in a narrow range of probability values in the color segmented image as well and correspondingly, only minute differences are observed between the parametric and nonparametric cases. To gain far more insight, we can consider the test image in Figure 14. We hope to segment the green balloons.

Figure 14. Test image (left) and the cropped subregion (right). Test image was taken from [3]. 
As shown in Figure 14, a cropped region of the green balloon was taken. Note also that nonuniformities in lighting conditions make the balloon appear to be of varying shades of green. The results for the nonparametric and parametric cases are shown in Figure 15.


Figure 15. Nonparametric (left) and parametric (right) color segmentation results. 
Now there is obviously a difference between the outputs of the two methods. We again see verification of the fact that the Gaussian probability distribution assumed in the parametric case results in a smoother probability map, as shown on the right of Figure 15. Conversely, the jaggedness of the subregion's histogram is directly translated to the output in the nonparametric case. Choosing the "better" color segmented output over the other, however, depends on the desired application. The parametric output is more visually appealing, though not as fully representative of the subregion taken as the nonparametric one.

I would think, however, that the choice between which method to use relies heavily on the subregion taken. If a subregion containing a large enough spread of the desired colors is obtainable, then it might make sense to employ the nonparametric case for the sake of better accuracy. Consider Figure 16 wherein progressively bigger subregions are taken and then the nonparametric method is employed.

Figure 16. Nonparametric output (left), subregion taken (lower right) and a zoom in of the 2D histogram of the subregion (upper right). 
As predicted, if we take a larger subregion, more possible chromaticities are included. This results in a more spread out histogram and correspondingly, a segmented image containing a wider range of probability values. Thus, when using the nonparametric case, if monochromaticity of regions is assumed, it is important to take large enough subregions.

Lastly, we can also explore the effects of changing the number of bins used in the histograms of the subregions on the outputs in the nonparametric cases. Consider Figure 17.

Figure 17. Nonparametric outputs using histogram bin counts of 8, 16, 32, 64, 128, and 256. 

As shown in Figure 17, the general trend is that as the number of bins increases, the area of the regions included in the segmented image decreases. Moreover, the spread in probability values also increases with an increasing number of bins. Most importantly, however, we see that there is some optimum number of bins that will yield a generally acceptable color segmented output. The word "acceptable" here is used loosely as it depends, of course, on your specific application. The use of too few bins can include undesired regions that have chromaticities similar to your ROI (like that of the teal balloon), while too many bins can cause the occurrence of "empty" bins between partially filled bins. Depending on how you look at it and your desired application, this may or may not be desirable.

For my performance in this activity, I'd like to give myself a 12/10. I believe I was able to grasp the concepts outlined by the activity and subsequently apply them fully. Besides this, I also believe that I have gained more than a working feel for the techniques employed in at least the simplest of segmentation algorithms. Doing this activity has also piqued my interest in other segmentation techniques that I will probably try out when I have the time.

ACKNOWLEDGEMENTS

I'd like to thank Ma'am Jing for helping me realize that there was a much easier approach (although not as instantly realizable) to implementing histogram backprojection.


Reference:
[1] Soriano, M., “Image Segmentation,” 2014.

[2] [Digital image]. (n.d.). Retrieved from https://janefriedman.com/dekes-techniques-085-garish-color-pattern-in-illustrator-cs6-png/

[3] [Digital image]. (n.d.). Retrieved from http://il5.picdn.net/shutterstock/videos/16630771/thumb/1.jpg