Implementing an algorithm sounds scary, but in reality, we're just taking a recipe and translating it into a language the computer can understand (code). I thought it would be fun to make a tutorial on implementing a simple graphics algorithm.
As the title says, we're going to implement Gaussian blur with the help of OpenCV. OpenCV is a computer vision library you can use in Python to do lots of cool things, from face recognition to image segmentation. In fact, it has a Gaussian blur implementation built into it! We won't be using that for our blur, of course. What we will use is OpenCV's ability to easily read and write images, plus its built-in Gaussian blur as a reference to compare our implementation against.
Setup
Before we dive into code, we need to get 2 things done:
1) Understand the Gaussian blur algorithm (recipe)
2) Install OpenCV and NumPy (if you don't have them)
First, let's understand Gaussian blur. Since we're implementing it in code, we can skim the mathematics behind the algorithm and focus on what we need to get the result we want. Essentially, a Gaussian function is used to calculate the transformation to apply to each pixel in the image (from Wikipedia). This Gaussian function produces a convolution matrix, which is then used as a filter that replaces each pixel with the weighted average of its neighbors' colors.
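In symbols (standard notation, not taken from the post), the 2D Gaussian function with standard deviation σ is G below. For a 5x5 matrix K built from it, for example by sampling K(m, n) = G(m, n), each blurred pixel I'(i, j) is the weighted average of the original pixels I around it. Near the edges of the image, the sums simply skip offsets that fall outside the image.

```latex
G(x, y) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right)

I'(i, j) = \frac{\sum_{m=-2}^{2} \sum_{n=-2}^{2} K(m, n)\, I(i + m,\, j + n)}{\sum_{m=-2}^{2} \sum_{n=-2}^{2} K(m, n)}
```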
To dissect this a bit further, think of an image as a grid of pixels, and think of the matrix as a filter laid over part of that grid. We slide the filter across the image, centering it on one pixel at a time. At each position, we multiply the color of every pixel under the filter by the matrix value covering it, and keep a running sum of these products. Once every pixel under the filter has been accounted for, we divide that sum by the total of the weights to get the weighted average, and assign that color to the pixel at the center of the filter. Here's a good visualization of this that I found on the Internet (credit):
OK, cool! We have some understanding of what we're trying to do. Our recipe involves sliding a matrix of numbers across a grid of pixels (read: an image). For each pixel, we calculate a weighted average of the surrounding pixel colors, using the matrix values as the weights, and assign that color to the pixel. It's OK if it doesn't make a ton of sense yet!
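To make "weighted average" concrete, here's a tiny worked example for a single pixel. It uses a made-up 3x3 grayscale neighborhood and a made-up 3x3 weight matrix (both hypothetical; the tutorial itself uses a 5x5 matrix and color pixels), but the arithmetic is the same.

```python
import numpy as np

# A toy 3x3 neighborhood of grayscale pixel values, centered on the
# pixel we want to blur (the bright 100 in the middle).
patch = np.array([
    [10, 10, 10],
    [10, 100, 10],
    [10, 10, 10],
], dtype=np.float64)

# A toy 3x3 weight matrix: the largest weight sits in the middle.
kernel = np.array([
    [1, 2, 1],
    [2, 4, 2],
    [1, 2, 1],
], dtype=np.float64)

# Weighted average = sum(weight * value) / sum(weights)
blurred_value = (kernel * patch).sum() / kernel.sum()
print(blurred_value)  # 32.5
```

The center contributes 4 * 100 = 400 and the eight neighbors contribute 12 * 10 = 120, so the result is 520 / 16 = 32.5: the bright pixel gets pulled toward its darker neighbors.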
We can almost start coding. We just need to make sure we have OpenCV and NumPy installed. It should be as simple as running these commands in your terminal:
pip install opencv-python
pip install numpy
To make sure it worked, run these two commands in a terminal, one after the other (the first starts the Python interpreter; type the second at its >>> prompt):
python
import cv2, numpy
If you don't get an error, then you should be good to go! If you do, read through the output of the installation commands to check for errors.
Coding
OK, let's get started! We need an image to blur. I'm going to use this one, but you can follow along with a different image if you like. Beware: the larger your image, the longer your code will take to run, since this algorithm does its work for every pixel in the image! In fact, I'm going to crop this image in our code to make it run a bit faster.
Start with this. Here, we're simply reading the image data into our script with OpenCV's imread function, then printing the result to see what we're working with.
You'll notice this outputs some nested arrays with numbers in them. Those are color values: blue, green, and red values, to be exact, because OpenCV stores each pixel's channels in B, G, R order rather than R, G, B. Together they make up the image you fed into OpenCV via the filename variable. If you see None printed instead, imread couldn't find or read the file (it returns None rather than raising an error), so make sure your filename matches the actual filename of your image exactly. Also, make sure your image is saved in the folder you run the script from; otherwise, you'll need to provide the full path to the image, not just the filename, for OpenCV to find it.
Since we don't want to keep opening the actual file every time we run our code, let's add a way to view the image from our script, using some OpenCV boilerplate code.
When you call the preview_img function with img as the argument and run your code, you should see a pop-up of your image appear! Close the image to end your script. (You can also remove the print statement from before.)
Cool. We're about to start on our algorithm, but first, we want to crop our image so our code doesn't take too long to run. If your image is already small, e.g. 300x300 or less, you can probably skip this step. If not, let's crop.
This bit of code keeps only the data within the bounds we gave for both dimensions and leaves behind anything outside of them. So our resulting image consists of only the pixels between x = 100 and x = 300 and between y = 0 and y = 200. If you're not sure how you want to crop your image, that's OK. For the sake of this tutorial, any subset of your image will do!
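Here's a hedged sketch of that crop using NumPy slicing, with a hypothetical crop helper. The main trap is argument order: NumPy indexes an image as img[row, column], i.e. img[y, x], so the y range comes first, and slice ends are exclusive.

```python
import numpy as np


def crop(img, x_start, x_end, y_start, y_end):
    """Keep only columns x_start..x_end-1 and rows y_start..y_end-1."""
    # Rows (y) come first in NumPy indexing, then columns (x).
    return img[y_start:y_end, x_start:x_end]


if __name__ == "__main__":
    img = np.zeros((400, 500, 3), dtype=np.uint8)  # stand-in for a loaded image
    cropped = crop(img, 100, 300, 0, 200)
    print(cropped.shape)  # (200, 200, 3)
```

Slicing returns a view of the original array rather than a copy, so writing into the cropped image also changes the original; call .copy() on the result if you need them independent.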
If you're following along using my image, yours should look like this when you run your code (you can remove the first preview_img call to avoid having to go through 2 pop-ups):
Rad! Let's go into the algorithm now that we have our image ready in our code.
Recall that we want to apply a matrix, which in computer vision terms is often called a kernel, to every pixel in the image. In pseudocode terms, we can think of it as: "For every pixel p in img, apply a matrix computation." So, let's roughly map that out in our code.
A couple of things here. First, we get the dimensions of our 2D array of pixels, and then, to go pixel by pixel, we have a nested for loop over those dimensions. We haven't implemented an apply_kernel function yet, but we're penciling it in, since that's where we'll add our logic to apply the matrix computation. We need to pass it the whole image, not just one pixel, because the computation uses the surrounding pixels too. We also pass i and j, which identify the current pixel (its row and column, i.e. its y and x coordinates), so our kernel function knows which pixel in the image we're applying the matrix to. Finally, we return our image. It won't actually be blurred yet, so let's get started on that function!
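One way to sketch that outer loop is below. The driver name blur is hypothetical, and apply_kernel is a placeholder that returns the pixel unchanged until the real logic exists. This sketch also writes results into a separate output array, a design choice that may differ from the post: if you overwrite pixels in place, later pixels get averaged with neighbors that are already blurred.

```python
import numpy as np


def apply_kernel(img, i, j):
    """Placeholder: returns the pixel unchanged until the real logic is written."""
    return img[i, j]


def blur(img):
    rows, cols = img.shape[:2]
    # Separate float output, so already-blurred pixels never feed into
    # their neighbors' averages.
    out = np.zeros(img.shape, dtype=np.float64)
    for i in range(rows):      # i = row index (y)
        for j in range(cols):  # j = column index (x)
            out[i, j] = apply_kernel(img, i, j)
    # Round back to 8-bit so OpenCV can display and save the result as usual.
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```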
There are actually lots of different matrices you can use for Gaussian blur. For this tutorial, we're going to use a 5x5 one (see here):
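If you'd rather generate such a matrix than copy one, a minimal sketch is to sample the Gaussian function on a 5x5 grid of offsets centered at (0, 0) and normalize the weights so they sum to 1. The gaussian_kernel helper is hypothetical, and sigma = 1.0 is an assumption; different sigma values give different matrices, with larger sigma spreading the weight further from the center.

```python
import numpy as np


def gaussian_kernel(size=5, sigma=1.0):
    """Sample the 2D Gaussian on a size x size grid of offsets centered at 0."""
    half = size // 2
    coords = np.arange(-half, half + 1)  # e.g. [-2, -1, 0, 1, 2]
    xx, yy = np.meshgrid(coords, coords)
    kernel = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    # Normalize so the weights add up to 1; the 1 / (2 * pi * sigma^2)
    # constant would cancel out here anyway, so it is skipped.
    return kernel / kernel.sum()


if __name__ == "__main__":
    np.set_printoptions(precision=4, suppress=True)
    print(gaussian_kernel(5, 1.0))
```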
Here is the starting code for our function:
Our gaussian5x5 variable represents the matrix above, computed out for convenience. We'll need the image bounds later, since we don't want to index outside the image, so we get those now. Finally, we initialize the two variables that will make up our result, col and gsum. The col variable keeps track of the total color of our computation, and the gsum variable keeps track of the total of the matrix weights we used. Dividing col by gsum, as we do in the return statement, gives us the weighted average we know we need from our initial study of the algorithm. So we're on our way! All that's left to do is populate those two variables.
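As an illustration only (the post's exact gaussian5x5 values may differ), one widely used "computed out" 5x5 approximation is the outer product of the binomial row [1, 4, 6, 4, 1] with itself. Because the function ends by dividing col by gsum, integer weights like these work fine: the division turns them into a proper weighted average.

```python
import numpy as np

# One common integer 5x5 approximation of a Gaussian kernel (binomial weights).
# Illustrative choice, not necessarily the matrix used in the post.
row = np.array([1, 4, 6, 4, 1])
gaussian5x5 = np.outer(row, row)

print(gaussian5x5)
print(gaussian5x5.sum())  # 256
```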
Recall that we want to use our matrix as a window. We're going to go through our matrix element by element, and use each element's value together with the color of the image pixel under it to get that pixel's contribution to the resulting average. We'll be multiplying colors by a scalar value and adding colors to keep the running total, so let's quickly whip up some helper functions for those purposes.
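Helpers along these lines could look like the sketch below; the names scale_color and add_color are hypothetical. Each channel is converted to a Python float before scaling, which keeps the results from wrapping around the way 8-bit (uint8) NumPy arithmetic can once values pass 255.

```python
def scale_color(color, scalar):
    """Multiply every channel of a color by a scalar, in floating point."""
    return [float(channel) * scalar for channel in color]


def add_color(a, b):
    """Add two colors channel by channel."""
    return [x + y for x, y in zip(a, b)]


# Usage: a pixel read by OpenCV is a length-3 uint8 array, e.g.
#   total = add_color(total, scale_color(img[r, c], weight))
```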
Great. We can now do the matrix computation we described in the previous paragraph. This is the trickiest part of this tutorial! Let's see if we can understand it better in code form:
OK! The nested for loop lets us go through every element in the matrix. The next two lines are a simple transform that centers the matrix on the pixel we're doing our computation for. We want the largest weight, which is the center value of the matrix, to apply to the pixel in focus.
Then, we want to do 2 things: multiply the appropriate matrix scalar by the color of the corresponding pixel, and then add that resulting color to the running total of colors (our col variable). That is what the third line of the nested for loop does. Finally, the last line is keeping track of the sum of the matrix element values.
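The centering transform on its own can be sketched like this (kernel_positions is a hypothetical helper). Subtracting size // 2 from each matrix index shifts the matrix so that its center element, (2, 2) for a 5x5 matrix, lines up with the pixel (i, j) being blurred.

```python
def kernel_positions(i, j, size=5):
    """Yield (m, n, r, c): matrix element (m, n) and the image pixel (r, c) it covers."""
    half = size // 2
    for m in range(size):
        for n in range(size):
            # Offset by half so matrix element (half, half) sits on pixel (i, j).
            yield m, n, i + m - half, j + n - half


if __name__ == "__main__":
    for m, n, r, c in kernel_positions(10, 10):
        print(f"matrix ({m}, {n}) covers pixel ({r}, {c})")
```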
There are 2 gotchas in this code! See if you can figure them out before reading on.
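The first: our bounds check! For some pixels in our image, e.g. the top-left pixel, part of our matrix will go unused, since it lines up with non-existent pixels outside the bounds of our image. So we need to make sure that doesn't cause an exception in our code. Note that Python and NumPy don't raise an error for negative indices; they silently wrap around to the opposite edge of the image, so the check has to cover the lower bounds as well as the upper ones.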
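The asymmetry is easy to see in a small sketch: a negative index quietly reads from the other end of the array, while an index past the far end raises IndexError. A bounds check like the hypothetical in_bounds helper below catches both cases.

```python
import numpy as np


def in_bounds(r, c, rows, cols):
    """True only if (r, c) is a real pixel in a rows x cols image."""
    return 0 <= r < rows and 0 <= c < cols


if __name__ == "__main__":
    row = np.array([10, 20, 30, 40])
    print(row[-1])  # 40: a negative index wraps around instead of failing
    try:
        row[4]      # one past the far end does raise
    except IndexError as err:
        print("IndexError:", err)
    print(in_bounds(-1, 0, 4, 4))  # False
```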
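The second: this one is harder to spot, but as you may have noticed earlier, the image data OpenCV reads in is 8-bit integer data in the 0-255 range (NumPy dtype uint8), not floating-point values in 0-1. Multiplying and adding that data carelessly can overflow, since uint8 values wrap around past 255, so we need to transform our matrix element values a bit for our computation to work! The two fixes are implemented below.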
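Putting both fixes together, here is one possible self-contained sketch of apply_kernel plus a driver loop. It makes assumptions of its own: the binomial 5x5 weights stand in for the post's gaussian5x5 values, the driver name blur is hypothetical, and instead of adjusting the matrix values it handles the second gotcha by doing all the arithmetic in float64, which sidesteps uint8 wrap-around. Out-of-bounds matrix cells are skipped, and because gsum only counts the weights actually used, edge pixels still get a proper weighted average.

```python
import numpy as np

# Illustrative binomial 5x5 weights (an assumption; swap in your own matrix).
_ROW = np.array([1.0, 4.0, 6.0, 4.0, 1.0])
GAUSSIAN_5X5 = np.outer(_ROW, _ROW)


def apply_kernel(img, i, j, kernel=GAUSSIAN_5X5):
    """Weighted average of the neighborhood of pixel (i, j)."""
    rows, cols = img.shape[:2]
    half = kernel.shape[0] // 2
    # Running color total in float64 (one value per channel).
    col = np.zeros(img.shape[2:], dtype=np.float64)
    gsum = 0.0  # running total of the weights actually used
    for m in range(kernel.shape[0]):
        for n in range(kernel.shape[1]):
            r = i + m - half
            c = j + n - half
            # Fix 1: skip cells that hang off the image. Checking r >= 0 and
            # c >= 0 matters too, because negative indices wrap silently.
            if not (0 <= r < rows and 0 <= c < cols):
                continue
            w = float(kernel[m, n])
            # Fix 2: accumulate in float64 instead of uint8.
            col += w * img[r, c].astype(np.float64)
            gsum += w
    return col / gsum


def blur(img, kernel=GAUSSIAN_5X5):
    rows, cols = img.shape[:2]
    out = np.zeros(img.shape, dtype=np.float64)
    for i in range(rows):
        for j in range(cols):
            out[i, j] = apply_kernel(img, i, j, kernel)
    # Round back to 8-bit for display and comparison with OpenCV.
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```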
We should be all set to test it out. Let's set things up so we can compare the original image with our Gaussian blur and with OpenCV's Gaussian blur.
Here's how we can do that:
There's some Python-fu happening here; if you're interested, leave a comment and I can explain in more detail. Anyway, at this point, your result should look something like this:
From left to right, we have our original image, the image with our Gaussian blur algorithm applied, and finally, the image with OpenCV's Gaussian blur applied. Pretty darn close, I'd say! Congrats, you implemented a Gaussian blur algorithm!
Let me know if this was fun for you! You can find the full script here.