How Hard Could It Be, Part 2.
(You can find Part 1 here.)
To perform background replacement on an image, one must first perform background subtraction – that is, figure out what portion of the given image is the background, what portion is the foreground, and then separate the two. Let's take a webcam teleconference as the test case, since that's what Apple showed. In this case, the foreground is the person in the conference, and the background is everything else. The problem looks like this:
1. Empty Background. This is the background: the "empty image." What the camera sees when there's no one around.
2. Person Against Background. This is what the camera sees when the person waves at the camera.
3. Background Subtracted (ideal case). This is what the perfect scenario would look like: the waving hand is pulled out of the image, and the rest is removed. Now we could superimpose this hand over a different image, or video, or whatever.
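If we had that perfect mask, the compositing step itself would be the easy part. Here's a minimal sketch in Python with OpenCV and NumPy – the filenames are placeholders, and the foreground mask is assumed to already exist (computing it is the whole problem):

```python
import cv2
import numpy as np

# Placeholder inputs: the hand-waving frame, a replacement background of
# the same size, and a boolean mask that's True wherever the foreground is.
frame = cv2.imread("person_waving.png")
new_background = cv2.imread("beach.png")
foreground_mask = np.load("foreground_mask.npy")  # bool, shape (height, width)

# Start from the new background and paste the foreground pixels on top.
composite = new_background.copy()
composite[foreground_mask] = frame[foreground_mask]
cv2.imwrite("composite.png", composite)
```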
Looking at the problem like this, an approach immediately presents itself: why not just look at each pixel in the background image and compare it to the corresponding pixel in the hand-waving image? If the two pixels are the same, then the pixel must be part of the background. If not, it's gotta be foreground. Right? How hard can it be?
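In code, the naive approach really is just a few lines (again a sketch – same placeholder filenames, and the two frames are assumed to be the same size):

```python
import cv2

# Placeholder filenames: the empty image and the hand-waving frame.
background = cv2.imread("empty_background.png")
frame = cv2.imread("person_waving.png")

# A pixel counts as background only if all three channels (B, G, R)
# match the empty image exactly.
is_background = (frame == background).all(axis=2)

# Blank out everything we decided was background.
subtracted = frame.copy()
subtracted[is_background] = 0
cv2.imwrite("subtracted.png", subtracted)
```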
Well, unfortunately, if you do that, you get this:
See all the little blue specks? Those are the pixels that didn't change, and so were identified as background. It looks like only very few of the background pixels were actually removed, even before putting anything in the foreground. Even worse, the pixels that are removed change every frame! What happened?
In a word, noise. Noise is the random graininess that you get in images. In digital cameras it's usually the result of electronic interference, a crappy camera sensor, or both, and it's especially noticeable in low-quality webcam images such as these. In order to find the background, we're going to need to be able to recognize it even in a noisy image. Fortunately, there's a simple method that works pretty well, and you're most likely already familiar with it (or were, anyway, in stats class). Stay tuned.
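To see just how thoroughly noise breaks exact comparison, here's a toy simulation – the numbers are invented, but a few intensity levels of sensor noise per frame is roughly the right ballpark for a cheap webcam:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two captures of the *same* static scene: the true pixel values plus
# independent Gaussian sensor noise (sigma of ~3 intensity levels) each time.
scene = rng.integers(0, 256, size=(240, 320), dtype=np.uint8)
frame_a = np.clip(scene + rng.normal(0, 3, scene.shape), 0, 255).astype(np.uint8)
frame_b = np.clip(scene + rng.normal(0, 3, scene.shape), 0, 255).astype(np.uint8)

# How many pixels survive an exact-equality test?
print(f"identical pixels: {(frame_a == frame_b).mean():.1%}")  # roughly 10%
```

Nothing in the scene moved, and still around nine out of ten pixels fail the test – that's the blue-speck confetti in the screenshot above.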
How Hard Could It Be, Part 1.
Back in August, Apple finally revealed some of the insanely great features coming in the next release of Mac OS X (10.5, aka "Leopard"). Since it's now the end of October – oops, btw – the hype train has finally cleared the station, the Internets have returned to normal, and it's time to assess the new stuff.
The Leopard vibe from the end-user perspective is mild disappointment, and understandably so: although Time Machine is neat in an eat-your-vegetables sort of way, there isn't much else that really pops. That shocks and awes. I mean, seriously: email stationery? If anything, I'd call that a step backwards.
But Wired magazine aside, it's actually perfectly fine that the demonstrated Leopard features didn't rock the globe. After all, the demo was given at WWDC. The Worldwide Developers Conference. The annual gathering of people who write software for OS X. Is it any surprise that, of the ten features shown, six* were aimed squarely at the nerd patrol?
To get the real story, listen to the applause in the streaming video of the presentation: there was a lot of neat stuff that the MSIATM** missed, stuff that hasn't been possible before but that doesn't necessarily seem that spectacular to the average user. In fact, there was one thing in particular that completely blew me away, and seemed to cause quite a bit of applause at the event: live background replacement in a webcam image.
You are no doubt familiar with the concept of bluescreening: it's what the weatherman uses every day on the news, wherein he stands in front of a blue wall in the studio, yet appears as if he's standing in front of an animated weather map. Well, background replacement algorithms produce the same effect, only they don't need a blue wall to do it. Instead, they try to figure out what part of the image is background, and what part is foreground.
Using math.
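The blue wall makes the weatherman's version almost trivial, because the algorithm is told in advance exactly what the background looks like. A rough sketch in Python with OpenCV – the filenames and the hue/saturation bounds for "blue" are made up for illustration:

```python
import cv2
import numpy as np

# Placeholder inputs: the studio shot and a same-sized weather map.
frame = cv2.imread("weatherman.png")
weather_map = cv2.imread("weather_map.png")

# Work in HSV so "blueness" is mostly a matter of hue, not brightness.
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
is_blue_wall = cv2.inRange(hsv,
                           np.array([100, 80, 80]),        # rough lower bound
                           np.array([130, 255, 255])) > 0  # rough upper bound

# Wherever the wall shows through, paste in the weather map instead.
composite = frame.copy()
composite[is_blue_wall] = weather_map[is_blue_wall]
cv2.imwrite("on_air.png", composite)
```

Take away the known wall color, and the interesting part – deciding which pixels are background – is left entirely to the algorithm.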
As you might imagine, that's both neat and tricky to pull off, which is why no one's really offered it yet in a mass-market application. It was therefore extremely impressive that the new version of iChat seemed to be able to do it flawlessly (and without any choppiness – check the streaming video right around 1:19:00 for the demo). The wild applause is well earned: that's a hell of a nice implementation.
Of course, upon viewing that I immediately concluded that there must be a trick. I mean, they're doing it in real time, on a noisy webcam image. There's gotta be some new algorithm, or some different technique. How hard could it be?
And that's how, head firmly in ass, I set off to prove that background replacement isn't hard. And learned, of course, that in fact it is hard. You probably saw that coming.
(To be continued.)
* Those six being Spaces, Spotlight upgrades, iCal upgrades, Accessibility upgrades, 64-bit UI libraries, and Core Animation.
** Main Stream Internet and Technology Media. Obviously.