How Hard Could It Be, Part 6 (of 6).
The algorithm we've developed works well for its original purpose, locating a person in a webcam image, but it's also pretty robust in more difficult situations. Here are some examples with considerably more complex foregrounds and backgrounds:
Hand against a varied background
In this image, a hand is placed in front of a wall that varies from completely white (overexposed) on the far left to completely black (underexposed) on the far right. The hand remains whole and identifiable despite the brightness gradient behind it.
Forks on a table, soft light
Here, three forks lie on a table. The table is lit indirectly, so the shadows are soft. The algorithm captured the more complex outline of the forks pretty well.
Wires on a table, harsh light
Here, some power cables and miscellaneous other crap lie on a brightly lit table. Notice that here the shadows become part of the foreground, since they darken the table's coloration and are picked up by the standard deviation algorithm*. Nevertheless, no wires are missed, and the amount of background brought into the foreground isn't too bad.
So this works, right? We're done! Well... yes and no. Yeah, it works, but there's another part of the original problem we've overlooked: performance.
If you watched the seminal Apple demo linked in Part 1, you know that their background replacement occurs live, at 30 frames per second. The algorithm we've developed here runs at... 0.125 frames per second. That's 8 seconds per frame. Admittedly, it is written in Java (possibly the slowest environment I could've chosen), and it is a painfully sub-optimal implementation, but it's important to realize that the approach we've developed may be too slow for real-time applications. Certainly, the implementation I have right now is too slow.
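To put those numbers side by side, here's a quick back-of-the-envelope calculation. It's only a sketch: the class and method names are made up for illustration, and the three frame rates are just the figures quoted in this post (0.125 for my Java version, roughly 15 for Spacecam, 30 for the Apple demo).

```java
class FrameBudget {
    /** Seconds spent on one frame at the given frame rate. */
    static double secondsPerFrame(double fps) {
        return 1.0 / fps;
    }

    /** How many times faster a pipeline must get to go from currentFps to targetFps. */
    static double requiredSpeedup(double currentFps, double targetFps) {
        return targetFps / currentFps;
    }

    public static void main(String[] args) {
        double javaFps = 0.125;    // rate of the Java implementation, as quoted in the post
        double spacecamFps = 15.0; // approximate Spacecam rate, as quoted in the post
        double targetFps = 30.0;   // rate of the Apple demo, as quoted in the post

        System.out.printf("Java version: %.1f s per frame%n", secondsPerFrame(javaFps));
        System.out.printf("Budget at target: %.1f ms per frame%n",
                1000.0 * secondsPerFrame(targetFps));
        System.out.printf("Java version needs a %.0fx speedup%n",
                requiredSpeedup(javaFps, targetFps));
        System.out.printf("Spacecam needs a %.0fx speedup%n",
                requiredSpeedup(spacecamFps, targetFps));
    }
}
```

By these figures, a real-time version has about 33 milliseconds to spend on each frame, which puts the Java implementation a factor of 240 away from the target and Spacecam a factor of 2 away.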
Worried about this, I looked around to see if I could find a more optimal implementation of some of our ideas, and I was able to find an approach submitted to the Iron Coder 4 competition a few weeks back. The winning entry, "Spacecam," implemented an approach that produces somewhat similar results to ours from Part 3:
Spacecam
Part 3
The interesting thing about this program is that it's implemented in Objective-C (the development language of choice on the Mac), and it makes use of the Core Image framework. (Core Image is a super cool technology in Mac OS X that allows custom image processing to occur on the computer's video card – CS nerds take note.) Long story short, the Spacecam implementation, though neither optimal** nor especially analogous to our own, is a much more useful point of comparison than my Java implementation. And the Spacecam implementation runs at about 15 frames per second on my computer. Not 30, but much closer to 30 than 0.125. That suggests our approach is, if not completely feasible, at least not wildly off the mark.
If I ever get around to learning Objective-C and implementing this approach in a more optimal fashion***, I'll be sure to revisit the topic. Otherwise, thanks for reading through the series. I hope it drew back the curtain a bit on some of the more magical things computers are doing these days.
* There are some truly amazing algorithms for dealing with shadows, but they're beyond the scope of this series.
** To be fair, the Iron Coder participants compete to create finished software in 48 hours or less, so optimization is definitely not a priority.
*** Aside to CS nerds: the good news is that all our algorithms are easily parallelizable! The standard deviation, run-length, and morphological operations all operate on each pixel individually. The bad news, of course, is that each operates on the output of the previous operation, so the algorithms themselves must be applied sequentially. Still, though, that's not bad – and since each frame is treated independently, multiple frames can be processed simultaneously. Man, I'd like to do this up in Core Image.
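For the curious, here's roughly what that structure could look like in Java. This is a minimal sketch under my own assumptions, not the code from this series: `PixelOp`, `applyStage`, and `runPipeline` are hypothetical names, and the two stages (a brightness threshold and a generic 3x3 erosion) are stand-ins for the real operations. The point is only the shape: within a stage, every output pixel is computed from the read-only input buffer, so rows can be farmed out to different cores, while the stages themselves run one after another.

```java
import java.util.stream.IntStream;

class ParallelStages {
    /** Computes one output pixel from a read-only source buffer (hypothetical interface). */
    interface PixelOp {
        int apply(int[] src, int width, int height, int x, int y);
    }

    /** Placeholder stage: 1 where the pixel is brighter than the cutoff, else 0. */
    static PixelOp threshold(int cutoff) {
        return (src, w, h, x, y) -> src[y * w + x] > cutoff ? 1 : 0;
    }

    /** Placeholder stage: a generic 3x3 erosion on a 0/1 mask. */
    static final PixelOp ERODE_3X3 = (src, w, h, x, y) -> {
        for (int dy = -1; dy <= 1; dy++) {
            for (int dx = -1; dx <= 1; dx++) {
                int nx = x + dx, ny = y + dy;
                // Pixels outside the image count as background.
                if (nx < 0 || ny < 0 || nx >= w || ny >= h || src[ny * w + nx] == 0) {
                    return 0;
                }
            }
        }
        return 1;
    };

    /** Runs one stage; rows are processed in parallel, each writing only its own pixels. */
    static int[] applyStage(int[] src, int width, int height, PixelOp op) {
        int[] dst = new int[src.length];
        IntStream.range(0, height).parallel().forEach(y -> {
            for (int x = 0; x < width; x++) {
                dst[y * width + x] = op.apply(src, width, height, x, y);
            }
        });
        return dst;
    }

    /** Runs the stages in order; each one reads the previous stage's output. */
    static int[] runPipeline(int[] frame, int width, int height, PixelOp... stages) {
        int[] current = frame;
        for (PixelOp stage : stages) {
            current = applyStage(current, width, height, stage);
        }
        return current;
    }
}
```

A call like `ParallelStages.runPipeline(frame, width, height, ParallelStages.threshold(128), ParallelStages.ERODE_3X3)` would run the two placeholder stages back to back. Each stage writes into a fresh buffer, so no thread ever reads a pixel that another thread is in the middle of writing.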
1 Comment:
Glad to help (and to have a cameo in your blog). Good job with the final implementation. I'm especially impressed with the image of the forks on the table; that one did really well.