Fig. 1, clockwise from upper left: (i) right camera image, (ii) synthetic virtual camera image, generated in real time at 30 fps, (iii) left camera image, (iv) features detected in left image, (v) superposition of left and right images, (vi) features detected in right image
This application, VirtualCamera, uses StereoVideoLib to demonstrate a stereo video view synthesis algorithm.
Download VirtualCamera: virtualcamera20060731.tgz
Note: You will need to have two cameras connected to your Mac. I use two external FireWire iSight cameras. If they aren't connected, the application will exit without an error message.
I develop an algorithm that combines the video streams from a pair of cameras into a single output video that appears to have been captured by a third, "virtual" camera. The objective is to create the cleanest, most accurate output video under the constraint that the algorithm run at 30 frames/second on a personal computer while leaving ample resources available for other tasks (e.g., encoding and transmitting the video).
My interest in this project stems from a desire to have a true "face-to-face" video conference (or video chat). Typically when video chatting (e.g., with Apple's iChat), you see a view of the other party's forehead, as depicted in the images on Apple's iChat page. This happens because the other party's camera is placed above their monitor: to see your image, they must direct their head and eyes downward, away from the camera and toward the monitor. As a result, the camera captures (and, thus, you see) an image of a downward-tilted head with downward-pointing eyes.
A face-to-face conversation would require the camera to be placed directly behind the monitor. This is typically physically impossible (but maybe not forever), so one seeks an approximate solution. One such solution is to use view synthesis to create a virtual camera (Seitz1995, Criminisi2003).
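To make the idea of a virtual camera concrete: if the two cameras are rectified (identical intrinsics, parallel optical axes, image rows aligned), a scene point seen at column x_l in the left image and x_r in the right image projects, in a virtual camera placed a fraction alpha of the way along the baseline, to column x_v = (1 - alpha) * x_l + alpha * x_r on the same row. The sketch below applies that interpolation to one scanline. It assumes the correspondences are already known and uses simple forward warping with intensity blending; the function name synthesize_scanline is hypothetical and is not part of StereoVideoLib or a description of how VC itself works.

```python
from typing import List, Optional, Sequence, Tuple


def synthesize_scanline(
    left: Sequence[float],
    right: Sequence[float],
    matches: Sequence[Tuple[int, int]],
    alpha: float = 0.5,
) -> List[Optional[float]]:
    """Forward-warp matched pixels of one rectified scanline into a virtual view.

    alpha = 0 reproduces the left camera's geometry, alpha = 1 the right
    camera's, and alpha = 0.5 a virtual camera midway along the baseline.
    Columns that receive no match stay None (holes, e.g. from occlusions).
    """
    width = len(left)
    out: List[Optional[float]] = [None] * width
    for xl, xr in matches:
        # Column of the scene point in the virtual camera: linear interpolation,
        # which is exact for rectified cameras translated along the baseline.
        xv = round((1.0 - alpha) * xl + alpha * xr)
        if 0 <= xv < width:
            # Blend the two observed intensities with the same weights;
            # a later match landing on the same column overwrites an earlier one.
            out[xv] = (1.0 - alpha) * left[xl] + alpha * right[xr]
    return out
```

For example, synthesize_scanline([0, 0, 5, 7], [5, 7, 0, 0], [(2, 0), (3, 1)]) places the two matched pixels (disparity 2) at columns 1 and 2 of the midway view and returns [None, 5.0, 7.0, None]. A practical implementation would also need to resolve collisions properly (e.g., by preferring the nearer surface) and fill the holes.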
Fig. 1, above, shows an image from such a virtual camera (top row, center). Notice how the subject appears to be facing the camera. The source images (top row, right and left), however, were captured by cameras placed to the left and to the right of the subject. Fig. 1 is a screenshot of the VirtualCamera application running on a MacBook Pro. We will refer to the algorithm that generates these virtual views as VC. It is the algorithm under development here.
The application also implements a second algorithm, which we will call DP, that uses dynamic programming. Algorithms like DP have been studied and written about extensively (Ohta1985, Criminisi2003). This implementation is simplistic compared to state-of-the-art dynamic programming algorithms; it is included here only for comparison with VC. [For the curious: the dynamic programming algorithm used in view synthesis is an elaboration of dynamic time warping.]
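The connection to dynamic time warping can be sketched for a single pair of rectified scanlines: each left pixel is either matched to a right pixel or declared occluded, matches must preserve left-to-right order, and the cheapest such alignment is found with a cost table exactly as in DTW. The sketch below assumes absolute intensity difference as the matching cost and a fixed occlusion penalty; both choices, the function name dp_scanline_match, and the parameter occlusion_cost are illustrative assumptions, not the cost function used by DP in VirtualCamera.

```python
from typing import List, Sequence, Tuple


def dp_scanline_match(
    left: Sequence[float],
    right: Sequence[float],
    occlusion_cost: float = 10.0,
) -> List[Tuple[int, int]]:
    """Align two rectified scanlines with DTW-style dynamic programming.

    Returns (x_left, x_right) pairs in increasing order; pixels not in any
    pair are treated as occluded. The ordering constraint is built in,
    because the path only moves forward through the cost table.
    """
    n, m = len(left), len(right)
    inf = float("inf")
    # cost[i][j]: cheapest alignment of left[:i] with right[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    # move[i][j]: 0 = match, 1 = left pixel occluded, 2 = right pixel occluded.
    move = [[0] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best, arg = inf, 0
            if i > 0 and j > 0:
                c = cost[i - 1][j - 1] + abs(left[i - 1] - right[j - 1])
                if c < best:
                    best, arg = c, 0
            if i > 0:
                c = cost[i - 1][j] + occlusion_cost
                if c < best:
                    best, arg = c, 1
            if j > 0:
                c = cost[i][j - 1] + occlusion_cost
                if c < best:
                    best, arg = c, 2
            cost[i][j], move[i][j] = best, arg
    # Backtrack from the bottom-right corner to recover the matched pixels.
    matches: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if move[i][j] == 0:
            matches.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move[i][j] == 1:
            i -= 1
        else:
            j -= 1
    matches.reverse()
    return matches
```

For instance, dp_scanline_match([10, 20, 50, 70], [50, 70, 30, 40], occlusion_cost=10.0) matches left pixels 2 and 3 to right pixels 0 and 1 (a disparity of 2) and marks the remaining pixels as occluded. This full table costs O(n·m) per scanline; a real-time implementation would typically restrict j to a band of plausible disparities around i, reducing the work to O(n·D) for a disparity range D.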
While the application is running, use <CTRL>-<OPTION>-<C> to switch between VC and DP. Note the following:
References: