Posted by Philip Bulley | Filed under Portfolio
Kino for Hugo Just Different is all about seeing things differently. An interactive film directed by Marco Brambilla allows people to explore five different scenes through the shades of Theatrical, Performance and Story, simply by tilting their heads from side to side. The webcam monitors the tilt position of the head, which in turn invokes seamless transitions from one intricately gorgeous environment to another.
My role on this project was to lead the Flash Development from the prototyping stage through to delivery. Various prototypes were created whilst exploring how best to implement the webcam head tracking:
- OpenCV Marilena – This wasn’t used as it wasn’t as performant as I’d hoped, but mostly because it is only good at face detection when the face is in the usual upright orientation. Tilt the head and OpenCV will lose you. Maybe this could have been solved by writing a lower-level Haar Cascade descriptor file (the frontal face descriptor file alone is 26,161 lines of XML!), but before even contemplating digging that deep, it was definitely worth exploring a couple of other techniques.
- Motion detection – Based on frame differencing and blob detection, this would let Flash Player know whether the user has moved their head and where to. Not a bad method, but we found that it really required people to move their head from one side of the camera’s image to the other, which is far too much work for the user. We really wanted something simpler, i.e. the relatively low-effort action of tilting the head.
- Frame Matching – this is the process of caching three calibration images, and comparing the current webcam input with them. Using the threshold frame differencing technique, it was possible to work out quite accurately which of the three calibration images the webcam input most closely matched. This worked really well and was what made it into the final cut.
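To make the frame matching idea concrete, here is a minimal sketch in Python rather than the ActionScript used on the project. It assumes frames are small grayscale grids and that "closest match" means the calibration image with the fewest pixels differing by more than a threshold; the function names, the threshold of 30 and the sample data are all hypothetical, not taken from the production code.

```python
from typing import Sequence

# A frame is a grid of grayscale pixel values (0-255). Illustrative only.
Frame = Sequence[Sequence[int]]


def diff_score(a: Frame, b: Frame, threshold: int = 30) -> int:
    """Count the pixels whose absolute difference exceeds the threshold."""
    return sum(
        1
        for row_a, row_b in zip(a, b)
        for pixel_a, pixel_b in zip(row_a, row_b)
        if abs(pixel_a - pixel_b) > threshold
    )


def closest_calibration(
    current: Frame, calibrations: Sequence[Frame], threshold: int = 30
) -> int:
    """Return the index of the calibration frame that differs least."""
    scores = [diff_score(current, frame, threshold) for frame in calibrations]
    return scores.index(min(scores))


# Hypothetical calibration shots: bright column marks where the head is.
TILT_LEFT = [[200, 10, 10], [200, 10, 10]]
UPRIGHT = [[10, 200, 10], [10, 200, 10]]
TILT_RIGHT = [[10, 10, 200], [10, 10, 200]]

# A slightly noisy webcam frame with the head roughly upright.
webcam_frame = [[15, 190, 12], [8, 205, 20]]

match = closest_calibration(webcam_frame, [TILT_LEFT, UPRIGHT, TILT_RIGHT])
print(match)
```

The threshold step is what keeps webcam noise and small lighting changes from counting as differences, so only genuinely changed regions contribute to the score.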
The other challenge came in the form of video synchronisation, something Flash Player is not good at. Simply creating three instances of NetStream and starting playback at the same time will eventually result in the videos drifting out of sync. The solution was to use a single MP4 file in which the three videos are stacked vertically within each frame, together with DisplayObject.scrollRect to define the visible area. Three videos, each 960×540, result in one video of 960×1620. It’s worth noting that the MainConcept H.264 encoder has greater restrictions when it comes to encoding videos of non-standard dimensions, in which case the Sorenson H.264 encoder provides more flexibility.
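The arithmetic behind the stacked video is simple enough to sketch. The snippet below is Python rather than ActionScript, and the `Rect` type and `visible_rect` function are hypothetical names; in Flash the resulting values would go into a Rectangle assigned to the video's scrollRect. It assumes the videos are stacked top to bottom in index order.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Stand-in for a Flash Rectangle: x, y, width, height in pixels."""
    x: int
    y: int
    width: int
    height: int


def visible_rect(
    index: int, frame_width: int = 960, frame_height: int = 540, count: int = 3
) -> Rect:
    """Return the window onto video `index` inside the stacked frame."""
    if not 0 <= index < count:
        raise ValueError(f"index must be between 0 and {count - 1}")
    # Each video sits directly below the previous one, so only y changes.
    return Rect(0, index * frame_height, frame_width, frame_height)


# Total height of the stacked video: 3 x 540 = 1620.
stacked_height = 3 * 540

for i in range(3):
    print(i, visible_rect(i))
```

Because all three scenes live in the same decoded frame, switching scene is just a change of rectangle, and there is only one playhead to keep in time.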