If you’re a serial video conference latecomer who tries to slink into calls on the sly and quietly find a seat in the back undetected, the new breed of smart webcams might be your worst nightmare. If, however, you’re always on time and don’t want your meeting delayed while you configure the webcam to properly frame everyone around the table, the same cameras are a dream come true.
The latest form of webcam auto tracking is auto framing, which, as its name suggests, automatically adjusts the zoom, pan, and focus of a camera to keep everyone in view, even if they move around during a meeting.
It’s a novel addition to video conferencing that is something of a double-edged sword. Yes, we’d love to go hands-free with our camera even when someone walks over to the whiteboard to write out some numbers, but no, we don’t want the picture to start jumping around every time someone enters the room or reaches offscreen for a mug of coffee.
What we want is a form of auto framing smart enough to do the former without overreacting to every unimportant movement, and we kind of already have that in active-speaker voice tracking.
Dolby and Logitech Preview Auto Framing
There are two versions of auto framing so new they're not yet publicly available. The first is part of Dolby's debut venture into video conferencing camera development and forms the heart of its Dolby Voice Room system, which should go on the market any day now. This new 4K, high-dynamic-range smart cam can do exactly what we described above, tracking and framing the movements of every member of a video call.
You can see an idealized version of it in the corporate video below:
Unfortunately, the complete kit that this cam comes in will cost you around $4,500, but it does look quite impressive.
If you don't want to spend that much on auto framing, you could buy a Logitech MeetUp camera for $899 (we recently road-tested the MeetUp with a group of experienced video callers and were impressed with its performance) and wait for its built-in version of auto framing to arrive as a software upgrade within the next few months. Microsoft was so impressed with the feature that it showed it off ahead of release at the recent Enterprise Connect conference. You can see it in action below:
There's no denying that both versions of auto framing look very clever. It's easy to make a case for why you'd want this in your life. So, we will.
Webcam Auto Tracking Is a Must-Have Video Conferencing Item
For the sake of this post, we're going to assume that both versions of auto tracking and framing work without a hitch: that they're responsive and provide a sharp picture as they move from one viewpoint to the next.
This is set-and-forget instant video conferencing come to life. Set it up in your conference room or, as is becoming more common, your small huddle room, and you won't have to configure it again, regardless of how many people are in attendance. If there are only two of you, you can sit anywhere and auto framing should zoom in so you can be easily seen. If there's a whole gaggle around the table, the camera should automatically zoom out to get you all in the picture.
If, as we mentioned earlier, you want to stand up and walk over to the whiteboard to add a visual aid to your presentation, the camera should adjust again to make sure your remote colleagues can see what you're doing. The same applies if someone attends only part of a meeting. Once they get up and go, the camera should zoom in a little to frame the remaining video callers.
Without an auto framing feature, these adjustments would require, at the very least, a human to step in with a remote control and put everyone through a vertigo-inducing spin while they reframe manually. Plus, you’d have to make sure everyone arrived just a bit early in order to set up your shot before the call.
On the other hand, a human with a remote would know not to zoom in unnecessarily on the latecomer who’s trying to blend in at the back. We doubt either the Logitech or Dolby cam is that clever.
Could Voice-Tracking Work Better?
How is a computer supposed to distinguish between important and irrelevant motion? Instead of relying on visual cues, you could get the camera to listen to what's going on and focus on the current speaker. This kind of active-speaker tech is already employed in video calling platforms, from Google Hangouts on your phone to Cisco's elaborate Spark Room System package. The video below shows Cisco's version in operation in a real-world setting:
Once a meeting is in progress, this vocal tracking performs essentially the same function as the visual version, although it does crop out the passive in-room audience around the speaker, which may or may not be an advantage during a call. Of course, because it relies on audio cues, voice tracking is no help in setting up the shot before the call begins. Cisco's audio version of the framing technology also appears to suffer from a significant, perhaps even deal-breaking, lag when it comes to finding and zooming in on the speaker.
So maybe we should just give Dolby and Logitech the benefit of the doubt and see what they have created for us once each version hits the market. If hands-free video conferencing has to come at the cost of shining a harsh spotlight of shame on latecomers, then we might just have to learn to live with that…and learn to be on time.