AI vs. AR in “Mirrors” and Photo Booths
Over the past four years, we’ve created a ton of smiles with our “Squad Snaps” style photo booths (initially developed for the Baltimore Ravens).
Lately, we’ve been exploring how the latest and greatest technologies, specifically Generative AI (AI) and Augmented Reality (AR), can enhance the experience for fans and visitors.
Adding AI (image processing) and AR (face and body effects) to our kiosks has been a learning opportunity - something we pride ourselves on at Balti Virtual and embrace as a core value of being “expert newbies.”
Our main takeaway is that while both technologies can provide similar experiences, each has unique strengths and weaknesses.
The “OG”: AR
AR-powered mirrors are an extension of what Snapchat has been doing for almost a decade - essentially mixing video game graphics with a live camera view, and using computer vision to understand face and body landmarks (i.e. making sure those cat ears line up with your real ears).
These experiences can be a bit more simplistic in terms of visuals (essentially “video game graphics”) but much more responsive and interactive than their AI counterparts.
Since the graphics are generated in real-time, AR Mirrors provide a live “what-you-see-is-what-you-get” experience. Users see the results immediately as if they are standing in front of a magic mirror.
This makes recording video content really easy, as users get constant feedback on what the system is doing. These types of installations can even run interactive games. The downside is that they require a lot of effort to create—generally weeks of modeling, rigging, animating, etc. This means longer production times and, ultimately, higher costs.
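The landmark-tracking idea above can be sketched in a few lines. This is an illustrative toy, not our production code: the landmark names and pixel coordinates are hypothetical, and a real tracker (MediaPipe, Snap's engine, etc.) supplies dozens of points per frame.

```python
# Core of an AR mirror loop: map detected face landmarks to overlay
# positions on every camera frame, so graphics follow the user live.

def place_cat_ears(landmarks, ear_height=40):
    """Given eye landmarks in pixel coords, return where to draw each ear."""
    lx, ly = landmarks["left_eye"]
    rx, ry = landmarks["right_eye"]
    # Anchor each ear above its eye; re-running this per frame is what
    # makes the graphic track head movement in real time.
    return {
        "left_ear": (lx, ly - ear_height),
        "right_ear": (rx, ry - ear_height),
    }

# One simulated frame of detection output (hypothetical values):
frame_landmarks = {"left_eye": (220, 180), "right_eye": (300, 182)}
overlay = place_cat_ears(frame_landmarks)
```

Because this runs every frame at camera speed, the user gets instant feedback, which is exactly the "magic mirror" quality described above.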
The “New Kid”: AI
Since AI is being added to everything from sunglasses to air fryers these days, I should be a little more clear about what we’ve done here. Specifically, we are experimenting with an open-source image generation toolkit called “Stable Diffusion” and using it to process photo booth images in interesting ways.
This type of AI can create much more realistic images with far less creative input than the AR experiences I discussed above. Where a standard AR experience can take weeks of development time (modeling, rigging, animating, etc.), once a framework is created, a comparable AI-powered experience requires only a handful of inputs, which can be as simple as a few dozen typed words.
Each of these photos was generated with the exact same source image and code; the key difference is the text prompt given to the AI.
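To make the "same image, same code, different prompt" point concrete, here is a hedged sketch of how an img2img request might be assembled. The `build_img2img_request` helper, parameter names, and file path are hypothetical stand-ins; a real Stable Diffusion pipeline takes similar inputs.

```python
# Illustrative only: with img2img-style generation, the creative input
# is largely the text prompt. Everything else can stay fixed.

def build_img2img_request(source_image, prompt, seed=42, strength=0.6):
    # Fixing the seed and source image means the prompt alone drives
    # the stylistic differences between outputs.
    return {
        "image": source_image,
        "prompt": prompt,
        "seed": seed,
        "strength": strength,  # how far the output may drift from the source
    }

booth_photo = "fan_photo.png"  # placeholder path
requests = [
    build_img2img_request(booth_photo, "oil painting, stadium at dusk"),
    build_img2img_request(booth_photo, "retro comic book, halftone dots"),
]
```

Swapping a few dozen typed words in the prompt yields a completely different look, which is why the AI route needs so much less production time than modeling and rigging 3D assets.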
On the other hand, the AI approach has two serious technical drawbacks: hallucinations and latency.
Hallucinations are real (an oxymoron I’ve been repeating a lot lately), and it’s not uncommon for AI to generate weird/bad content. For now, we recommend having a human review the output and/or letting users know what to expect.
The other issue with AI-powered mirrors is latency or “lag.” Depending on the technical approach, processing each image can take anywhere from a few seconds to a few minutes. This isn’t the same “magic mirror,” real-time interactive experience that AR provides.
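The latency difference can be sketched as a job queue: each photo becomes a task that takes seconds to render, so results arrive asynchronously instead of live. The timings below are simulated stand-ins, not benchmarks of any real pipeline.

```python
# Illustrative sketch of why AI booths feel different from AR mirrors:
# photos queue up for processing, and users wait for results.
import time
from concurrent.futures import ThreadPoolExecutor

def process_photo(photo_id, seconds=0.1):
    time.sleep(seconds)          # stand-in for diffusion inference time
    return f"{photo_id}-styled"

with ThreadPoolExecutor(max_workers=2) as pool:
    # The booth keeps taking photos while earlier ones are still rendering.
    futures = [pool.submit(process_photo, f"photo{i}") for i in range(3)]
    results = [f.result() for f in futures]  # users wait on these
```

Even with parallel workers, there is an unavoidable gap between shutter click and finished image, which rules out the live "magic mirror" feel.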
The silver lining to these two challenges on the AI side is that this tech is evolving insanely quickly. Where new developments in the AR/VR world generally drop on a monthly-to-yearly cadence, the landscape of AI is changing almost weekly.
There are already some real-time uses of AI image processing, but the results are currently artistically limited.
On the moderation side, things are slowly improving, and it’s not hard to imagine training a new model on moderation results that could get us from roughly 80% accuracy to 95% or so. That said, there will always be some chance of a bad image, so our team suggests either leaning into a “Hey, it’s AI, so who knows what you’ll get” framing or carefully moderating the output (which adds even more lag between a photo being taken and received).
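Some back-of-the-envelope math shows why that accuracy jump matters. The 2% "weird output" base rate below is a hypothetical figure chosen for illustration, not a measured number.

```python
# At a given filter accuracy, how many bad images slip through per batch?

def expected_slips(n_images, bad_rate, filter_accuracy):
    # A bad image reaches a user when it's generated AND the filter misses it.
    return n_images * bad_rate * (1 - filter_accuracy)

per_1000_at_80 = expected_slips(1000, 0.02, 0.80)  # roughly 4 bad images
per_1000_at_95 = expected_slips(1000, 0.02, 0.95)  # roughly 1 bad image
```

Going from 80% to 95% cuts the slip-through rate by 4x, but it never reaches zero, which is why we still recommend either setting expectations or adding human review.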
So the Winner Is?
At the risk of a cop-out, I was surprised to find that both technologies have their place, and the big players in the industry seem to agree. AR is great when accuracy or interactivity is important, and AI is a way to offer a similar experience at a lower price point. The lines are blurring more and more as time goes on, since a lot of AR technically uses AI (or, more precisely, machine learning).
Just last week, Snap Inc. released the latest iteration of its world-class content creation tool, Lens Studio, which features more AI tools than ever before, including some wild generative AI tools aimed at reducing the workload on creators.
As these technologies continue to evolve and intertwine, we’ll continue to do our part as expert newbies, staying on top of the latest developments and helping our clients chart a path through the ever-evolving tech landscape.
Until next time!