Will Synthetic Stock Content Save DAM?
Sometime in the near future, the marketing team at a multinational beverage company in Europe receives a brief for a regional advertising campaign that calls for images of a “refined British gentleman in a bowler hat standing on the lip of a volcano sipping a cocktail.”
Rather than organizing a photo shoot – estimated to cost between $15,000 and $20,000 and involving casting calls, models, and a photographer and assistants traveling to Iceland – the project lead goes to the company’s DAM and types in a few keywords to generate a synthetic image: lensless media indistinguishable from any human-produced photograph the team has ever seen.
Sound far-fetched? Think again.
Licensed synthetic stock media, not to be confused with deepfakes, is already here and will soon be available for licensing via numerous platforms like DAMs, online content editors and web-based marketplaces.
This isn’t your grandfather’s microstock or rights-managed photography. Licensed synthetic media consists of wholly unique, photo-realistic images generated by algorithms. No camera, lights, studios, remote landscapes or models involved. Just code and lots of special sauce.
As my colleague Michael Osterrieder, the founder and CEO of vAIsual, recently wrote, “Machine learning algorithms have revolutionized the way we handle, process, and understand information on a very profound level. For the first time in history the capacity of current GPUs, is sufficient to digest, order and reorder an enormous amount of information and data which by far exceeds the capabilities of the human brain.”
So what’s the software and tech behind all of this?
Two technologies: generative adversarial networks, or GANs, and Generative Pre-trained Transformer 3, or simply GPT-3.
“GANs are two neural networks pitted against each other,” wrote Rhea Moutafis, a PhD student at the Sorbonne, in an article on GANs in Towards Data Science in January 2020. “The first neural network is called the Generator. It generates fake data points and passes them to its opponent. That’s the second network, the Discriminator. Its job is to tell which data point is real and which is fake.”
When given a real-life dataset, the GAN teaches itself to generate new images based on that dataset.
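For readers who want to see the Generator-versus-Discriminator contest in miniature, here is a rough sketch, not any vendor's actual method. It assumes a one-dimensional toy problem in which random numbers centered on 4 stand in for real photographs; the names ToyGAN and train, the learning rate and the batch size are all hypothetical choices made for illustration. Production systems such as StyleGAN use deep neural networks and far more elaborate training.

```python
import math
import random


def sigmoid(s: float) -> float:
    return 1.0 / (1.0 + math.exp(-s))


def safe_log(p: float) -> float:
    # Clamp to avoid log(0) if the discriminator ever saturates.
    return math.log(max(p, 1e-12))


class ToyGAN:
    """Toy 1-D GAN: generator G(z) = a*z + b, discriminator D(x) = sigmoid(w*x + c)."""

    def __init__(self) -> None:
        self.a, self.b = 1.0, 0.0  # generator parameters
        self.w, self.c = 0.0, 0.0  # discriminator parameters

    def generate(self, z: float) -> float:
        return self.a * z + self.b

    def discriminate(self, x: float) -> float:
        # Estimated probability that x is a real data point.
        return sigmoid(self.w * x + self.c)

    def d_loss(self, real, fake) -> float:
        # The discriminator wants D(real) near 1 and D(fake) near 0.
        real_term = -sum(safe_log(self.discriminate(x)) for x in real) / len(real)
        fake_term = -sum(safe_log(1.0 - self.discriminate(x)) for x in fake) / len(fake)
        return real_term + fake_term

    def g_loss(self, fake) -> float:
        # The generator wants the discriminator to rate its fakes as real.
        return -sum(safe_log(self.discriminate(x)) for x in fake) / len(fake)

    def step(self, real, noise, lr: float = 0.05) -> None:
        # 1) Discriminator update: one gradient step on d_loss.
        fake = [self.generate(z) for z in noise]
        grad_w = (sum((self.discriminate(x) - 1.0) * x for x in real) / len(real)
                  + sum(self.discriminate(x) * x for x in fake) / len(fake))
        grad_c = (sum(self.discriminate(x) - 1.0 for x in real) / len(real)
                  + sum(self.discriminate(x) for x in fake) / len(fake))
        self.w -= lr * grad_w
        self.c -= lr * grad_c

        # 2) Generator update: one gradient step on g_loss, using the
        #    freshly updated discriminator as its opponent.
        n = len(noise)
        grad_a = sum((self.discriminate(self.generate(z)) - 1.0) * self.w * z
                     for z in noise) / n
        grad_b = sum((self.discriminate(self.generate(z)) - 1.0) * self.w
                     for z in noise) / n
        self.a -= lr * grad_a
        self.b -= lr * grad_b


def train(steps: int, batch: int = 64, seed: int = 0) -> ToyGAN:
    rng = random.Random(seed)
    gan = ToyGAN()
    for _ in range(steps):
        # Assumption: "real" data is Gaussian noise around 4, standing in for photos.
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        gan.step(real, noise)
    return gan
```

Calling train(steps=5) nudges the generator's output toward the stand-in "real" data. Longer runs of a toy this simple can oscillate rather than settle, which is one reason real GAN training takes so much engineering.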
GPT-3 is a completely different beast. Wikipedia notes that GPT-3 is “an autoregressive language model that uses deep learning to produce human-like text” and, it turns out, can also generate synthetic content from natural language text input.
In the world of licensed synthetic media, GANs and GPT-3 are the algorithmic version of a sausage maker.
Raw, real-life photographic datasets are fed in one end, and out the other come synthetic images indistinguishable from real photographs but thousands of visual iterations away from their source data.
Birth of a Frankenstein or the biggest disruption to photography since the arrival of digital cameras?
Companies such as NVIDIA, Google and OpenAI (co-founded by Elon Musk) have been deep into development and testing of their own efforts, with numerous computer vision entrepreneurs and startups recently matching the output of these Silicon Valley giants.
While proprietary GANs such as NVIDIA’s StyleGAN allow users to modify the image’s latent space – change hair color, eye color, ethnicity, background color and style, hair length, age, and gender – it’s OpenAI’s Dall-E that will most probably capture the imagination of content creatives and DAM vendors.
Dall-E, unlike StyleGAN or Google’s BigGAN, allows users to type text prompts – just like the subject of the above fictitious marketing campaign – and actually generate results.
A happy, multi-cultural family sitting on a red sofa watching TV? Sure. A stylish Labradoodle surfing waves off of Hawaii? No problem.
“Instead of being the tool to search a repository of content, metadata is the tool to generate it and effectively the DNA code of the asset itself,” wrote Ralph Windsor in a recently published article on DAM News.
The ability to generate one-of-a-kind licensed stock content from your DAM’s toolbar will certainly be a game changer, but what has to happen first?
Let’s start with the mechanics.
Licensed stock media sourced from marketplaces like Shutterstock, Adobe and Getty, as well as those custom created for single clients, all start with a few key ingredients: a creative team, a rock star photographer, models that accurately fit the role and a well-honed concept.
The same goes for creating high-quality, legally clean licensed synthetic stock media.
Constructing real-life datasets to generate synthetic content requires a combination of legal and studio production expertise, as well as a director of content’s ability to control the hundreds of thousands or millions of assets needed to produce results that turn science fiction into licensable content reality. The larger the real-life dataset, the greater the diversity of the resulting synthetic media.
Legally clean datasets (real-life images of real-life models who have given legal consent to having their likeness used to train AI to generate licensed stock media) are the core of this technology.
“One of the most critical legal aspects in machine learning and dataset sourcing is that any ML algorithm depends 100% on the source dataset,” said vAIsual’s Osterrieder. “There is no genuine or original creative aspect to it. A GAN like StyleGAN or BigGAN cannot add to this source data by itself. All outcome has its root in the original data plus the interpretations of the AI and the ‘noise’ that has been added.”
Osterrieder continued, “We create our own datasets … and own the full copyright to any creation.”
Although the benefits of legally clean datasets are clear, that hasn’t stopped a number of well-publicized efforts from scraping the internet to source their training data, thus creating ongoing legal issues for anyone licensing or managing the results. Others have sourced their data from stock agencies, another major potential legal minefield, as those images rarely come with biometric model releases and/or the consent of the copyright holder.
Biometric release forms give synthetic content producers the right to use the model’s likeness to create synthetic media without restriction.
Legally sound biometric releases include text such as:
The photographer’s rights include … “Use the Content for data mining purposes, as part of datasets to train neural networks and to use the content in all types of computational and manual processes which generates derivative and/or derivative synthetic content generated by any type of algorithm or license the content for above mentioned purposes to third parties.”
Like its traditional photography counterpart, licensed synthetic content requires good, solid metadata to be discoverable.
Synthetic content generated by a GAN seems to get by quite well when each asset is embedded with traditional IPTC metadata, albeit with a highly engineered fixed vocabulary that accounts for GAN-generated synthetic content’s on-the-fly ability to modify age, gender, ethnicity, eye color, hair color, hair length, etc.
Dall-E generated images, on the other hand, require metadata more flexible than traditional keywords, and closer in look and feel to the alt-text currently used to enhance the viewing and discovery experience for people with disabilities.
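To make the contrast between the two metadata styles concrete, here is a small, purely illustrative sketch. Everything in it is an assumption: the vocabulary terms, the class names GanAssetMetadata and PromptAssetMetadata, and the naive word-matching search are invented for this example and are not drawn from the IPTC standard or from any DAM product.

```python
from dataclasses import dataclass

# Hypothetical fixed vocabulary: each adjustable attribute of a
# GAN-generated image may only take one of a small set of agreed terms.
CONTROLLED_VOCABULARY = {
    "age": {"child", "young adult", "middle-aged", "senior"},
    "hair_color": {"black", "brown", "blond", "red", "gray"},
    "hair_length": {"short", "medium", "long"},
    "eye_color": {"brown", "blue", "green", "hazel"},
}


@dataclass
class GanAssetMetadata:
    """Keyword-style record: attributes are validated against the vocabulary."""

    asset_id: str
    attributes: dict

    def __post_init__(self) -> None:
        for name, value in self.attributes.items():
            allowed = CONTROLLED_VOCABULARY.get(name)
            if allowed is None:
                raise ValueError(f"unknown attribute: {name}")
            if value not in allowed:
                raise ValueError(f"{value!r} is not an allowed value for {name}")

    def keywords(self) -> list:
        # Flatten the attributes into sortable keyword strings.
        return sorted(f"{name}:{value}" for name, value in self.attributes.items())


@dataclass
class PromptAssetMetadata:
    """Alt-text-style record: a free-form sentence describes the whole scene."""

    asset_id: str
    description: str

    def matches(self, query: str) -> bool:
        # Naive search: every query word must appear in the description.
        words = set(self.description.lower().split())
        return all(term in words for term in query.lower().split())


portrait = GanAssetMetadata("gan-001", {"age": "senior", "hair_color": "gray"})
scene = PromptAssetMetadata(
    "txt-001", "A stylish Labradoodle surfing waves off of Hawaii"
)
```

The keyword record rejects anything outside its agreed terms, which keeps search predictable when attributes can be changed on the fly. The descriptive record accepts any sentence, which suits images that began life as a sentence in the first place.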
So what exactly will prompt content creatives and DAM vendors to embrace synthetic content?
- First and foremost, licensed synthetic stock content will remove the friction of rights and legal issues currently associated with traditional stock imagery.
Anyone who has been on the receiving end of a legal letter from any stock agency, photographer or copyright attorney will immediately be attracted to this model.
- Licensed synthetic stock will eliminate content gaps.
Try finding stock photos of a couple enjoying dinner on the bottom of the sea, or even a photo of a child who looks just like the one the ad brief described. With synthetic stock media, either will be a few keywords away.
- Licensed synthetic media will reduce reliance on stock agencies and production companies.
See above for the foremost reason why creatives will gravitate to a technology that allows them to describe their visual media fantasies in a few words and actually see them generated on their screens in near real-time.
- DAM platforms will finally become true content hubs: places to store, manage, share and generate unique creative media.
Today’s DAMs have certainly come a long way since the dawn of the digital age in the late 1990s.
They are no longer solely on-prem, off-line and bereft of any kind of AI assistance. Many have recently partnered with stock media marketplaces to offer their users access to licensed stock and editorial media directly from the toolbar. But the latter offering is only a way station on the road to where synthetic media and its associated technology are taking this industry, and today’s DAMs still largely require users to leave the platform to source content.
Integration with synthetic media producers will be both an accelerator and a disruptive force that will see many familiar sources of content disappear and new ones take their place.
First movers in the DAM space are predicted to partner with licensed synthetic media producers within the next 18 months to create white-label offerings, and as a result instantly become content creation points themselves.
End users will find themselves overseeing volumes of wholly unique synthetic content with vast brand value and a greater need for the kind of high-tech management that only DAMs can provide.
The DAM as the creation point, host and management hub for licensed synthetic content.
Want to learn more about how licensed synthetic content will revolutionize DAMs? We are actively seeking forward-thinking partnerships within this industry, and are available as a trusted resource for information on tech, ethics and legal questions. Let’s talk. Feel free to reach out to me directly or on LinkedIn.