OpenAI GPT-4 image input support: 6 ways it’s a game-changer for DAM

This feature article was provided by Dan Huby, Chief Technology Officer at Montala Limited, the company behind open source DAM software ResourceSpace.

As DAM systems keep evolving, AI is playing a growing role in shaping how we manage our digital assets. One exciting development on the horizon is image input support in GPT-4, which makes it a so-called “multimodal” model. In this article, we’ll explore the possible benefits of incorporating GPT-4 image support into digital asset management systems.

Background

In recent months, GPT-3 and ChatGPT have been making waves across the globe. It’s difficult to ignore the buzz (some of which may be exaggerated) about this being a new era of AI, and potentially the first step towards Artificial General Intelligence (AGI).

AGI refers to a versatile AI that can adapt to any situation, much as humans do. While we may still be far from achieving AGI, GPT’s ability to address a variety of problems within a narrow scope is noteworthy. As a language model, it has so far dealt primarily with processing and generating text. Even this has significant implications for digital asset management (DAM): metadata translation, generating variations, and automatic categorisation based on textual content, for example. Several vendors, including us, have already integrated this technology into their software.

What makes GPT-4 particularly fascinating is its “multimodal” capability, which allows it to accept images as input. Given how much visual content is uploaded to DAM systems, the potential applications are immense. The model can take photographs or video frames as input, carry out virtually any task described in a text prompt, and return the result as text.

Before we get started, please note that, at the time of writing, multimodal support is only available to a very small number of researchers. To generate the examples in this article, GPT-4 was instead given a detailed text description of the photo and asked to imagine its visual content. If that seems likely to produce unrealistic results, bear in mind that GPT-4’s multimodal capabilities have already demonstrated abilities that go beyond these examples, such as understanding and explaining jokes. Impressive stuff!

Let’s explore some of the ways this could be hugely beneficial for DAM:

1. Automated Asset Categorisation

One of the major challenges for DAM systems is categorising digital assets efficiently, which can be time-consuming for administrators. A lot of headway has been made in automating this using earlier technologies such as Google Vision and OpenCV (among others). GPT-4’s image support could take this to the next level, because the model is, to a great extent, “steerable” using prompts. For example, it would be possible to instruct GPT-4 to look for specific products within an image and tag them.

Much more accurate categorisation would allow DAM users to quickly sort and retrieve relevant assets, reducing the time spent on manual tagging and increasing overall productivity.

(image by Claudio_Scott from Pixabay)

As an example, the keywords GPT-4 generated for a description of the photo above were: woman, swimsuit, inflatable flamingo, swimming pool, sunglasses, relaxation, sunshine, summer
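Keywords like these still need light clean-up before a DAM system can store them as tags. The sketch below is a minimal, hypothetical example: it assumes the model has been asked to reply with a single comma-separated line, and the `parse_keywords` function and optional controlled `vocabulary` are illustrative names, not part of GPT-4 or any DAM product.

```python
from typing import List, Optional, Set


def parse_keywords(model_output: str, vocabulary: Optional[Set[str]] = None) -> List[str]:
    """Split a comma-separated keyword line into clean, de-duplicated tags.

    Hypothetical helper. If a (lower-case) controlled vocabulary is given,
    only terms found in it are kept.
    """
    tags: List[str] = []
    seen: Set[str] = set()
    for raw in model_output.split(","):
        # Trim whitespace and stray full stops, collapse inner spaces, lower-case.
        tag = " ".join(raw.strip().strip(".").split()).lower()
        if not tag or tag in seen:
            continue
        if vocabulary is not None and tag not in vocabulary:
            continue
        seen.add(tag)
        tags.append(tag)
    return tags


output = ("woman, swimsuit, inflatable flamingo, swimming pool, "
          "sunglasses, relaxation, sunshine, summer")
print(parse_keywords(output))
print(parse_keywords(output, vocabulary={"swimming pool", "summer"}))  # ['swimming pool', 'summer']
```

Passing a controlled vocabulary is one way to keep generated keywords consistent with an existing taxonomy; whether unknown terms are dropped or queued for an administrator is a design decision for each system.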

2. Improved Image Descriptions

Textual metadata plays a crucial role in the usability of DAM systems. With GPT-4’s image recognition capabilities, DAM platforms could automatically generate rich descriptions for uploaded images. This not only improves search functionality but also enables better compliance with accessibility standards, as visually impaired users can access image descriptions through screen readers.

The GPT-4 output for a representation of the above photo was: “A joyful woman wearing a stylish swimming costume and sunglasses relaxes on a vibrant, oversized inflatable flamingo in a sunlit swimming pool. She exudes happiness as she basks in the warm sunshine, enjoying a leisurely summer day.”

3. Automated Content Creation

Integrating GPT-4 image support into DAM systems could revolutionise the way we create and repurpose content. Because it can understand and describe images, GPT-4, given appropriate prompts, could suggest relevant copy or captions to accompany visual assets, significantly speeding up the content creation process.

As an example, here’s the GPT-4 output for the prompt “Generate marketing copy for the flamingo” for the above photo:

“Make a splash this summer with our vibrant, oversized inflatable flamingo! Experience the ultimate relaxation as you float effortlessly on crystal-clear pool waters, just like the happy sunseeker in this photo. Perfect for pool parties, beach getaways, or simply soaking up the sun in style, our giant flamingo ensures that you’ll be the envy of everyone. Grab your shades, slip into your favourite swimsuit, and get ready to create unforgettable memories on this fabulous floating flamingo!”

4. Customised Asset Recommendations

GPT-4’s ability to analyse and understand images could lead to the development of personalised asset recommendations. By understanding the content and context of images, DAM systems could suggest similar or complementary assets to users, making it easier for them to find the right assets for their projects.
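One simple way to turn generated tags into recommendations is to rank other assets by how many tags they share with the current one. The sketch below uses Jaccard similarity (shared tags divided by all distinct tags) as a stand-in technique; it is an assumption for illustration, not how GPT-4 or any particular DAM works, and the asset IDs and the `recommend` function are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: shared tags divided by all distinct tags."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target_id: str, assets: Dict[str, Set[str]], top_n: int = 3) -> List[str]:
    """Return up to top_n other asset IDs, ranked by tag overlap with target_id."""
    target_tags = assets[target_id]
    scored = []
    for asset_id, tags in assets.items():
        if asset_id == target_id:
            continue
        score = jaccard(target_tags, tags)
        if score > 0:  # skip assets with no tags in common
            scored.append((score, asset_id))
    # Highest score first; ties broken alphabetically by ID for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [asset_id for _, asset_id in scored[:top_n]]


# Hypothetical assets, each with tags such as GPT-4 might generate.
assets = {
    "pool-001": {"woman", "swimming pool", "inflatable flamingo", "summer"},
    "pool-002": {"swimming pool", "summer", "sunshine"},
    "beach-003": {"beach", "summer", "sunglasses"},
    "office-004": {"desk", "laptop"},
}
print(recommend("pool-001", assets))  # ['pool-002', 'beach-003']
```

Because the ranking depends entirely on the generated tags, its quality rises or falls with the quality of the automated categorisation described earlier.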

5. Content Moderation

Content moderation is the process of monitoring and filtering digital content to ensure it adheres to established guidelines, policies, or community standards. In the context of a DAM system, content moderation is crucial for maintaining a professional and secure environment for users to work with assets that are in line with the organisation’s values and legal requirements.

GPT-4’s multimodal capabilities could significantly improve content moderation by analysing images and video frames to detect explicit or harmful content. Assets identified in this way could then be automatically flagged for review or removal, helping to ensure that only appropriate content is accessible within the DAM system.
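The model’s answer would itself be text, so the DAM system still has to decide what to do with it. A minimal sketch, assuming the model is prompted to reply with a single label from a fixed list; the label set and the `ModerationQueue` class are hypothetical. Anything other than an explicit “safe” goes to human review rather than being published.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical labels the model is asked to choose from (an assumption, not an OpenAI list).
FLAG_LABELS = {"explicit", "violence", "hate"}


@dataclass
class ModerationQueue:
    """Routes assets based on a one-word moderation label returned by the model."""

    approved: List[str] = field(default_factory=list)
    for_review: List[Tuple[str, str]] = field(default_factory=list)  # (asset_id, reason)

    def route(self, asset_id: str, model_answer: str) -> None:
        # Normalise the answer: trim whitespace, a trailing full stop, and case.
        label = model_answer.strip().strip(".").lower()
        if label == "safe":
            self.approved.append(asset_id)
        elif label in FLAG_LABELS:
            self.for_review.append((asset_id, label))
        else:
            # Unexpected or malformed answers are never treated as safe.
            self.for_review.append((asset_id, "unrecognised: " + label))


queue = ModerationQueue()
queue.route("asset-001", "safe")
queue.route("asset-002", " Explicit.")
queue.route("asset-003", "Not sure")
print(queue.approved)    # ['asset-001']
print(queue.for_review)  # [('asset-002', 'explicit'), ('asset-003', 'unrecognised: not sure')]
```

Failing closed in this way keeps a human in the loop whenever the model’s answer is ambiguous.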

6. Creative Assistance

This idea is more speculative, but it is an intriguing area where we could see impressive developments. GPT-4 has the potential to analyse existing assets in a DAM system and generate new ideas or concepts based on their visual and textual elements, stimulating the creative process and encouraging designers to explore new approaches. It could also process images or video to recommend colour schemes, layouts, typography, or other design elements that enhance a particular asset.

Conclusion

The inclusion of GPT-4 image support in digital asset management systems can bring about significant changes in the way we handle and use digital assets. This enhancement promises better categorisation, metadata generation, content creation, and search capabilities, leading to an overall improvement in DAM systems. As AI technology evolves, DAM systems are set to become increasingly efficient and user-friendly.
