3 Metadata Tasks that GenAI Can Automate

This feature article has been provided by Jake Athey, VP Go-To-Market and Sales for DAM and PIM at Acquia.


Generative artificial intelligence (GenAI) is not as widely used as one might assume. Only 5.1% of American companies use AI to produce goods and services, reported The Economist in August, down from 5.4% earlier this year. Articles debating whether GenAI has any value to businesses have become standard fare in the media.

The DAM space is different, in my opinion, because we’re long past wondering whether AI has value to us—it does. AI keyword auto-tagging has been in use for roughly a decade now, and DAM vendors quickly integrated large language models (LLMs) into their platforms following ChatGPT’s debut in November 2022. The value proposition for AI is clear: make assets more searchable and usable in less time and at lower cost.

Over the last two years, my colleagues and I at Acquia have tested GenAI on a variety of metadata tasks to see what we can automate. We’re particularly interested in whether OpenAI’s GPT-4 can generate product descriptions, more helpful keywords, and alt-text, so we conducted a study and wrote up the results for the peer-reviewed Journal of Digital Media Management (JDMM).

Rather than recap the entire study, I want to explore the actionable takeaways—the ways DAM practitioners can use GenAI right now to automate metadata workflows.

The Unavoidable Three

Three metadata-related tasks consume a disproportionate share of human time and brainpower in making content searchable and accessible. Our focus today is on these:

  1. Keyword tagging for product photography. Keywords capture the product type, activity, place, environment, demographic, emotion, and more. They enable DAM users to search, browse, and filter the best assets for their use case.
  2. Product descriptions. These range from one-liners to complex descriptions that use style, perspective, and narrative to sell the product. These descriptions test an AI’s understanding of an asset and can accelerate the drafting of product descriptions by giving copywriters a starting point. They can also stand in as alt-text.   
  3. Alt-text for accessibility. Alt-text enables visually impaired individuals equipped with screen readers to understand the substance of visual media on websites. Every visual asset in a DAM system should have alt-text attached before it’s available for deployment.

Other AI use cases include translation between languages (doable before GenAI), text extraction from images, video transcription, and AI-generated summaries of PDFs, slide decks, and documents. These are great ways to enrich metadata, but they are not as tricky (especially with a DAM system configured to capture this metadata automatically). The text is either right or wrong—there’s little subjectivity involved. Keywords, product descriptions, and alt-text are more nuanced.

In these use cases, DAM admins care most about two qualities: accuracy and precision. Accuracy is about whether AI identified the content and context of an image correctly. In the JDMM study, we evaluated that based on a one-line product description. Precision is about whether keywords were factual, relevant, and plausible as search terms. Depending on how you prompt GenAI, the results can vary a lot.
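As a rough illustration of how those two qualities can be quantified from human review labels—a simplified sketch, not the study’s actual evaluation pipeline, with hypothetical field names—accuracy is the share of correct descriptions and precision is the share of good keywords:

```python
def score_batch(reviews):
    """Score AI-generated metadata against human review labels.

    Each review is a dict (hypothetical schema) with:
      - "description_ok":  True if the one-line description was correct
      - "keywords_ok":     count of keywords judged factual and relevant
      - "keywords_total":  count of keywords the AI generated
    """
    n = len(reviews)
    # Accuracy: fraction of images whose description was correct.
    accuracy = sum(r["description_ok"] for r in reviews) / n
    # Precision: fraction of all generated keywords that were good.
    total_kw = sum(r["keywords_total"] for r in reviews)
    precision = sum(r["keywords_ok"] for r in reviews) / total_kw
    return accuracy, precision

# Example: three reviewed images.
reviews = [
    {"description_ok": True,  "keywords_ok": 9,  "keywords_total": 10},
    {"description_ok": True,  "keywords_ok": 10, "keywords_total": 10},
    {"description_ok": False, "keywords_ok": 8,  "keywords_total": 10},
]
acc, prec = score_batch(reviews)  # acc = 2/3, prec = 27/30 = 0.9
```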

Being AI’s Boss

How do we get an AI to perform the repetitive but important DAM-related tasks that have to be done? It starts with skillful prompting.

In our study for JDMM, we wanted GPT-4 to draft one-sentence descriptions and 10 keywords for products covering six categories: Bicycles, Food & Beverage, Home Goods, Office Furniture, Footwear, and Tools. We had 10 images for each category and developed six prompts, yielding 360 results to analyze.

Here’s the template we used to inform our prompts.

  1. Context: The overarching purpose of the exercise.
  2. Persona: The character or role AI plays.
  3. Task: What the AI is expected to do.
  4. Steps: The process by which the AI will complete its task.
  5. Specs: Guidelines, definitions, and parameters the AI should follow.
  6. Format: How the response should be rendered.

In each prompt, we varied exactly one of these six components while keeping the other five identical to an original, base prompt. Then we compared how the prompts performed on accuracy and precision.
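To make the one-component-at-a-time design concrete, here is a minimal sketch of how such a prompt could be assembled so that a single component (say, the persona) can be swapped while the rest stays fixed. The section texts are abbreviated placeholders, not the study’s exact wording:

```python
# Base prompt as six named components (abbreviated, illustrative text).
BASE_PROMPT = {
    "Context": "We are trying to make product images in a DAM system more searchable.",
    "Persona": "You are a DAM administrator.",
    "Task": "Come up with 10 keywords that each describe a different aspect of the image shown.",
    "Steps": "1/ Write one sentence describing the image\n2/ List 10 keywords for text search",
    "Specs": "A good keyword is a noun a marketing or ecommerce professional would search for.",
    "Format": "Number the keywords 1 through 10 in a bulleted list.",
}

def build_prompt(base, **overrides):
    """Render the six components as markdown sections, overriding at most one."""
    sections = {**base, **overrides}  # overridden keys keep their original position
    return "\n\n".join(f"## {name}\n\n{text}" for name, text in sections.items())

# Variant that changes only the persona, everything else held constant:
variant = build_prompt(BASE_PROMPT, Persona="You are a marketing technology specialist.")
```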

What worked best

To be clear, we didn’t enter prompts and images manually into GPT-4. My co-author Jacob Williamson, Senior Software Developer at Acquia, created an automated upload profile in Acquia’s DAM system to run each prompt on all 60 images. Of the six prompts, the following performed best when we averaged its scores for accuracy and precision.

## Context

We are trying to make product images in a digital asset management (DAM) system more searchable.  

## Persona

You are a DAM administrator.

## Task

Come up with 10 keywords that each describe a different aspect of the image shown. Keywords may refer to qualities like the product name and type, setting, actions, season, demographics, emotions, and others.

## Steps

1/ Write one sentence describing the image

2/ List 10 keywords that a person would use to find this image via a text search

## Specs

A good keyword is a noun that a marketing or ecommerce professional would use to search for a product image in a DAM system. That image will be used in digital content such as an advertisement, social media post, ecommerce product listing, or web page.

## Format

Number the keywords 1 through 10 in a bulleted list.

A few things to call out. First, designating the persona as a “DAM administrator” is important. When we tested with another title, “marketing technology specialist,” performance dropped significantly.

Second, this is the only prompt in which we gave GPT-4 examples of what keywords could refer to: “…product name and type, setting, actions, season, demographics, emotions, and others.” Surprisingly, this prompt performed worse than four out of its five peers on keyword precision, though only by 1% or less. Perhaps GPT-4 needs examples of real products, keywords, and descriptions that are done well instead of general terms describing types of keywords.

Third, notice that we emphasize the purpose of this exercise—making images searchable in a DAM system—three times, in three different ways. That was intentional. LLMs consider the order in which information is presented (what you say first it deems more important), and anecdotally, they seem to get sidetracked if you don’t remind them of their mission often enough. In that respect, they are like people.
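Applying a prompt like this at scale looks roughly like the following. This is a hedged sketch, not Acquia’s actual integration: the message structure follows OpenAI’s public vision-capable chat API, the model name is illustrative, and `prompt` stands for the full prompt above:

```python
# Sketch of batch-tagging product images with a vision-capable GPT-4 model.
# Assumes an OpenAI-SDK-style client is passed in; model name is illustrative.

def build_messages(prompt, image_url):
    """Package the full prompt plus one image as a single chat request."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]

def tag_images(client, prompt, image_urls, model="gpt-4o"):
    """Run the prompt on each image and collect the raw model responses."""
    results = {}
    for url in image_urls:
        response = client.chat.completions.create(
            model=model,
            messages=build_messages(prompt, url),
        )
        results[url] = response.choices[0].message.content
    return results
```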

That easy? Not quite

The top-performing prompt, run against a generic version of GPT-4, wasn’t perfect. Its accuracy rate of 88% means that 12% of the product descriptions it generated were dead wrong or too flawed to be used for anything, like alt-text. Its precision score of 91% means that 9% of the keywords it tagged were untrue, misleading, or unhelpful to DAM users.

With alt-text, having something isn’t necessarily better than having nothing. If 12% of your alt-text is wrong, that is not only embarrassing but does a disservice to people with disabilities. With keywords, the reverse may be true. If 9% of your keywords are wrong, there are still plenty of keywords that will retrieve what a DAM user wants.
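One way to act on that asymmetry—a hypothetical policy sketch, not a product feature—is to publish AI keywords directly while holding each asset back from deployment until a human has approved its alt-text:

```python
def ready_for_deployment(asset):
    """An asset ships with its AI keywords as-is, but only with
    alt-text that a human reviewer has explicitly approved."""
    has_alt = bool(asset.get("alt_text"))
    approved = asset.get("alt_text_reviewed", False)
    return has_alt and approved

# Hypothetical asset records at three stages of review.
assets = [
    {"id": 1, "alt_text": "Red road bike leaning on a brick wall", "alt_text_reviewed": True},
    {"id": 2, "alt_text": "A bicycle", "alt_text_reviewed": False},  # awaiting review
    {"id": 3},  # no alt-text generated yet
]
deployable = [a["id"] for a in assets if ready_for_deployment(a)]  # -> [1]
```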

Which brings me back to the value of GenAI in DAM administration. Although it requires careful oversight, using GenAI is more efficient than doing everything manually for fear of mistakes. Moreover, the DAM admins willing to craft and test custom prompts will find ways to elicit more accurate and precise metadata. LLMs are surprisingly sensitive to small changes in how we communicate with them and what we ask them to do.

Don’t let the media’s disillusionment with GenAI deter you from using it. We know what we need GenAI to do—we just need to put in the work to make it happen.

