Applying Linked Data Concepts To Derive A Global Image Search Protocol

June 3, 2013 By Ralph Windsor in Semantic Web

Tim Strehl, whose blog we feature on our resources page, has written a post about applying Linked Data to the problem of a universal search facility for finding images on the web, one that allows descriptive metadata and licensing information to be integral elements of both the search and the results found. I agree that the current process, which involves either specialist sources like stock media libraries or legally dubious options (such as Google Images), is ‘suboptimal’, as he describes.

Tim examines the various current alternatives, including proprietary APIs like Getty’s Connect and some more open examples such as CMIS. As he says, the problem with those is the multiplicity of different approaches, each with reams of documentation to get through just to check more than one repository. As an alternative, he considers a standards-based approach where each image has an associated HTML page with RDFa data embedded in it. This is the technical description:

“The protocol and format should be HTTP and HTML with RDFa: HTTP and HTML (and the ecosystem of browsers and search engines) have proven to work well at “Web scale”, with millions of producers and billions of consumers of information. HTML is readable by any human with a Web browser, which is its killer feature. And RDFa seems to win the race against microdata for semantic markup within HTML. (The current discussion on embedded metadata in image files is important as well, but in HTML it’s so much easier to access and modify that I see it as the primary data source.)” [Read More]
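To make the idea a little more concrete, here is a minimal sketch in Python of how a crawler might read descriptive and licensing metadata from the HTML page associated with an image. Everything in it is an assumption for illustration rather than part of Tim's proposal: the sample page, the choice of Dublin Core terms, the `rel="license"` link and the names `ImageMetadataParser` and `extract_image_metadata` are all hypothetical. It also handles only a small subset of RDFa (the `about` and `property` attributes); a real implementation would use a full RDFa processor.

```python
from html.parser import HTMLParser

# Hypothetical page describing one image. The vocabulary (Dublin Core terms)
# and the rel="license" link are assumptions chosen for illustration.
SAMPLE_PAGE = """
<div prefix="dc: http://purl.org/dc/terms/"
     about="http://example.com/images/rose.jpg">
  <img src="http://example.com/images/rose.jpg" alt="A red rose">
  <span property="dc:title">Red rose in morning light</span>
  <span property="dc:creator">Jane Example</span>
  <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a>
</div>
"""


class ImageMetadataParser(HTMLParser):
    """Collects a small subset of RDFa: the first `about` value, the text of
    elements carrying a `property` attribute, and the href of a license link."""

    def __init__(self):
        super().__init__()
        self.subject = None            # the image the statements are about
        self.metadata = {}             # property name -> text value
        self._current_property = None  # property whose text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.subject is None and "about" in attrs:
            self.subject = attrs["about"]
        if "property" in attrs:
            self._current_property = attrs["property"]
            self.metadata[self._current_property] = ""
        if tag == "a" and attrs.get("rel") == "license" and attrs.get("href"):
            self.metadata["license"] = attrs["href"]

    def handle_data(self, data):
        # Only keep text that sits inside an element with a property attribute.
        if self._current_property is not None:
            self.metadata[self._current_property] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current property, so
        # nested markup inside a property element is not supported.
        self._current_property = None


def extract_image_metadata(html):
    """Return (subject, metadata) for one image description page."""
    parser = ImageMetadataParser()
    parser.feed(html)
    parser.close()
    cleaned = {name: value.strip() for name, value in parser.metadata.items()}
    return parser.subject, cleaned


if __name__ == "__main__":
    subject, metadata = extract_image_metadata(SAMPLE_PAGE)
    print(subject)
    for name, value in metadata.items():
        print(f"  {name}: {value}")
```

The point of the sketch is that nothing beyond an HTTP fetch and an HTML parser is needed to get at the title, creator and license, which is the attraction of the approach compared with learning a separate API for every repository.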

For those whose eyes glaze over at this kind of stuff, there is also this more business-oriented outline of the benefits:

“Image licensing is an existing market with some money on the table. There is an incentive for both producers and consumers of digital images; finding the right photo is hard and copyright and licensing become increasingly important. (Plus it helps that it’s potentially a global market with few barriers: If you find the perfect photo of a rose, it shouldn’t matter that it was taken by an amateur who lives on a different continent and doesn’t speak English.)” [Read More]

Essentially, what I think Tim is suggesting is a universal protocol where images get described like web pages (HTML) so you can crawl them using search engine techniques but where the nodes at the end are images. This is an interesting idea and it would definitely be useful if you could get an image search facility that works as he describes – it would certainly help the cause of those developing DAM systems too (and vastly improve interoperability and integration between products).

There are, however, some powerful forces that might not be so keen. For a start, the larger commercial stock media operations depend on scale and branding to retain a lot of their current market: they are perceived as both comprehensive and ‘business-safe’. Allowing their media out into the open to be indexed by some third party whom they probably regard with wary suspicion (e.g. Google) is likely to be a step too far. If they do not participate, then you still have to go into their proprietary libraries to find images, which will slow adoption.

There is also the question of where this leaves embedded metadata. One of the reasons the use of IPTC, PLUS, XMP etc. is considered to be a good idea is the reduced risk of the metadata becoming separated from the asset. With copyright, separation means the ownership information could be lost (hence the concerns over social media sites stripping metadata and generating orphan works in the process). That said, I don’t think the two need to be mutually exclusive. In a web-based solution, the image and its metadata could end up more closely linked rather than less. It’s a case of mirroring one with the other, not choosing between the two methods.
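As a rough illustration of what mirroring could mean in practice, the sketch below compares the rights fields read from inside an image file with those published on its web page and reports any that disagree. The field names, the sample values and the function `find_mismatches` are hypothetical; reading the embedded IPTC or XMP data and the page metadata is assumed to have happened already, with both reduced to plain dictionaries.

```python
def find_mismatches(embedded, published, fields=("creator", "copyright", "license")):
    """Return {field: (embedded_value, published_value)} for every rights
    field where the image file and its web page disagree. A field missing
    on one side is reported with None for that side."""
    mismatches = {}
    for field in fields:
        embedded_value = embedded.get(field)
        published_value = published.get(field)
        if embedded_value != published_value:
            mismatches[field] = (embedded_value, published_value)
    return mismatches


if __name__ == "__main__":
    # Hypothetical values: the file has lost its license field, as can
    # happen when a site strips embedded metadata on upload.
    embedded = {"creator": "Jane Example", "copyright": "(c) Jane Example"}
    published = {
        "creator": "Jane Example",
        "copyright": "(c) Jane Example",
        "license": "http://creativecommons.org/licenses/by/4.0/",
    }
    for field, (in_file, on_page) in find_mismatches(embedded, published).items():
        print(f"{field}: file={in_file!r} page={on_page!r}")
```

A check like this could run in either direction: restoring stripped fields in the file from the page, or flagging a page whose published rights no longer match the file.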

Overall, I think Tim’s idea is a good one and it’s one of the examples where the knowledge and expertise of software developers (when applied to some of the challenges of the stock media industry) can help rather than hinder progress. I suspect that both commercial interests and the complexity of getting everyone in the world to both agree and implement something like this might mean it is some time before it becomes a reality, but the benefits to image suppliers and users alike are clear and that may generate the required momentum to see some gradual progress towards it over time.
