As work on the IKS project progresses, my (extremely) pragmatic mind keeps going back to the "how can we make this simpler?" question.
One of the major goals of IKS is to create semantic extensions for content management systems, but what does that mean? The exact use cases are still vague, and in such a situation it is too easy to over-engineer things, just in case.
We have been talking about RESTful interfaces to IKS components for a while now, but what does this mean exactly? How can we make a concrete step towards defining such interfaces?
I'm a big fan of small concrete steps that lead us towards pragmatic solutions, so let's try to take one such step.
Machine-level use cases
Let's start by defining a few simple use cases, at the "machine level": a content management system is the client, and the IKS semantic engine the server. We have already discussed this within IKS; here is a condensed summary:
- Semantic lifting
  Let IKS extract semantic information from (multimedia) content: person and place names, structured links between content items, etc. Optionally make this information editable/confirmable by the client system, as a human user might have to refine the system's suggestions.
  At the machine interface level, this requires registering content with the IKS semantic engine, reading the resulting semantically lifted document, and optionally modifying it.
- Classification and auto-tagging
  Let IKS suggest categories and/or tags for pieces of multimedia content. If an author validates the suggestions, inform IKS of what choices were made.
  From the machine interface point of view, this is very similar to semantic lifting.
- Query building assistance
  Let IKS assist users in formulating search queries, interactively.
  From the machine interface point of view, this is very similar to semantic lifting.
- Similarities, correlation
  Let IKS find similarities between pieces of multimedia content. The axes on which those similarities are found can vary: images, for example, can be graphically similar, or similar in terms of the real world entities that they display.
  At the machine interface level, this requires registering content with the IKS semantic engine, and later running queries against this content.
This simple list already hides significant complexity, yet those use cases should be understandable by Joe Author.
Enabling those four use cases could add a lot of value to existing and future content management systems, depending on the quality of the semantic components.
RESTful interface
Let's design a RESTful interface based on the machine interactions required to implement the above use cases.
Remember that, in what follows, client designates a content management system that wants to use the IKS engine.
Register content with IKS
To build knowledge about our content, IKS needs to be able to find it. In RESTful terms this means providing IKS with a URL that points to said content, so we have:
Rule #1: Content is registered with the IKS server by HTTP POST requests, containing lists of URLs that point to (created or modified) content items.
Rule #2: IKS reads content by making HTTP GET requests to registered pieces of content. Those URLs must return Content-Types that IKS understands. Some Content-Types are preferred and allow IKS to better understand the content.
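To make Rules #1 and #2 a bit more tangible, here is a minimal sketch of what the client side of a registration could look like. Nothing below is defined by IKS yet: the server address, the `/content` path, the JSON body shape and the list of understood Content-Types are all assumptions made up for illustration. The request is only built, not sent.

```python
import json
import urllib.request

# Hypothetical values: the server address, the registration path, the JSON
# body shape and the understood Content-Types are assumptions, not IKS APIs.
IKS_BASE = "http://iks.example.org"
REGISTRATION_PATH = "/content"
UNDERSTOOD_TYPES = ("application/xhtml+xml", "text/html", "text/plain")


def build_registration_request(content_urls):
    """Rule #1: build a POST request carrying a list of content URLs."""
    body = json.dumps({"urls": list(content_urls)}).encode("utf-8")
    return urllib.request.Request(
        IKS_BASE + REGISTRATION_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_understood(content_type_header):
    """Rule #2: check the Content-Type returned by a registered URL."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type_header.split(";")[0].strip().lower()
    return media_type in UNDERSTOOD_TYPES


request = build_registration_request([
    "http://cms.example.com/articles/42",
    "http://cms.example.com/images/7",
])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
print(is_understood("text/html; charset=utf-8"))
```

Sending the request (for example with `urllib.request.urlopen`) would presumably return the IKS identifiers of the registered items, but the shape of that response is still to be defined.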
Semantic Lifting
Once content is registered, the client can request a semantic view of that content from IKS. That view lists semantic entities that have been extracted from the content.
Depending on the IKS implementation, the semantic view can be editable. It is retrieved by a GET request that contains the IKS identifier (provided by IKS when content is registered) of the content item, and modified using an HTTP PUT request.
The Content-Type and data formats use existing standards, as far as possible.
The semantic view includes IKS-specific metadata, for example to indicate that some parts of the semantic view are still being computed.
Rule #3: The semantic view of a content item is retrieved with a GET request, and if editable can be modified by a PUT request of the modified version.
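The GET/PUT cycle of Rule #3 can be sketched with a small in-memory stand-in for the server. This is only an illustration under assumptions: the class name, the view layout (`entities`, `status`) and the choice of HTTP status codes (404 for unknown items, 405 when the view is not editable) are mine, not part of any IKS specification.

```python
import copy


class SemanticViewStore:
    """In-memory stand-in for the part of IKS that serves semantic views."""

    def __init__(self, editable=True):
        self.editable = editable
        self._views = {}  # IKS identifier -> semantic view

    def add(self, iks_id, entities, complete):
        self._views[iks_id] = {
            "entities": entities,
            # IKS-specific metadata: is part of the view still being computed?
            "status": "complete" if complete else "computing",
        }

    def get(self, iks_id):
        """GET: return (HTTP status, copy of the view or None)."""
        if iks_id not in self._views:
            return 404, None
        return 200, copy.deepcopy(self._views[iks_id])

    def put(self, iks_id, view):
        """PUT: replace the stored view with the client's modified version."""
        if iks_id not in self._views:
            return 404
        if not self.editable:
            return 405
        self._views[iks_id] = copy.deepcopy(view)
        return 200


store = SemanticViewStore(editable=True)
store.add(
    "iks-42",
    [{"type": "person", "text": "Ada Lovelace", "confirmed": False}],
    complete=True,
)

status, view = store.get("iks-42")
view["entities"][0]["confirmed"] = True  # a human author confirms the suggestion
put_status = store.put("iks-42", view)
print(status, put_status, store.get("iks-42")[1])
```

The client always sends back the whole modified view, which matches the "PUT request of the modified version" wording of the rule and keeps the server side simple.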
Semantic queries
Semantic queries are implemented using GET methods on various query URLs that define how the query is interpreted.
Results are returned with Content-Types and data formats similar to those used for semantic lifting.
Rule #4: Semantic queries are executed via GET requests, and return the identifiers (URLs) of the selected content items, optionally with some contextual info to display on query result pages.
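As a rough illustration of Rule #4, the sketch below builds a query URL and extracts the content URLs from a made-up response. The `/query/<kind>` layout, the parameter names and the JSON result format are all hypothetical; the post only says that queries are GET requests returning identifiers plus optional contextual info.

```python
import json
from urllib.parse import urlencode

IKS_BASE = "http://iks.example.org"  # hypothetical server address


def build_query_url(kind, **params):
    """Build the GET URL for a query; 'kind' selects how it is interpreted."""
    # Sorting the parameters keeps the URL stable, which helps HTTP caching.
    query = urlencode(sorted(params.items()))
    return f"{IKS_BASE}/query/{kind}?{query}"


def selected_urls(response_body):
    """Extract the identifiers (URLs) of the selected content items."""
    results = json.loads(response_body)["results"]
    return [item["url"] for item in results]


url = build_query_url("similar", to="iks-42", axis="entities")

# Made-up response: identifiers plus optional contextual info for result pages.
sample_response = json.dumps({"results": [
    {"url": "http://cms.example.com/images/7", "title": "Harbour at dusk"},
    {"url": "http://cms.example.com/images/9"},
]})

print(url)
print(selected_urls(sample_response))
```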
IKS engine status
Semantic lifting and indexing operations might take some time, so it's useful for the client to have information on the engine's status, in machine-readable form.
Rule #5: The IKS server reserves part of its URL space for system status information, and provides status information in a structured format.
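A client could use such status information to wait until lifting and indexing are done before asking for semantic views. Here is a minimal polling sketch, assuming a status document with a `pending_items` counter; that field name is invented here, and the HTTP call is replaced by a callable so the example runs without a server.

```python
import time


def wait_until_idle(fetch_status, max_polls=10, delay_seconds=0.0):
    """Poll the engine status until no items are pending.

    fetch_status is any callable returning the parsed status document;
    in a real client it would GET a URL in the reserved status space.
    Returns the number of polls that were needed.
    """
    for attempt in range(1, max_polls + 1):
        status = fetch_status()
        if status["pending_items"] == 0:
            return attempt
        time.sleep(delay_seconds)
    raise TimeoutError(f"engine still busy after {max_polls} polls")


# Canned status documents standing in for successive server responses.
canned = iter([
    {"pending_items": 3},
    {"pending_items": 1},
    {"pending_items": 0},
])
polls_needed = wait_until_idle(lambda: next(canned))
print(polls_needed)
```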
Is that it?
I think that's it - these simple RESTful interactions should be sufficient to implement our use cases.
What's left is to define the Content-Types used, and for this we can most certainly use existing formats, no need to reinvent any wheel here.
RESTful IKS framework
The proof of the pudding is in the eating, and if we wait too long the pudding might lose its taste... so why not start building this right away?
Purists might (rightly) argue that the above is not a design, just a somewhat vague set of principles. Yet, combined with a prototype implementation, this might be a very good way of making a step in the right direction, and of clarifying requirements and interfaces.
My suggestion for the next steps is as follows:
- Implement the above interface, using dummy semantic components.
- Provide system interfaces to integrate actual semantic components (semantic lifting, classification, auto-tagging, querying) as plugins.
- Researchers can work on the semantic lifting components, and integrate them without requiring significant changes on the client side.
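The steps above could start from something as small as the following sketch: a plugin interface, a dummy component behind it, and an engine that does not care which one it gets. All names here (`SemanticLifter`, `DummyLifter`, `Engine`) are hypothetical, and the "capitalized word" heuristic is deliberately naive; it only stands in for a real lifting component.

```python
from abc import ABC, abstractmethod


class SemanticLifter(ABC):
    """Plugin interface that real semantic lifting components would implement."""

    @abstractmethod
    def lift(self, text):
        """Return a list of entity dicts extracted from text."""


class DummyLifter(SemanticLifter):
    """Placeholder: treats every capitalized word as an entity of unknown type."""

    def lift(self, text):
        words = (word.strip(".,;:!?") for word in text.split())
        return [{"type": "unknown", "text": w} for w in words if w[:1].isupper()]


class Engine:
    """Serves semantic views, delegating the actual lifting to a plugin."""

    def __init__(self, lifter):
        self.lifter = lifter

    def semantic_view(self, text):
        return {"entities": self.lifter.lift(text)}


engine = Engine(DummyLifter())
view = engine.semantic_view("Alice visited Zurich last week.")
print(view)
```

Swapping `DummyLifter` for a research-grade component changes nothing for the client, as long as the shape of the semantic view stays the same.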
Conclusion
The best way to go forward with this is probably to create an open source project to collaborate on this RESTful IKS framework.
Even if that framework is thrown away later as the IKS architecture progresses, it would allow IKS consortium members to build a much better understanding of what's actually needed to add "semantic value" to existing and future content management systems.