Beyond the Viewer: fragments and links in annotation space
Web Annotations and the model provided by the International Image Interoperability Framework (IIIF) allow us to unlock and reuse content parts from published digital carriers and weave together new user experiences.
I’ve been reading Mysteries of the Rectangle by Siri Hustvedt, a collection of essays on painting. My paperback edition has colour reproductions, but the text refers to many pictures and only a few of them are reproduced in the printed book. I can visit Wikipedia on my phone and look at the paintings, and later when I get home I can do the same at my desk, on a bigger screen. I can find every single artwork on Wikipedia, often with a high resolution image that has far more detail than the reproductions in the book.
One of the essays is about Vermeer’s Young Woman with a Pearl Necklace. Hustvedt writes about her response to the painting, and a hunch that it echoes earlier Italian paintings on the theme of the Annunciation. One piece of evidence she presents towards this interpretation is what looks like an egg in the painting. “What’s that egg doing there?” she asks — a symbol suggesting pregnancy or fertility? She admits that the egg is “probably part of the window’s architecture”, but entertains the possibility that it is not an accidental trick of the light, an unintended visual similarity, but part of the painting’s symbolic language. After all, Vermeer has plenty of form in this area.
Naturally this is intriguing, a visual puzzle — she says nobody writing on Vermeer has mentioned this egg before. Is she reading too much into that detail? I want to take a closer look. My literal-minded reading of the image is that Vermeer is simply painting the light falling in the curved ornamentation on top of the window frame. It only happens to look like an egg; it’s an accidental optical illusion. I can’t be sure that’s all there is to it, so I want to share this detail with someone who knows about these things, attaching my own thoughts to it.
We don’t need anything special to go and take a closer look at Wikipedia’s high resolution image. I don’t need a new standard to take a snippet from a screenshot of Wikipedia’s image and email it to my friend. I could load it into a viewer for pan and zoom, as it’s quite a large image to download. But I don’t feel the need for a specification, an interoperability framework for this kind of day-to-day exploring, sharing and referencing of images on the web. The web is the interoperability framework already. I can take a screen grab, snip out or highlight the detail I’m interested in, and send it in an email or other more immediate ways. And I can do this with any image on the web. Here’s a window detail in another Vermeer — it’s no bother to extract and reuse it here:
But suppose I want to do this kind of thing a lot. I want to publish my snippets and their accompanying comments on web pages, treat my snippets as content, look after my snippets in a content management system, maybe capture my snippets with a browser plugin. What if my snippet is more complex than just one rectangle? Now I want to build new web pages from my accumulated digital scraps and notes. My manual cutting-up approach begins to feel primitive. I want to talk about images and parts of images on the web in a standard way, where the things I say aren’t just one-off snippets I put in an email but follow a convention that allows them to be treated as content, and shared and used in different contexts. IIIF does this for us and, through IIIF, so does the Web Annotation Data Model, which is how IIIF assembles and integrates web content such as images of paintings. In this case, the content is Wikipedia’s hosted Vermeer image along with my comments about a region of that image.
Imagine everything on Wikipedia is available via IIIF. What does that mean? It would be nice if all images on Wikimedia Commons had IIIF Image services, including the Google Art Project content that is often 10,000 pixels on an edge in its largest version. But for what I’m doing, it’s even more important that those images are associated with IIIF Canvases, and that Wikipedia’s groupings and reuse of Wikimedia content are reflected in published IIIF Manifests that contain those canvases.
Manifests and Canvases
The Manifest is the unit of distribution of IIIF — a document loaded by viewing software for rendering the content to humans, a bit like an HTML document loaded by a web browser. A manifest has a list of one or more canvases: the canvas represents a view of the object. Sometimes there is only one view of an object and therefore one canvas in the manifest. Sometimes there are hundreds of distinct views in a manifest — a canvas for each page of a book. Publishing a canvas for each view means providing a stage on which content can be placed, such as images and text. A canvas is like a PowerPoint slide — a rectangle to fill with content. By providing a canvas for the painting, Wikipedia stages its JPEG image of this painting in interoperable IIIF space — in annotation space.
Vermeer’s Young Woman with a Pearl Necklace is 45 cm wide and 53 cm tall. The canvas is a two-dimensional integer coordinate system with the same proportions as the painting. In this space, Wikipedia can place the high resolution image from Wikimedia Commons, but also any other content. And anyone else can target this space, or parts of this space, when making statements about the painting, or regions of it. They don’t need to worry about the actual JPEG, just this 2D stage for it.
To refer to regions of that space with accuracy, we need enough resolution in the coordinates to target content precisely. Wikipedia’s image is 4500 pixels wide and 5236 tall, so we’ll use these numbers out of convenience to define our canvas space for this painting. The image dimensions and the canvas dimensions don’t have to match — our image might be the torn half of a page and we only want to fill half the canvas with it. Later if we find the missing part and photograph it, we could place that image into the other part of the canvas. In the meantime, we could place a text note into that space.
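Because canvas and image dimensions are allowed to differ, a region expressed in canvas coordinates sometimes has to be scaled before it can be applied to a particular image. The following is a minimal sketch of that arithmetic, assuming the simplest case in which one image fills the whole canvas. The canvas size comes from the text above; the larger image size and the sample region are invented for illustration.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height


def canvas_to_image_region(region: Box,
                           canvas_size: Tuple[int, int],
                           image_size: Tuple[int, int]) -> Box:
    """Scale a canvas-space box into the pixel space of an image.

    Assumes the image fills the entire canvas; an image placed on only
    part of the canvas would also need an offset, which is not handled here.
    """
    x, y, w, h = region
    canvas_w, canvas_h = canvas_size
    image_w, image_h = image_size
    sx = image_w / canvas_w  # horizontal scale factor
    sy = image_h / canvas_h  # vertical scale factor
    return (round(x * sx), round(y * sy), round(w * sx), round(h * sy))


CANVAS = (4500, 5236)          # canvas dimensions used in the text
BIGGER_IMAGE = (9000, 10472)   # hypothetical replacement image, twice the size
SAMPLE_REGION = (100, 200, 300, 400)  # made-up region in canvas coordinates

print(canvas_to_image_region(SAMPLE_REGION, CANVAS, BIGGER_IMAGE))
```

The region itself never changes when the image is swapped; only the scale factors do, which is why statements made against the canvas can outlive any one JPEG.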
The problem with this story is that Wikipedia doesn’t yet publish IIIF manifests, and therefore doesn’t define a canvas space (essentially a width value and a height value, along with a unique identifier) for Young Woman with a Pearl Necklace, or for any other painting. So for now, I have defined a canvas of my own to stage the content for this painting, and included it in my own published IIIF manifest. But my canvas and manifest are ephemeral. My canvas has no established authority as a definition of a content-space for Young Woman with a Pearl Necklace. Nobody else knows about it. That doesn’t stop me using it, but it would be better for Wikipedia to be the authority for the canvas, or perhaps the Gemäldegalerie in Berlin where the painting lives. As a publisher of images, they can upgrade the image resource(s) associated with that canvas if different representations become available; they define that coordinate space for the canvas for others to target with additional content. This means that anyone can say things about the image, including its publisher; when I say something about part of a canvas, you know that it’s Wikipedia’s canvas for Young Woman with a Pearl Necklace that I’m talking about.
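To make "a width value and a height value, along with a unique identifier" concrete, here is a rough sketch of a canvas wrapped in a manifest. It follows the general shape of version 3 of the IIIF Presentation API, which is an assumption on my part since the demos may use a different version. The example.org identifiers are hypothetical placeholders, not published resources.

```python
import json

# Hypothetical identifiers: nothing is published at these URLs.
MANIFEST_ID = "https://example.org/iiif/vermeer-pearl-necklace/manifest"
CANVAS_ID = "https://example.org/iiif/vermeer-pearl-necklace/canvas/1"


def make_canvas(canvas_id: str, width: int, height: int) -> dict:
    """A canvas is little more than an identifier plus a width and a height."""
    return {
        "id": canvas_id,
        "type": "Canvas",
        "width": width,
        "height": height,
        "items": [],  # content is attached later, by annotation
    }


def make_manifest(manifest_id: str, label: str, canvases: list) -> dict:
    """Wrap one or more canvases in a manifest, the unit of distribution."""
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": manifest_id,
        "type": "Manifest",
        "label": {"en": [label]},
        "items": canvases,
    }


canvas = make_canvas(CANVAS_ID, 4500, 5236)
manifest = make_manifest(MANIFEST_ID, "Young Woman with a Pearl Necklace", [canvas])
print(json.dumps(manifest, indent=2))
```

Anyone can produce a document like this, which is exactly the problem described above: the structure is trivial, but the identifier only becomes useful when an authority publishes it.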
Publishing canvases wrapped in manifests wouldn’t place too many demands on Wikipedia’s infrastructure. That’s all that’s required to stage images in annotation space. Publishing deep zoom services (the IIIF Image API) is a bonus: a fantastically useful one, but not essential. A manifest is usually a transformation of data that already exists, data that is already being used to deliver the web pages. It’s not too much of a leap for Wikipedia to support the IIIF Presentation API. Some management and sensible choice of width and height values would be required, especially for artworks like this where the media might get upgraded later. Once there are canvases for Wikimedia resources, and manifests to carry them over the web, then all of that Wikipedia content is available in IIIF space. If Wikipedia publish the canvas, they really only need to say one thing, to start with, to make the canvas useful: they can say that these pixels belong in this space. Now any software loading this canvas can see that it has Wikipedia’s JPEG of the painting associated with its entire dimensions, so that software should just show the picture.
That software might be a simple viewer, but it might be something that allows the creation of new content on a canvas: a tool that, given a canvas, can create a new fragment of content that targets it. This is the kind of tool I could use to make a statement about Wikipedia’s Young Woman with a Pearl Necklace. Rather than manually clipping part of the image and pasting it into an email, I create a new piece of data: an annotation. All content in IIIF is placed on a canvas (linked to the canvas) by annotation.
Using a tool to create an annotation on the Vermeer painting means associating some content (in this case, my text) with a part of the canvas. The publisher has already used the exact same mechanism to associate the large image with part (or in this case, the entirety) of the canvas:
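The two annotations can be sketched side by side to show that they really are the same mechanism. This is an illustrative approximation in the style of the W3C Web Annotation Data Model and version 3 of the IIIF Presentation API, not the data from the demo. All identifiers are hypothetical and the region coordinates for my comment are invented.

```python
import json

# Hypothetical identifiers, used only for illustration.
CANVAS_ID = "https://example.org/iiif/vermeer-pearl-necklace/canvas/1"
IMAGE_URL = "https://example.org/images/vermeer-pearl-necklace.jpg"


def painting_annotation(anno_id, image_url, width, height, canvas_id):
    """The publisher's statement: these pixels belong on this canvas."""
    return {
        "id": anno_id,
        "type": "Annotation",
        "motivation": "painting",
        "body": {
            "id": image_url,
            "type": "Image",
            "format": "image/jpeg",
            "width": width,
            "height": height,
        },
        "target": canvas_id,  # no fragment: the image fills the whole canvas
    }


def comment_annotation(anno_id, text, canvas_id, x, y, w, h):
    """My statement: this text is about this region of the canvas."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "motivation": "commenting",
        "body": {"type": "TextualBody", "value": text, "format": "text/plain"},
        "target": f"{canvas_id}#xywh={x},{y},{w},{h}",
    }


image_anno = painting_annotation(
    "https://example.org/annotations/image", IMAGE_URL, 4500, 5236, CANVAS_ID)
egg_anno = comment_annotation(
    "https://example.org/annotations/egg",
    "Is this an egg, or light on the window frame?",
    CANVAS_ID, 1000, 500, 400, 400)  # made-up coordinates

print(json.dumps([image_anno, egg_anno], indent=2))
```

The only structural differences are the motivation, the kind of body, and whether the target carries a region.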
The annotation I just created could be stored somewhere, and maybe made available when other people look at the painting. But I want to think about it as a piece of content in its own right, that carries the information to reconstruct what I was looking at. In this next tool, I paste the chunk of data I just created and it reconstructs what I was looking at:
Here I’m peering under the hood to show the annotation being shared — the document on the left. This is a small data document, but it contains all the information required to reconstruct the detail on the right. Crucially the annotation is tiny. It doesn’t contain pixels. It doesn’t even refer to a particular JPEG image. It refers to a canvas (in fact, a region of a canvas) within a manifest for the painting. If Wikipedia replaced the Vermeer JPEG with an even higher resolution one, this would still work. If Wikipedia added layers of multispectral imagery to their canvas, this would still work. If Wikipedia added a deep zoom image service for the image, this would still work (in fact, it would work more efficiently).
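A tool that reconstructs the detail has to split the annotation's target into the canvas identifier and the region. Here is a small sketch of that step, assuming the target is a plain string using the `#xywh=` media fragment form with integer coordinates. Targets expressed in other ways, such as percentages or structured selectors, are not handled, and the example URL is hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urldefrag

Box = Tuple[int, int, int, int]  # x, y, width, height in canvas coordinates


def parse_target(target: str) -> Tuple[str, Optional[Box]]:
    """Split 'canvas#xywh=x,y,w,h' into (canvas id, box).

    Returns (canvas id, None) when the target has no xywh fragment,
    meaning the annotation is about the whole canvas.
    """
    canvas_id, fragment = urldefrag(target)
    if not fragment.startswith("xywh="):
        return canvas_id, None
    value = fragment[len("xywh="):]
    if value.startswith("pixel:"):  # optional explicit unit prefix
        value = value[len("pixel:"):]
    parts = value.split(",")
    if len(parts) != 4:
        raise ValueError(f"expected four numbers in {fragment!r}")
    x, y, w, h = (int(p) for p in parts)
    return canvas_id, (x, y, w, h)


# Hypothetical target with invented coordinates.
print(parse_target("https://example.org/iiif/canvas/1#xywh=1000,500,400,400"))
```

Nothing in the result mentions an image, which is the point: the region is resolved against whatever content the publisher currently has on that canvas.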
In practice the mechanisms so visible here in this demo would fade into the background; IIIF and the annotation model become the plumbing behind the scenes, and only developers need to see the details.
Annotations as content
In creating an annotation on a region of the Wikipedia canvas, I have created a new fragment of content. A new thing. We don’t usually think of an annotation as an independent piece of content in its own right; we usually think of it as a bit of data attached to something else, like marginalia. We could start thinking about these fragments as independent content for building user interfaces, using fragments of the IIIF space as editorial alongside conventional web editorial.
What we just did is create a brand new fragment of content. Many existing IIIF resources already carry masses of content beyond the images — that’s the point of IIIF, to integrate the content of an object. We’re used to loading a viewer to see the object, and then moving down into the object to look at content in detail, such as the text transcription of a page, or commentary. But the viewer can take many forms.
IIIF as content carrier
In this example (seen in the screenshot) the IIIF Manifest is being processed by a client that instead of presenting a slick book-reading experience, simply renders the annotation content by dumping it to the screen. This was developed partly as an experiment and partly as a debug tool, but it clearly shows the content available on each page of the book represented by the manifest.
Text lines and figures, tables and images in the text are available as annotation content. The annotation targets are canvas regions, and we know that in this case there is a IIIF Image service available for the page image that fills the canvas, so we can make image API requests to the image service. That is, the published IIIF API for this book allows us to get right down to the fine detail of content in the book, such as this annotation:
That chunk of data, published as part of the IIIF representation of the book (alongside a similar annotation for every other line of text and illustration in the book), is just like the fragment about the egg I created earlier. It’s the same mechanism. I can use it to pull in that region of the image, and the textual content, and use it on a web page. A little code can render it like this:
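As a hedged sketch of what that "little code" might look like: given an annotation with a text body and a canvas-region target, build an IIIF Image API request for the region and wrap both in a snippet of HTML. It assumes the page image fills the canvas at the same dimensions, so canvas coordinates can be used directly as image coordinates. The service URL, canvas URL and sample annotation are all invented.

```python
from html import escape


def image_region_url(service_id: str, region: tuple, width: int) -> str:
    """Build an Image API URL: {service}/{region}/{size}/{rotation}/{quality}.{format}"""
    x, y, w, h = region
    width = min(width, w)  # avoid asking the service to upscale
    return f"{service_id}/{x},{y},{w},{h}/{width},/0/default.jpg"


def render_fragment(annotation: dict, service_id: str, width: int = 600) -> str:
    """Turn one text annotation into a standalone HTML figure."""
    _, _, fragment = annotation["target"].partition("#xywh=")
    region = tuple(int(n) for n in fragment.split(","))
    src = image_region_url(service_id, region, width)
    text = escape(annotation["body"]["value"])
    return (f'<figure><img src="{escape(src)}" alt="">'
            f'<figcaption>{text}</figcaption></figure>')


# Hypothetical line-of-text annotation and image service.
SERVICE = "https://example.org/iiif/image/page-12"
line = {
    "type": "Annotation",
    "body": {"type": "TextualBody", "value": "Salt & pepper to taste"},
    "target": "https://example.org/iiif/book/canvas/12#xywh=120,340,900,60",
}

print(render_fragment(line, SERVICE))
```

Because the output is plain HTML, this could run in a server-side template just as easily as in the browser.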
So far, so good. We can see how we could start building up these units of content in completely different kinds of applications and user experiences. But this isn’t typical of how IIIF content is reused today. Given a IIIF manifest containing full textual content, as this one does, a client experience could eschew the book reading experience entirely, in favour of a text-only approach, or a hybrid approach, or an audio-only approach — a screen reader rather than a viewer. The manifest driving the experience is the same.
It’s already commonplace to take advantage of the Image API to generate HTML5 responsive images, or show a detail from a larger image, without having to go to the bother of making new derivatives in an image editor. The IIIF Image service does it for us. It’s possible to put IIIF-aware cropping tools into your CMS interfaces to select parts of images for articles. And it’s common to embed a whole object, at the manifest level, using a IIIF client like the Universal Viewer or Mirador. On catalogue pages, web editorial and blog posts, we can embed IIIF objects — an archival item, an artwork — just as we can embed a YouTube video. CMS plugins, shortcodes and other utilities make it easy to embed IIIF viewers at the object level, at the manifest level. But sometimes that is far too broad a brush.
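The responsive image case can be sketched in a few lines. This assumes an image service that can return arbitrary widths; a more limited service might only offer a fixed set of sizes, in which case the widths would need to come from its own description. The service URL is a placeholder.

```python
from html import escape


def iiif_srcset(service_id: str, widths) -> str:
    """Build a srcset value where each candidate is a full-image Image API request."""
    return ", ".join(
        f"{service_id}/full/{w},/0/default.jpg {w}w" for w in sorted(widths)
    )


def responsive_img(service_id: str, widths, alt: str, sizes: str = "100vw") -> str:
    """Return an <img> element that lets the browser choose a suitable width."""
    widths = sorted(widths)
    fallback = f"{service_id}/full/{widths[0]},/0/default.jpg"
    return (f'<img src="{escape(fallback)}" '
            f'srcset="{escape(iiif_srcset(service_id, widths))}" '
            f'sizes="{escape(sizes)}" alt="{escape(alt)}">')


SERVICE = "https://example.org/iiif/image/vermeer"  # hypothetical image service
print(responsive_img(SERVICE, [400, 800, 1600], "Young Woman with a Pearl Necklace"))
```

No derivative files are created or stored; each candidate in the srcset is just a different request to the same service.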
Accessing the Intellectual Object
What about everything that lies between an image request and a Manifest? Everything between the raw pixels of one image, and the whole object? The whole object might be a 1000 page book, or an archival box, or a volume of a newspaper. What if you want to focus attention, and render as user experience, a particular subset of a larger piece of content? That content could be some parts of some images, with their textual content, alongside your own editorial content. You might not have any intention of rendering this in a page-turning, viewer kind of way: you could be generating simple templates on a web server. In this scenario, there is no particular viewer present, it’s just HTML pages. You want to make a page about a specific intellectual object, where that intellectual object doesn’t itself have a manifest because it’s one structural part of a larger manifest. Sometimes manifests describe their structure and provide detailed internal navigation. This depends on the cataloguing and digitisation process capturing that structure, which can be coarse-grained, especially at scale. Often it requires human intervention, manual effort, to describe and record the structure. A manifest might not even mention the structure you are interested in: you’re going to describe it from the outside, looking in, and create a new resource just as we created the egg annotation earlier.
In these scenarios, the IIIF manifest is the carrier, the unit of content distribution, but the intellectual objects we want to focus our attention on are below the level of the manifest and not individually described. We can look inside the manifest, and describe what we’re interested in, and then use that description to generate web pages and other user experiences.
We preserve the natural carrier, the manifest, while freely creating or reusing finer structure, whether it’s already described by the carrier or we have to make new descriptions from outside.
A bound volume might be a book of short stories. The physical book is the carrier for the stories. There is a manifest for the book; it’s the thing on the shelf, the thing you can hold in your hand. One particular short story is identified by a range (a IIIF Range object) within that book. We can use the range to present just that story. We don’t have to mint a new manifest; we just need a way of viewing the range on its own. The range describes one short story, by pointing to its parts in the carrier, the subset of canvases in the manifest that are the pages for that story. We don’t have to decompose and rebind a book just to read one story, and in the digital world, we don’t have to repackage our short story collection into a new manifest either, if we don’t want to. We just describe the parts we are interested in from outside, if they are not already described in the published manifest, and use those parts to generate user interfaces: any kind of UI, not necessarily viewer-like UI.
Consider an archival scenario. There is a single manifest, but it contains many discrete intellectual objects. Sometimes it makes sense for us to consider them as a whole, when we are thinking about the archival box these things live in. There’s probably a good reason they were all catalogued in the same box. Finer description was either impossible due to time constraints, or unnecessary for material that at the time of cataloguing would have been described for the benefit of in-person visitors, who can remove the material from the box, lay it out and look at it.
Sometimes we want to think about the things in the box individually; the wider context of the manifest they happen to live in is irrelevant. We’re interested in that letter; the other 100 images in the same manifest are distractions for now. If the manifest doesn’t describe that individual letter in its structure, with a IIIF Range, we can make a new one from outside, just for our current purpose, without disrupting the link between that letter and its manifest carrier.
Through cataloguing or other grouping decisions, conceptually distinct real world objects surrender their online identity to a single object, which is then looked at through the window of a viewer:
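Picking one story out of a book can be sketched as a lookup from a range into the manifest's list of canvases. The structures below loosely follow version 3 of the IIIF Presentation API, which is an assumption, and are cut down to the fields needed here. The range is defined outside the manifest, and every identifier is hypothetical.

```python
def canvases_in_range(manifest: dict, rng: dict) -> list:
    """Return the manifest's canvases that a range points to, in range order.

    Only direct canvas references are followed; nested ranges are ignored
    in this sketch. A reference to part of a canvas (with a #fragment)
    still selects the whole canvas.
    """
    by_id = {canvas["id"]: canvas for canvas in manifest["items"]}
    found = []
    for item in rng.get("items", []):
        if item.get("type") != "Canvas":
            continue
        canvas_id = item["id"].split("#", 1)[0]
        canvas = by_id.get(canvas_id)
        if canvas is not None and canvas not in found:
            found.append(canvas)
    return found


BASE = "https://example.org/iiif/short-stories"  # hypothetical book

book = {
    "id": f"{BASE}/manifest",
    "type": "Manifest",
    "items": [{"id": f"{BASE}/canvas/{n}", "type": "Canvas"} for n in range(1, 7)],
}

# A range made from outside the manifest, describing one story on pages 3 and 4.
story = {
    "id": "https://example.org/my-ranges/story-2",
    "type": "Range",
    "label": {"en": ["The second story"]},
    "items": [
        {"id": f"{BASE}/canvas/3", "type": "Canvas"},
        {"id": f"{BASE}/canvas/4", "type": "Canvas"},
    ],
}

print([canvas["id"] for canvas in canvases_in_range(book, story)])
```

The book's manifest is untouched; the range simply points into it.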
This is absolutely fine, and unavoidable; it may not be possible to give an identity to every scrap of content. But that doesn’t stop us interacting with that content at any level of detail we like, and building user experiences from it at any level of detail we like. We can resurrect the identities of individual parts for the purposes of web presentation in a standardised way.
Looking up and looking down
In the following examples, the structures are not part of the published manifest. By creating new pieces of IIIF content that describe parts of these manifests, I give myself the ability to present arbitrary details. I can focus on whatever my context considers the intellectual object to be; I can present web content about it. These are very simple presentations, but the same techniques could be used alongside a CMS and with more sophisticated presentation tools and templates. Construction of IIIF descriptions of things within larger IIIF objects can become part of an editorial process, to direct focus.
A bug in a manuscript
There’s a bug in this manuscript! No, really. An annotation can describe it, and a web page can use the annotation to pull out the content, just like the egg example:
In this live demonstration the raw annotation can be seen, just as in the egg example. The important point here is that the fragment above is formally described, and is self-contained and external to the content it references; I don’t need to keep that image excerpt anywhere. The totality of information to manage this content is a few lines of data in a document; the mechanism is defined by a W3C standard, and other software can use this data in different ways.
A paragraph of transcribed text
Here the annotation contains a transcription of the content. The body (the text) and target (the region of the canvas) of the annotation are used to generate a standalone extract, a piece of content.
The fragments of content so far have been annotations, which can be used for any kind of content or linking purposes. We can also use IIIF ranges for structure — a set of canvases, or a part of a single canvas, or anything in between. Within a viewer, ranges are used to construct navigation, to help a user find their way around a work. In the context of a digitised newspaper a range could be used to identify the extent of an article. Here, the article starts at the bottom of one column and finishes at the top of the next; the extent of the range comprises two separate regions of the canvas. The canvas also has textual annotations (the transcription of the printed text) and annotations that identify the figures and illustration blocks in the run of text. An additional structural feature of IIIF added after research and implementation experience in digitised newspapers allows a link between a range and the subset of canvas annotations that target the same extent as the range — thereby providing the additional content of that range directly, rather than indirectly by seeing what available annotations on the canvas intersect the Range.
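The indirect route mentioned above, finding which canvas annotations intersect the extent of a range, comes down to rectangle overlap tests. The sketch below illustrates that fallback only; it does not show the direct link between a range and its annotations. The article regions, line annotations and URLs are made-up data arranged to mimic an article that runs from the bottom of one column to the top of the next.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height


def xywh(target: str) -> Box:
    """Read the region from a 'canvas#xywh=x,y,w,h' target string."""
    _, _, fragment = target.partition("#xywh=")
    x, y, w, h = (int(n) for n in fragment.split(","))
    return (x, y, w, h)


def intersects(a: Box, b: Box) -> bool:
    """True when two boxes share any area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def annotations_for_range(range_regions: List[Box], annotations: List[dict]) -> List[dict]:
    """Keep the annotations whose target overlaps any region of the range."""
    return [
        anno for anno in annotations
        if any(intersects(xywh(anno["target"]), region) for region in range_regions)
    ]


CANVAS = "https://example.org/iiif/newspaper/canvas/7"  # hypothetical page

# The article: bottom of column one, then top of column two.
article_regions = [(0, 800, 300, 200), (300, 0, 300, 150)]

lines = [
    {"id": "line-1", "target": f"{CANVAS}#xywh=10,850,280,20"},   # in column one, inside
    {"id": "line-2", "target": f"{CANVAS}#xywh=10,100,280,20"},   # in column one, above the article
    {"id": "line-3", "target": f"{CANVAS}#xywh=310,50,280,20"},   # in column two, inside
]

print([anno["id"] for anno in annotations_for_range(article_regions, lines)])
```

An explicit link from the range to its annotations saves clients from doing this geometry, and avoids false matches where a neighbouring article's text happens to overlap the boundary.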
This allows us to pull the whole article — its text and images — out of its manifest carrier, by defining a range for just this article. In this example the parts of the canvas that carry the text have not been pulled out and displayed on the page (we’re opting for the textual data directly), but the photo has. This approach to transformation of IIIF-derived content could be used to generate versions of IIIF resources better suited to e-readers and other similar interactions, bringing the text into direct interaction while preserving original illustrations and figures. It pulls the content out in an accessible way.
The demonstration site for these examples also includes a longer article, spread across two pages.
These rudimentary examples show that content in IIIF form can be used on the web without having to cross over to that content at the manifest level, the usual entry point from a web page into IIIF space, via a viewer. The usual viewer experience of this content, at the manifest level, retains the context of the entire issue of the journal or newspaper — which is a valid experience, but not the only one. This is even more true for presentation of significant intellectual objects in archival content, where the carrier may sometimes be irrelevant to the individual letter, extract, postcard, document or photograph that is the current focus of attention.
A longer extract
In this case, an entire chapter of a book, with its illustrations.
Beyond the Viewer
All these examples hint at what could be done by pulling content out of distributed IIIF resources to build new user interfaces — beyond the viewer. That is, more direct and interwoven with other types of content on a web page, rather than confined to a particular viewer rectangle.
You can leave that rectangle in one way, by exploding the viewer across a web page, breaking its constituent UI elements out and placing them appropriately into slots in the page design. Rather than packing the UI into a single rectangular box you can distribute interacting components for navigation, thumbnails, metadata and the content viewport around the page.
You can also distribute a digital object across web pages, so that there is one page per canvas. This approach can be appropriate when each view of an object has a large amount of other content associated with it, or you want to accrue lots of new content for each canvas, such as in a crowdsourcing project.
Both those techniques are also ways of going beyond the viewer. They are different from (but not incompatible with) the explosion of the content itself described here, where there is an absence of anything recognisable as a viewer or its familiar controls. If your focus is on one intellectual object, that object may be small enough (even if distributed over different canvases) for its content (images and text) to be presented without any intervening client application. If you can pull out just what you need for your purpose, you have less need of controls, of UI to find your way around and understand a complex object, because you have removed the complexity by focusing on just one thing. The browser’s own controls, the hyperlink and the back button, are enough.
None of this does away with the viewer as application. The rich client viewer is still the default for a digital object of arbitrary complexity; for when you don’t know what the user’s intent is or you are not attempting to present the object in any special interpretive context. A user’s first encounter with an object may be in a viewer on a catalogue page; a later encounter with part of that object may find that part woven into a web page somewhere else.
IIIF provides windows on content via canvases. But the unit of distribution is the manifest, and for good reason: the manifest provides the context for a list of one or more canvases, and it provides an agreed entry point for a published representation of a conceptual work. We look at objects through the window the viewer provides when we load a manifest into it (or it loads it for us). But we don’t have to always stand outside the viewer and use its interaction pattern for content. We can sample and reuse the IIIF Universe for other ends, by looking into manifests using constructions like those described here and generating user interfaces from the parts we want. IIIF’s purpose is the generation of user interfaces, for humans: it doesn’t dictate how you do that, or the pieces you use.
Neither does this approach dethrone the manifest as the published carrier of content via its list of canvases. The constructions above — mostly annotations and ranges — don’t give access to the content. They refer back to a canvas, and say what manifest that canvas is to be found in. Just as in the physical world, the choice of carrier for an intellectual object is important, and not to be discarded in careless digital unbinding and removal of context. The context is still there for the taking; it’s just that the unit of distribution no longer mandates a particular interaction pattern. When classification doesn’t align with our object of interest, we change focus, and build whatever user interfaces we need.
Hyperlinking the archive
So far, we have seen how IIIF allows digitised (and digital) content to intersect the regular web in a way that hasn’t been possible before. While you could always craft bespoke web pages with extracts of digital objects in them, you couldn’t do this at scale, in a decentralised, standards-compliant way, until IIIF came along. Any tooling (for example, in a CMS) would be specific to the content (or rather, its publisher’s technical choices), and not reusable elsewhere.
We can go a step further than this.
We’re temporarily back in the viewer again — but this time, the viewer supports links in the digitised content itself. In this case, the links open other IIIF resources — either in the viewer directly, or via an interstitial “exiting the viewer” UI:
These journeys via hyperlinks within content are from one viewer-level experience to another, but this is a starting point for making the digitised content as much a part of the user journey as the web pages around the content. Annotations on the HTML content of a page can point back in to the digitised content. User journeys dip in and out of IIIF space: links from one part of a digitised object to another, links to a point in a different IIIF resource, links to web pages, links from web pages with IIIF targets.
Here, entities in the text link to web pages about that entity. And regular web content links into target regions of digitised content — which is exactly where we started with the egg example. Rather than an isolated chunk of annotation JSON data, managed annotations that target content can be hyperlinks between different forms of digital and digitised content, assisted by viewer-like technology, but not confined by it. We can loosen, a little, the navigation distinction between the digitised object and the rest of the web. A little more styling on those boxes could render them as underlined links, waiting to be clicked. These user interfaces are doing double duty as hyperlinks and more familiar annotations, tags. The boundaries can be slippery without losing the integrity of the digitised object.
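One way to picture annotations acting as hyperlinks is a small sketch that turns a linking annotation into an anchor element positioned over the displayed image. It assumes the annotation body is simply the URL to link to, the target is a canvas region in `#xywh=` form, and the image is shown in a relatively positioned container that fills the canvas. The annotation, URLs and canvas size below are all invented.

```python
from html import escape


def link_overlay(annotation: dict, canvas_width: int, canvas_height: int) -> str:
    """Render a linking annotation as an <a> box placed by percentage offsets."""
    _, _, fragment = annotation["target"].partition("#xywh=")
    x, y, w, h = (int(n) for n in fragment.split(","))
    # Percentages keep the box aligned however large the image is displayed.
    style = (
        "position:absolute;"
        f"left:{100 * x / canvas_width:.2f}%;"
        f"top:{100 * y / canvas_height:.2f}%;"
        f"width:{100 * w / canvas_width:.2f}%;"
        f"height:{100 * h / canvas_height:.2f}%"
    )
    return f'<a href="{escape(annotation["body"])}" style="{style}"></a>'


# Hypothetical linking annotation: a name on the page links to a page about that person.
person_link = {
    "type": "Annotation",
    "motivation": "linking",
    "body": "https://example.org/people/some-person",
    "target": "https://example.org/iiif/letter/canvas/1#xywh=100,200,300,400",
}

print(link_overlay(person_link, 1000, 2000))
```

The same annotation could equally be rendered as a tag, a tooltip or an entry in a list of references; treating it as a link is a presentation choice made by the page.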
A library, museum or archive site today often frames its digitised content in a viewer. Maybe that viewer lives on a catalogue page, but it might also live in articles and other web pages. Some viewers are modular, like the Universal Viewer. With a library of IIIF UI components you can construct different user experiences by using different bits of the viewer in different ways outside the confines of a particular rectangle on the page.
What we also want to do is freely reuse the IIIF model itself, to describe things in manifests without publishing a new manifest for a conventional viewer to load, and then use these extracts, these collected descriptions, to generate new user experiences. The IIIF model is fantastic for this.
All these millions of published IIIF manifests don’t just exist to be processed by viewers. We can cut up the model in different ways, we can say new things by making new IIIF resource fragments as part of a content editorial process. This might involve making new manifests that can be viewed in a manifest viewer — but it might involve arbitrary creation of annotations, ranges, or alternate sequences, and use of those new resources behind the scenes to generate new user interfaces on the server as well as the client. And extend the familiarity of hyperlinking into digitised content: the creation of linking annotations in and out of IIIF resources.
To do this, we need more content creation tools — more digital scissors, but also digital glue, paint, duct tape, cartridge paper and spray mount. We need more tools so that editorial processes have access to this content, and can reshape it. Transforming the IIIF model to simple HTML representations isn’t hard; it’s the extraction and remodelling that needs tool assistance.
There’s a Universe of IIIF content out there waiting to be reused.