One of the aims of the Linking Lives project is to demonstrate the value of Linked Data through the creation of an end-user interface that pulls in content from the Hub Linked Data, including the external data sets we are linking to. The Linking Lives interface will be part of the Archives Hub service, that is to say, available from within the Hub website. We will present it as a beta service; something that is usable and useful, but also in a state of development. With the provision of this interface, we can start to build up an understanding of how valuable this type of name-based resource is for researchers. We will be able to monitor use as well as carrying out an evaluation to ask researchers what they think of the site. This is far preferable to positing benefits based upon potential, which is tending to happen too much with Linked Data at present.
This post is written from a non-technical perspective and covers a few of the areas that we are currently thinking about, as we start to set out our interface design.
We will be concentrating on development of the interface, rather than prioritising scale for this project: quality rather than quantity you might say, although we expect to have some thousands of records included. This is partly pragmatic, because we are still finding challenges over integrating EAD data (Archives Hub descriptions) into our Linked Data because of inconsistencies and sometimes problematic content. The problems that we face with variable data are ongoing, and maybe highlight a basic issue with Linked Data: it works best with consistent data-centric information, and not so well with archival descriptions, built up over decades, many created before there were any standards at all to adhere to. However, on the positive side, our Linked Data work has enabled us to highlight and deal with many data issues, which is beneficial in the long run for any data processing that we might do (or that others might do).
Our focus for this project is on the Linking Lives pages themselves, and what researchers can access from there, so we will not be prioritising the creation of different search options into the data: this would be a next stage, once we get a clearer idea of the use of the interface.
Archives Hub Branding and Navigation
We want Linking Lives (LL) to be recognisably part of the Hub, although it would be premature to try to fully integrate the two. As yet, we don’t know how users will respond to what we are proposing, and we need to evaluate what we are doing before taking it further into service. We are carrying out an evaluation as part of the project: we will be asking a small group of researchers questions about the current Hub interface, and following this up with some focus group work to get reactions to our new LL interface. This will help us in understanding user requirements.
Linking Lives will be an interface available within the Archives Hub site, but we propose to incorporate data other than archival descriptions within the page. This does raise questions about the clarity of what we are doing and the balance between the different data sources. If we strongly brand the page as Archives Hub, will researchers expect to access just archives, and not other information resources? Will they assume all of the sources are held by us, or that we are responsible for them? If we include the basic Hub navigation at the top of the page, will that actually confuse users, as they may click on links that take them into the main Hub search without realising that LL and the Hub are somewhat different?
We are looking at creating a sub-brand of the Hub as a possible way to identify LL as part of the Hub, but still distinct from it to some extent. This may help to distinguish between the two different applications. We will use the basic Hub logo, but modify it to signify something different. We do want to keep the links between the two, as we believe that researchers will benefit from this, and we do want to bring archives and other data sources together to provide a fuller context, and not make them too distinctly separate. The idea is to enable researchers to move seamlessly from archives described within the Hub to other resources, and take a fairly bold approach to integration, otherwise we will not get the benefits we are after. I am somewhat reminded of The National Archives’ initiative called ‘Your Archives‘, which is a Wiki for community content that it does seem to have remained rather separate from the main TNA catalogues, and maybe that has been to its detriment in terms of profile and use (I often have trouble finding links to Your Archives from within TNA’s website).
The LL interface, like the Hub itself, will not be aimed at subject specialists or expert users. It will primarily be aimed at academic researchers, but is intended to appeal to a broad audience: anyone who might be interested in undertaking research. This means that we need to avoid making assumptions about knowledge. Our ‘designated community’ may not have prior knowledge of archives and certainly won’t have knowledge of Linked Data. So they may not know how archives are organised, what an archival ‘biographical history’ is, what an archival creator is, or what ‘same as’ links are between different data sources.
Our aim, therefore, is to incorporate these things in a way that makes sense and makes the person the primary focus of the page, so that it is easy to see that a page is about George Bernard Shaw, for example, and it provides life dates, descriptive information, biographical information, an image or two, aliases for the same person, etc. It is information you might expect to find, or information that makes sense within the context of a page about a person. At the same time we are keen to ensure that we capture provenance, and so this adds another dimension. Starting to include the source of each piece of information could clutter the screen and so we will need to think about how best to incorporate it. We believe that it will be important to some users, as it could have implications for the quality and accuracy of the data. It is something we would be pleased to see others do for our data, if they were presenting it in a Web interface.
The BBC Example
Our interface will combine content from different sources. We would like to draw in content, in a similar way to the BBC (on the BBC page for Stevie Wonder you can see how the Wikipedia biography is pulled into the page). The BBC page pulls in some of the Wikipedia biog, and provides a link to to go Wikipedia and read more. This helps to make clear that the information comes from elsewhere. With MusicBrainz, another Linked Data source, the BBC provide a link to the MusicBrainz site, but also, further down their page, they state: “Links & information come from MusicBrainz. You can add or edit information about Stevie Wonder at musicbrainz.org.” The information includes personal and business relationships, such as ‘child of’ and ‘collaborated on’.
On the BBC page, the Wikipedia information is more clearly labelled as being from that source; the MusicBrainz information is also identified, but in a less obvious way. But for this, they are not only declaring where the information comes from, they also also invite people to edit the information themselves.
LL will be a useful resource in itself, but can also be a starting point, in much the same way as the BBC provides a page that gives substantial information on a musician or an animal they are interested in, but also invites people to move away from the site to other resources. This in itself is an interesting shift of focus. Long gone are the days when some sites actually disabled the ‘back’ button, and now we are moving towards an even more fluid world, if this type of approach continues to gain traction, where we are not always trying to keep people on our pages, but are actually encouraging them to move around the ‘Web of Data’.
Focus on Expectations
Looking at the BBC page on Stevie Wonder again, one thing that I notice is that it is quite busy. There is a good deal of information, with various boxes and loads of links and options for the user. There does seem to be a trend towards busier pages now, maybe an indication that people are increasingly adept at finding their way through information online, so a certain level of complexity is acceptable. Also, the page is quite long. The BBC page about mammals is similarly long and complex: introduction, links to other pages on mammals, distribution, classification, BBC news, video, information elsewhere, size ranges, the Wikipedia ‘about’ page, etc. Yet the page does not seem cluttered or difficult to navigate. This is partly because of use of plain language, as well as BBC expertise in web design. It may also be that expectations largely match reality: users may expect the BBC to provide a wealth of information, and they generally know what they will get if they go to ‘programmes’ or ‘video’ or ‘news’ pages.
Expectations do play an important part in good Web design, and maybe it is easier if you are a very well known provider, as the expectations people have are clearer? Many people come into a page through a search engine, so you cannot expect they will have used your homepage, and picked up information via this route. However they arrive at a BBC page, most people know what the BBC is. But arriving at an Archives Hub Linking Lives page, you probably have little idea of the provider in this case, and you may not be clear about what archives are in this context.
We chose to create a biographical resource partly because this would provide a focus; we can convey the fact that the page is about one person relatively easily. This makes it easier in some ways that working the Archives Hub itself, which doesn’t have that kind of focus. If we provide a page with a whole range of links to various types of biographical content, then we should be able to convey what the page is about fairly easily. It may be that good clear and simple headings and relevant content (about one subject – in this case one person) is better than providing explanations about what you are and what you are trying to provide, as people don’t tend to read help pages.
A ‘Controlled’ Experience
Our interface will use the external data sources within our data, and will be designed in order to give users a controlled experience, in the sense that we are evaluating the sources we include and presenting the interface in a very defined way. Of course, we cannot control the content of the external data; I am just talking about the way we present it.
An alternative approach would be to pull in all the data that can be found on a topic and display it. Maybe this is the ideal for Linked Data – the ability to bring in any data sources on a topic – but we are quite some way, it seems, from presenting this in a way that end users will want to use. Try a search on Hakia, a semantic search engine (not directly about consuming Linked Data, but about pulling in related information in a more semantic way). I looked for Beatrice Webb, and got a substantial amount of information from a very diverse range of sources, including news, blogs, twitter, images and video. It’s quite impressive in principle, and could be really useful for a researcher, but the net is cast very wide, so it’s not easy to process all of this varied information. Sig.ma describes itself as a semantic information mash-up. If you take a look at the page that sig.ma provides for Beatrice Webb, a substantial amount of data is pulled in, but it is not very user-friendly, not always very coherent and sometimes not relevant. Obviously it is just a demonstrator, and I would say it is for a different audience, with more expertise in Linked Data. It does show the potential for this type of approach, that draws in a really diverse range of data on on-the-fly, but it also shows how semantic searching is complex and difficult to achieve within a user-friendly interface.
The Linking Lives Unique Selling Point
Sites like Wikipedia have biographical pages, and we can never compete with them, so what can we offer that is of value? Essentially, our focus is on meeting the needs of those who want to carry out more in-depth research and who are likely to use primary sources. It may not be people who know they want to use primary sources, it may actually be a means to bring people to archives for the first time (we know that a large proportion of Archives Hub users are first time users of the Hub, and have not necessarily used archives before). We want to make primary sources the focus, but at the same time put them within the context of a whole range of information sources about a person, so that they are not held apart as somehow different and not for mainstream researchers.
It is also worth pointing out that our interface will still in some sense be a demonstrator – it will provide one option for presenting our Linked Data, but the data is there for others to create their own interfaces, and the Sparql endpoint is there for people to query the data in the ways they want to. In addition, we can re-expose the data that we present. So, there are several purposes here: benefiting end-users, evaluating a name-based approach and putting archives within a broader context, demonstrating the sort of interface that can be provided from Linked Data and possibly re-exposing the data to create more potential benefit.