This blog post is based on a report written by colleagues at Mimas* presenting the results of the evaluation of our innovative Linked Data interface, Linking Lives. The evaluation consisted of a survey and a focus group, with 10 participants including PhD and MA students studying history, politics and social sciences.
This post concentrates on their responses relating to Linking Lives. We also asked about their use of the Archives Hub, their methods of searching and how they interpret results. You can read more about their responses to these questions on our Archives Hub blog.
Mock-up of the Linking Lives interface: this shows the interface as it was presented to the participants in the focus group, with the results for a search for ‘Beatrice Webb’.
Evaluating the response to the Linking Lives concept is essential if we are to answer the crucial questions around the value of a Linked Data approach for researchers. Obviously Linking Lives is just one interface, but it is based upon the principle that lies at the heart of Linked Data: bringing together diverse data sources in order to allow researchers to make new connections. We did not have a live demo to show the participants in the focus group, so we used a number of mock-ups to show what was intended. (The site is still in development.)
Provenance and Quality
Participants in the focus group were clear that provenance was vital to them. They wanted to know where the data had come from. There were concerns about including data from Wikipedia:
“Wikipedia is not considered a good source, so it needs to be clear where the information is coming from.” (Survey respondent)
There was a feeling that Wikipedia is not credible as an academic source. In the Linking Lives interface we have taken only the image and the places of birth and death from Wikipedia, rather than any descriptive information. It could be argued that even this is not properly verified. On the other hand, it could equally be argued that checking by hundreds of contributors, effectively a comprehensive data-checking service, could be more accurate than a description created by a single cataloguer, who might be more likely to make a mistake. A Wikipedia page about a person may also benefit from the expertise of the crowd, whereas a cataloguer is unlikely to be an expert on every archive they catalogue.
This is a fundamental and broad issue around the integrity, accuracy and trustworthiness of resources, and the Linked Data approach requires us to think about it more carefully, because its intention is to bring sources together. We know that a very large number of Linked Data sources use Wikipedia as a hub to link to, because of its profile and the links out that it provides. The BBC include descriptive data from Wikipedia on their pages, but maybe there is a case for using it in a more populist resource rather than in one aimed at academic research? The BBC also invite people to participate by updating the Wikipedia article, and this kind of participatory approach may work better in some situations than in others.
Selection of Data
Participants wanted to know about the choices underlying Linking Lives: why is this data chosen? What gets left out? It is interesting how researchers respond when you explicitly show them a resource that brings sources together and ask them to think about the pros and cons. I wonder how often they think about ‘the data that gets left out’ when using other resources, where they are not being prompted to consider it.
It could be argued that keeping sources separate has advantages, in the sense that the researcher then uses one source at a time and is more likely to know what it is, who created it and what it covers. We have found with the Archives Hub (which searches only across archives) that researchers want a clearer idea of what is covered, and that they don’t always understand the results they see or why they get certain results in response to their searches. Bearing this in mind, I can’t help thinking that bringing diverse sources together may make it more difficult for users to understand and interpret results.
A Biographical Interface
Overall, participants liked the concept of basing Linking Lives around people as a way of “getting a good overview”, and preferred this to an interface based around concepts:
“I’d be a little bit sceptical if you were to extend it to concepts…it might tend to homogenise and evacuate some of the complexities and subtle nuances of particular theories.” (Focus group participant)
But they remained cautious about the principle of bringing sources together, and there was a feeling that portals like this don’t always do the job very effectively. When asked about the benefits of serendipitous searching, participants felt that it could potentially be useful but that it could also distract the researcher from what is relevant.
Participants were in no doubt that the breadth and completeness of the service would be key to its value. So, for example, if Linking Lives includes a list of works by a person, can the researcher trust that the list is complete? If not then its utility is significantly diminished. Maybe there is an issue here for a Linked Data approach; if you are drawing in data from other sources, you might select those sources on the basis of quality, but you would not be responsible for what they provide. So, for Linking Lives, we would not be able to guarantee that a list of works is comprehensive, although we might choose to take the list from a trusted source such as the British Library.
The benefit of Linked Data is that you can draw in a diverse set of sources, and the aim is to provide a well-rounded view, but the more sources you pull into a single interface, the more you have to consider how to present them clearly; to show that they are distinct sources and to convey to the researcher that they are not under your control. There are certainly issues around expectations and understanding here that need further exploration.
From reading the Evaluation Report and thinking a little more about the issues, I wonder whether a front-end designed to enable researchers to use sources brought together through a Linked Data approach should focus more on building an appropriate search mechanism. One option would be for researchers to select the sources to search from a list, so that they are in control of what they search. The nature of each source could be explained at this point. For example, a researcher might choose to bring together the Archives Hub, the British National Bibliography, the BBC and the British Museum (each choice of dataset would affect the other choices they could make, depending on which sources are linked together). When they click to select each of these sources, a short summary tells them what that resource offers. The researcher could then go on to search within these sources, and when the results are presented they would already have a reasonable sense of what the data represents.
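The source-selection idea above can be sketched in a few lines of Python. This is a minimal illustration, not part of Linking Lives: the source summaries, the links between sources and the rule for which sources become available are all assumptions made for the example. Here a source becomes selectable only once it is linked to at least one source the researcher has already chosen, and its summary is shown at the moment of selection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    summary: str  # short description shown to the researcher on selection


# Hypothetical catalogue: the summaries and links below are illustrative
# placeholders, not a description of how the real datasets are connected.
SOURCES = {
    s.name: s
    for s in [
        Source("Archives Hub", "Descriptions of archive collections."),
        Source("British National Bibliography", "Records of published works."),
        Source("BBC", "Programme and topic pages."),
        Source("British Museum", "Records of objects in the collection."),
    ]
}

# Undirected links between sources (assumed for the example).
LINKS = {
    frozenset({"Archives Hub", "British National Bibliography"}),
    frozenset({"Archives Hub", "BBC"}),
    frozenset({"BBC", "British Museum"}),
}


def selectable(selected: frozenset) -> set:
    """Return the sources the researcher may add next.

    Assumed rule: with nothing selected, every source is available;
    otherwise a source must be linked to at least one already selected.
    """
    if not selected:
        return set(SOURCES)
    return {
        name
        for name in SOURCES
        if name not in selected
        and any(frozenset({name, chosen}) in LINKS for chosen in selected)
    }


def select(selected: frozenset, name: str) -> frozenset:
    """Add a source to the selection and show its summary, or reject it."""
    if name not in selectable(selected):
        raise ValueError(f"{name!r} is not linked to the current selection")
    print(f"{name}: {SOURCES[name].summary}")
    return selected | {name}


if __name__ == "__main__":
    chosen = select(frozenset(), "Archives Hub")
    chosen = select(chosen, "BBC")
    print(sorted(selectable(chosen)))
```

Running it prints the two summaries and then `['British Museum', 'British National Bibliography']`: once the Archives Hub and the BBC are chosen, both remaining sources are reachable through a link. Other rules (for example, requiring a link to every selected source) would be equally plausible; the point is only that the list of choices narrows as the researcher builds a selection.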
Linking Lives Audience
The participants generally felt that something like Linking Lives would be more appropriate for undergraduates, or useful for teaching, but it would not enable the more sophisticated searching that PhD students might want to carry out, maybe based on a more contextual approach:
“I think at PhD level there’s a kind of artistry to how you make your way through…I’ve certainly never come across a search engine that can do the same or be as complex as your own thinking patterns.” (Focus group participant)
There was a feeling that having a group of separate archives brought together that relate to one person would be useful for teaching and helping undergraduates to understand more about how an archive works. However, expectations would need to be managed because it might encourage students to think that the archives are more readily accessible than they often are.
The power of Linked Data to connect diverse sources also seems to raise one of the main challenges. If you want to provide a user-friendly interface, and enable researchers to search across particularly diverse sources (e.g. archival data, census data, climatic data) to make unlikely connections, then there is a challenge around how the data is presented so that a researcher can interpret it. Maybe this will inevitably involve the researcher in more complex searches and interpretation of results, but the reward could potentially be high. For example, a researcher might be able to discover correlations between weather patterns and social behaviour over time because the required datasets have been linked together. If Linked Data enables researchers to (reasonably) easily draw quite different sources together, then it would offer something potentially very valuable.
In the end, there is still a challenge for Linked Data to make a sound business case and really showcase the end-user benefits. Certainly there is a strong case in favour of the idea of a Web of Data: a Web that is about data and not about documents, something that enables researchers to navigate across data sources rather than jumping from one silo of data to another. But maybe there is too little focus on how researchers will actually achieve this in a way that works for them, and on how we can present Linked Data to them in a way that really answers their research questions. One of the survey respondents said that the principles of Linked Data sounded confusing when they were explained, and this may present its own challenge: explaining what Linked Data actually is and how it works. As Joy Palmer states in her commentary on the Evaluation Report:
“Whilst it could be said that it is not important for users to understand how data is pulled together under the hood, our research suggested that potential users, particularly advanced researchers, do indeed have an interest in how and why this information has been gathered together in a particular way. [To] what extent is it possible, or even desirable, to explain the mechanics of Linked Data, and does an understanding of how Linked Data works represent an advanced aspect of information or digital literacy?”
Maybe we’ve reached the point in the Linked Data story where we need to focus more strongly on how it will answer the requirements of researchers. Maybe we need to find better ways to explain Linked Data to them and the vision that goes along with it. Surely we need a more collaborative approach that draws in the technical people, the information professionals and the researchers.
* Evaluation Report by Lisa Charnock, Frank Manista, Janine Rigby, Joy Palmer