Designing an Interface: some first thoughts

One of the aims of the Linking Lives project is to demonstrate the value of Linked Data through the creation of an end-user interface that pulls in content from the Archives Hub Linked Data, including the external data sets we are linking to. The Linking Lives interface will be part of the Archives Hub service, that is to say, available from within the Hub website. We will present it as a beta service: something that is usable and useful, but still in development. With this interface, we can start to build up an understanding of how valuable this type of name-based resource is for researchers. We will be able to monitor use, as well as carry out an evaluation asking researchers what they think of the site. This is far preferable to positing benefits based on potential, which tends to happen too often with Linked Data at present.

This post is written from a non-technical perspective and covers a few of the areas that we are currently thinking about, as we start to set out our interface design.

Priorities

We will be concentrating on developing the interface rather than prioritising scale for this project: quality rather than quantity, you might say, although we expect to include some thousands of records. This is partly pragmatic, because we are still encountering challenges in integrating EAD data (Archives Hub descriptions) into our Linked Data, owing to inconsistencies and sometimes problematic content. The problems we face with variable data are ongoing, and perhaps highlight a basic issue with Linked Data: it works best with consistent, data-centric information, and not so well with archival descriptions built up over decades, many of them created before there were any standards at all to adhere to. On the positive side, however, our Linked Data work has enabled us to highlight and deal with many data issues, which is beneficial in the long run for any data processing that we (or others) might do.

Our focus for this project is on the Linking Lives pages themselves, and what researchers can access from there, so we will not be prioritising the creation of different ways to search the data: this would be the next stage, once we have a clearer idea of how the interface is used.

Archives Hub Branding and Navigation

We want Linking Lives (LL) to be recognisably part of the Hub, although it would be premature to try to fully integrate the two. As yet, we don’t know how users will respond to what we are proposing, and we need to evaluate what we are doing before taking it further into service. The evaluation is part of the project: we will ask a small group of researchers questions about the current Hub interface, and follow this up with some focus group work to gauge reactions to our new LL interface. This will help us to understand user requirements.

Linking Lives will be an interface available within the Archives Hub site, but we propose to incorporate data other than archival descriptions within the page. This does raise questions about the clarity of what we are doing and the balance between the different data sources. If we strongly brand the page as Archives Hub, will researchers expect to access just archives, and not other information resources? Will they assume all of the sources are held by us, or that we are responsible for them? If we include the basic Hub navigation at the top of the page, will that actually confuse users, as they may click on links that take them into the main Hub search without realising that LL and the Hub are somewhat different?

We are looking at creating a sub-brand of the Hub as a way to identify LL as part of the Hub while keeping it distinct to some extent. This may help to distinguish between the two applications. We will use the basic Hub logo, but modify it to signify something different. We do want to keep the links between the two, as we believe researchers will benefit from them, and we want to bring archives and other data sources together to provide a fuller context rather than keeping them too distinctly separate. The idea is to enable researchers to move seamlessly from archives described within the Hub to other resources, and to take a fairly bold approach to integration; otherwise we will not get the benefits we are after. I am somewhat reminded of The National Archives’ initiative ‘Your Archives’, a wiki for community content. It does seem to have remained rather separate from the main TNA catalogues, and maybe that has been to its detriment in terms of profile and use (I often have trouble finding links to Your Archives from within TNA’s website).

Broad Appeal

The LL interface, like the Hub itself, will not be aimed at subject specialists or expert users. It will primarily be aimed at academic researchers, but is intended to appeal to a broad audience: anyone who might be interested in undertaking research. This means that we need to avoid making assumptions about knowledge. Our ‘designated community’ may not have prior knowledge of archives and certainly won’t have knowledge of Linked Data. So they may not know how archives are organised, what an archival ‘biographical history’ is, what an archival creator is, or what ‘same as’ links are between different data sources.

Our aim, therefore, is to incorporate these things in a way that makes sense and makes the person the primary focus of the page, so that it is easy to see that a page is about George Bernard Shaw, for example, and that it provides life dates, descriptive information, biographical information, an image or two, aliases for the same person, and so on: information you might expect to find, or that makes sense within the context of a page about a person. At the same time we are keen to capture provenance, and this adds another dimension. Giving the source of each piece of information could clutter the screen, so we will need to think about how best to incorporate it. We believe it will be important to some users, as it could have implications for the quality and accuracy of the data. It is also something we would be pleased to see others do for our data, if they were presenting it in a Web interface.

The BBC Example

Our interface will combine content from different sources. We would like to draw in content in a similar way to the BBC (on the BBC page for Stevie Wonder you can see how the Wikipedia biography is pulled into the page). The BBC page pulls in part of the Wikipedia biog and provides a link to go to Wikipedia and read more, which helps to make clear that the information comes from elsewhere. For MusicBrainz, another Linked Data source, the BBC provide a link to the MusicBrainz site and also, further down the page, state: “Links & information come from MusicBrainz. You can add or edit information about Stevie Wonder at musicbrainz.org.” This information includes personal and business relationships, such as ‘child of’ and ‘collaborated on’.

On the BBC page, the Wikipedia information is clearly labelled as coming from that source; the MusicBrainz information is also identified, but less prominently. For MusicBrainz, however, the BBC are not only declaring where the information comes from; they are also inviting people to edit it themselves.

LL will be a useful resource in itself, but it can also be a starting point, in much the same way as the BBC provides a page with substantial information on a musician or an animal, but also invites people to move on to other resources. This in itself is an interesting shift of focus. Long gone are the days when some sites actually disabled the ‘back’ button; if this type of approach continues to gain traction, we are moving towards an even more fluid world, where we are not always trying to keep people on our pages but are actually encouraging them to move around the ‘Web of Data’.

Focus on Expectations

Looking at the BBC page on Stevie Wonder again, one thing I notice is that it is quite busy. There is a good deal of information, with various boxes and loads of links and options for the user. There does seem to be a trend towards busier pages now, perhaps an indication that people are increasingly adept at finding their way through information online, so a certain level of complexity is acceptable. The page is also quite long. The BBC page about mammals is similarly long and complex: introduction, links to other pages on mammals, distribution, classification, BBC news, video, information elsewhere, size ranges, the Wikipedia ‘about’ page, etc. Yet the page does not seem cluttered or difficult to navigate. This is partly because of the use of plain language, as well as the BBC’s expertise in web design. It may also be that expectations largely match reality: users may expect the BBC to provide a wealth of information, and they generally know what they will get if they go to the ‘programmes’, ‘video’ or ‘news’ pages.

Expectations play an important part in good Web design, and maybe it is easier if you are a very well-known provider, as people’s expectations are clearer? Many people arrive at a page through a search engine, so you cannot assume they will have visited your homepage and picked up information that way. However they arrive at a BBC page, most people know what the BBC is. But arriving at an Archives Hub Linking Lives page, you probably have little idea of who the provider is, and you may not be clear about what archives are in this context.

We chose to create a biographical resource partly because this would provide a focus: we can convey relatively easily that the page is about one person. In some ways this makes it easier than working with the Archives Hub itself, which doesn’t have that kind of focus. If we provide a page with a whole range of links to various types of biographical content, then we should be able to convey what the page is about fairly easily. It may be that good, clear, simple headings and relevant content (about one subject, in this case one person) work better than explanations of who you are and what you are trying to provide, as people don’t tend to read help pages.

A ‘Controlled’ Experience

Our interface will draw on the external data sources linked from our data, and will be designed to give users a controlled experience, in the sense that we are evaluating the sources we include and presenting the interface in a very defined way. Of course, we cannot control the content of the external data; I am just talking about the way we present it.

An alternative approach would be to pull in all the data that can be found on a topic and display it. Maybe this is the ideal for Linked Data (the ability to bring in any data sources on a topic), but we are quite some way, it seems, from presenting this in a way that end users will want to use. Try a search on Hakia, a semantic search engine (not directly about consuming Linked Data, but about pulling in related information in a more semantic way). I looked for Beatrice Webb, and got a substantial amount of information from a very diverse range of sources, including news, blogs, Twitter, images and video. It’s quite impressive in principle, and could be really useful for a researcher, but the net is cast very wide, so it’s not easy to process all of this varied information. Sig.ma describes itself as a semantic information mash-up. If you take a look at the page that sig.ma provides for Beatrice Webb, a substantial amount of data is pulled in, but it is not very user-friendly, not always very coherent and sometimes not relevant. Obviously it is just a demonstrator, and I would say it is aimed at a different audience, with more expertise in Linked Data. It does show the potential of this type of approach, which draws in a really diverse range of data on the fly, but it also shows how complex semantic searching is, and how difficult it is to achieve within a user-friendly interface.

The Linking Lives Unique Selling Point

Sites like Wikipedia have biographical pages, and we can never compete with them, so what can we offer that is of value? Essentially, our focus is on meeting the needs of those who want to carry out more in-depth research and who are likely to use primary sources. They may not be people who already know they want to use primary sources; LL may actually be a means to bring people to archives for the first time (we know that a large proportion of Archives Hub users are first-time users of the Hub, and have not necessarily used archives before). We want to make primary sources the focus, but at the same time put them within the context of a whole range of information sources about a person, so that they are not held apart as somehow different and not for mainstream researchers.

It is also worth pointing out that our interface will still, in some sense, be a demonstrator: it will provide one option for presenting our Linked Data, but the data is there for others to create their own interfaces, and the SPARQL endpoint is there for people to query the data in the ways they want. In addition, we can re-expose the data that we present. So there are several purposes here: benefiting end-users; evaluating a name-based approach and putting archives within a broader context; demonstrating the sort of interface that can be built from Linked Data; and possibly re-exposing the data to create more potential benefit.

Posted in archival context, branding, interface | 4 Comments

One Person in Context – Working with Biographical Histories

I have been starting to think about the user interface for Linking Lives. We will probably go for something quite simple in terms of layout, because there is quite a bit of complexity when bringing together a range of data sources.

It might be thought that integrating the external data sources is the main challenge, but I think it is probably more of a challenge to integrate several archival descriptions into one biographical record, and to convey the context of those descriptions clearly.

In this post, I am focusing on that often very useful field of information, the biographical history. This field helps to place the archives in their context by providing significant and relevant information about their creator(s). It is widely used in archival description, although there is an increasing move to exclude this information from the collection description itself and provide it separately. A few observations are worth making about this field:

  • In general, it is considered good practice for the biographical history to be appropriate to the records being described. So, you don’t include a full life story when you are describing one letter relating largely to one event in a person’s life….
  • …but this guidance is not always adhered to, so some biogs are long and detailed for a small and discrete collection, while others are very brief, even though they may relate to an archive that spans the individual’s entire life.
  • Some repositories will use largely, or entirely, the same biog for different collections about one individual; others will create very distinct biogs; and some may use biogs that have been created by other institutions.
  • Some biogs involve a significant amount of research, with the archivist drawing on the unique sources they are cataloguing to provide information that may itself be unique, making this field particularly useful for researchers.

I am going to use the example of Martha Beatrice Webb here, a significant figure in history, and one with plenty of archival sources that relate to her.

Photo of Martha Beatrice Webb

From the LSE collection on Flickr

On the Hub we have 14 collections where Beatrice Webb is the ‘creator’ or co-creator of the archive (for information on archival creators see a post on the Hub blog, Who is the creator?). These collections are from three different archive repositories. Here is a selection of the biographical histories (not all of them are yet available from our Linked Data store):

Beatrice Webb (1858-1943), nee Potter, social reformer and diarist. Married to Sidney Webb, pioneers of social science. She was involved in many spheres of political and social activity including the Labour Party, Fabianism, social observation, investigations into poverty, development of socialism, the foundation of the National Health Service and post war welfare state, the London School of Economics, and the New Statesman.
(from A summer holiday in Scotland)

Beatrice Webb (1858 – 1943). Fabian Socialist, social reformer, writer, historian, diarist. Wife, collaborator and assistant of Sidney Webb, later Lord Passfield. Together they contributed to the radical ideology first of the Liberal Party and later of the Labour Party.
(from Letters)

The role of the Reconstruction Committee involved ‘…surveying and unravelling the whole tangle of governmental activities’ introduced during World War I (1914 – 1918). It was established in early 1917 but by July 1918 had been disbanded, Webb reporting that its ‘…machinery was too rickety to survive’.
(from Webb Beatrice 1858-1943 nee Potter)

Beatrice and Sidney Webb were pioneering social economists, early members of the Fabian Society and co-founders of the London School of Economic and Political Science, and had a profound effect on English social thought and institutions. Beatrice Potter Webb was born in 1858, the eighth daughter of Richard Potter, a wealthy businessman, and Lawrencina Heyworth. Surrounded from an early age by her parents’ intellectual and worldly friends and visitors, notably the philosopher Herbert Spencer, she was largely self-educated through copious reading, and frequently a partner for her father during business trips abroad. Following a tempestuous relationship with Joseph Chamberlain, which began in 1883 and lasted several years, Beatrice took up social work in London, acting as a rent collector for the Charity Organisation Society, and becoming steadily disillusioned by the inability of charitable organisations to tackle the basic causes of poverty. During 1886, she participated in research for Charles Booth’s investigations into London labour conditions, eventually contributing to Volume I of Life and Labour of the People of London (1889). During this period she continued to write articles on social subjects, most of which were printed in The nineteenth century , and published The co-operative movement in Great Britain (1891). She met Sidney Webb in 1890 during research into economic conditions and labour unions. Sidney Webb was born in London in 1859. Educated in the local academy, he left school at sixteen to work as a clerk in a colonial brokers. By attending evening classes, he passed the civil service exams in 1881 and was appointed a clerk in the Inland Revenue. The following year, he took the Civil Service upper division examination and was appointed to the Colonial Office in 1883. He also began lecturing on political economy at the Working Men’s College. 
Webb was a close friend of George Bernard Shaw, who induced him to join the socialist Fabian Society in 1885, where both men became leading members: Webb was responsible for putting forward the first concise expression of Fabian convictions in Facts for Socialists (Fabian Tract 5, 1887). As a member of the Fabian executive, Webb continued to write and lecture extensively on economic and social issues, and took a leading role in Fabian policy-making... [cont’d]
(from Webb, Beatrice, 1858-1943 and Webb, Sidney, 1849-1947, social reformers and historians)

If we want to create a biographical page for Beatrice Webb, ideally we would have one biog that combines the best of all 14. However, apart from this being pretty much impossible, we come back to the fact that biogs are often tailored to specific collection descriptions. You can see a good example of this above, where the text refers to the ‘Reconstruction Committee’, although the title does not, in fact, tell you that this is what the collection is about. There are also clearly some issues with two of these titles, which are not really titles at all but names of creators, but that’s another story…

For researchers, the prospect of trawling through 14 biog entries may not seem very enticing. We do have the option to use one as the default display and then provide links to the others, but then which to pick and why?

So that leaves us with listing all of the biogs along with the collection titles. Possibly a rather unwieldy answer, but on the other hand, it could be argued that this is an improvement on researchers having to click through 14 separate records. It does at least pull the biographical information together to some extent.

In terms of our data modelling, the great thing about Linked Data is that we can decide what to say about entities within the data. For Locah, we have linked the bioghist to the agent (in this case Beatrice Webb, or Beatrice and Sidney Webb) and also to the ‘Archival Resource’ (the collection itself). We could instead say that a bioghist is about someone strictly in the context of one archival resource, rather than linking it directly with the agent, but this would probably complicate things too much.
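To make this modelling choice a little more concrete, here is a minimal sketch in Python that represents statements as plain triples. All identifiers and the predicate names `ex:hasBiographicalHistory` and `ex:title` are hypothetical placeholders, not the vocabulary actually used in Locah; the point is only that, because each biog is attached to both the agent and the collection, we can gather every biog for one person and still say which collection each one came from.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical vocabulary terms for illustration only; not the actual Locah vocabulary.
HAS_BIOGHIST = "ex:hasBiographicalHistory"
TITLE = "ex:title"


def link_bioghist(agent: str, resource: str, bioghist: str) -> list[Triple]:
    """Attach one biographical history to both the agent and the archival resource."""
    return [
        Triple(agent, HAS_BIOGHIST, bioghist),
        Triple(resource, HAS_BIOGHIST, bioghist),
    ]


def bioghists_with_collections(triples: list[Triple], agent: str) -> dict[str, list[str]]:
    """Map each biog linked to the agent to the titles of the collections it is also linked to."""
    agent_biogs = {t.obj for t in triples if t.subject == agent and t.predicate == HAS_BIOGHIST}
    titles = {t.subject: t.obj for t in triples if t.predicate == TITLE}
    result: dict[str, list[str]] = {b: [] for b in sorted(agent_biogs)}
    for t in triples:
        # Any other subject pointing at one of the agent's biogs is taken to be a collection.
        if t.predicate == HAS_BIOGHIST and t.obj in result and t.subject != agent:
            result[t.obj].append(titles.get(t.subject, t.subject))
    return result


triples = [
    Triple("ex:collection/1", TITLE, "Letters"),
    Triple("ex:collection/2", TITLE, "A summer holiday in Scotland"),
    *link_bioghist("ex:agent/webb", "ex:collection/1", "ex:bioghist/1"),
    *link_bioghist("ex:agent/webb", "ex:collection/2", "ex:bioghist/2"),
]
print(bioghists_with_collections(triples, "ex:agent/webb"))
```

Linking the biog to the agent directly keeps the "all biogs for this person" lookup simple, while the second link preserves the collection context that a combined biographical page would need to show.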

The SNAC project in the US (Social Networks in Archival Context) is working on creating archival authority records, which is a little like our project to create biographical records, but it uses a distinctly archival standard, EAC-CPF, and does not incorporate external data within the records (though external data may be referenced on their interface). Most of the people in their prototype have created only one collection, which makes life easier, but the entry for Ella Fitzgerald has two collections. You can see that both biogs are displayed, and the source for each is given. It is interesting to note that in this display the source is given less prominence, appearing in smaller type at the end of the text. Another example, Royal Chicano Air Force, provides two biogs, but they are identical apart from a small addition to one, even though the collections are held in different institutions.

I should emphasise that the SNAC interface is a prototype, and I know they will be doing more work on the display, so I’m not out to be critical (I think it’s a great initiative). But I do wonder whether it is a good idea to display all the biog entries one after the other with little emphasis on where they come from, and hence why there are several of them, possibly with substantial repetition. If they had an entry for Beatrice Webb with our 14 collection descriptions, the biog entries alone would make for a very, very long page.

I think that we may look at including all of the biog entries, clearly linking them to the collection titles, but possibly only displaying a limited number of words for each, with the option to go to the full entry. That way we can include all of them, give a sense of what each of them provides, and let the user decide where to go from there.
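A display along these lines might be sketched as follows: a minimal Python example, assuming each biog is available as plain text alongside its collection title. The 30-word default, the `BiogEntry` class and the `show_full_entry_link` field are hypothetical choices for illustration, not a settled design.

```python
from dataclasses import dataclass


@dataclass
class BiogEntry:
    collection_title: str
    full_text: str


def preview(text: str, max_words: int) -> tuple[str, bool]:
    """Return the first max_words words of a biog and whether anything was cut off."""
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words), False
    return " ".join(words[:max_words]) + " ...", True


def summarise(entries: list[BiogEntry], max_words: int = 30) -> list[dict]:
    """Build one display row per biog: collection title, short preview, 'read more' flag."""
    rows = []
    for entry in entries:
        text, truncated = preview(entry.full_text, max_words)
        rows.append({
            "collection": entry.collection_title,
            "preview": text,
            # Hypothetical flag: the page would show a link to the full entry when True.
            "show_full_entry_link": truncated,
        })
    return rows


entries = [
    BiogEntry("Letters", "Beatrice Webb (1858 – 1943). Fabian Socialist, social reformer, writer, historian, diarist."),
]
for row in summarise(entries, max_words=5):
    print(row)
```

Truncating by word count is the simplest option; cutting at a sentence boundary would read more naturally, at the cost of less predictable lengths.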

Another avenue we would like to explore is extracting concepts from this data, and maybe that would be a way to start to find common concepts within a number of biogs. But we’ll have to see how far we manage to get with that particular challenge.

Posted in archival context, biographical history, interface | Comments Off

A Little Bit About Licensing

The Linking Lives project aims to deliver:

  • An end-user interface that provides a means to integrate archival data with other information sources
  • Blog posts that share our progress and reflect on the work
  • Reusable software outputs for manipulating RDF and formatting within Web pages
  • An evaluation report
  • Documentation setting out the data sources and relationships behind the interface

You can read more about it on our ‘About Us’ page.

First things first. We currently have a Linked Data store with a small amount of Archives Hub data. We need to expand this considerably. Our aim is to provide a substantial amount of the Hub data, preferably the entire data set, as Linked Data, and then it will be part of the Linking Lives interface.

We had already consulted Hub contributors about our Linked Data work, but in order to really expand the data set, we need to make it very clear to them what we want to do. The Archives Hub is an aggregation of data from over 200 archives across the UK, so we are in a unique position, and we want to work with archivists to move the community towards an open data agenda. It is vital for us to show our contributors that we are working on their behalf, and that they will be fully informed about our plans and progress.

We feel that it is important to give the data an explicit licence, preferably a completely open licence. That way we don’t put any barriers in the way of its potential reuse. I was recently at the Europeana Tech conference in Vienna, where the dominant theme was the fundamental importance of open data. One observation that struck me was the conclusion from Europeana participants that it is better to put less data out under an open licence than to put more data out under a complex and/or restrictive licence. Some of the Archives Hub contributors have been concerned about commercial exploitation. It is worth looking at Jill Cousins’ presentation on this. She argues that even a non-commercial licence substantially restricts the potential of the data: it can’t be used on any sites or cultural blogs that show any commercial activity, and it can’t be used with Wikipedia or by commercial companies that might generate income for partners.

We need to bring Hub contributors on board with this vision, and to do this we sent an email to all contributors outlining our proposal and asking them to let us know if they do not want to participate.

In the email I did the following:

1) Set out the benefits of Linked Open Data

2) Described the Linking Lives proposal

3) Referred to the potential for us to be involved with the US-based ‘SNAC’ project. This is not a Linked Data project, but it is creating name authority files using the archival standard EAC-CPF, and I wanted to show that we are working on different fronts with the aim of improving access to archives. I do think it’s worth giving this kind of context, showing that services like the Hub are working in different ways on behalf of archives to promote understanding and use of primary source material.

4) Outlined the options for licensing, including the possibility of an attribution licence, although ideally we would still opt for a completely open licence and strongly promote best practice around attribution (we are looking at named graphs with this in mind, as a means of ensuring that the provenance of statements can be shown).

5) Emphasised that this is about the metadata, not the content. This may sound obvious, but it is an important distinction. The metadata is there to promote the collections. There are far more complex issues around open access to some collections, where there are legal issues around IPR.

6) Referred to some useful sources to read more about open data and some initiatives, such as Europeana and Discovery, that are fully behind an open data approach.
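Point 4 above mentions named graphs as a way of showing provenance. The idea can be sketched in a few lines of Python: each statement is stored as a quad, where the fourth element names the graph, and therefore the contributor, that the statement was loaded from. The one-graph-per-repository convention and all the `ex:` identifiers are assumptions for illustration; in practice a triple store queried with SPARQL's `GRAPH` keyword would do this work rather than hand-written code.

```python
from collections import defaultdict
from typing import NamedTuple


class Quad(NamedTuple):
    """A triple plus the named graph it was loaded into."""
    subject: str
    predicate: str
    obj: str
    graph: str  # hypothetical convention: one named graph per contributing repository


def statements_by_source(quads: list[Quad], subject: str) -> dict[str, list[tuple[str, str]]]:
    """Group the statements about one subject by the named graph (i.e. the source) they came from."""
    grouped: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for q in quads:
        if q.subject == subject:
            grouped[q.graph].append((q.predicate, q.obj))
    return dict(grouped)


quads = [
    Quad("ex:agent/webb", "ex:bioghist", "ex:bioghist/1", "ex:graph/repository-a"),
    Quad("ex:agent/webb", "ex:bioghist", "ex:bioghist/2", "ex:graph/repository-b"),
    Quad("ex:agent/webb", "ex:name", "Beatrice Webb", "ex:graph/repository-a"),
]
for source, statements in statements_by_source(quads, "ex:agent/webb").items():
    print(source, statements)
```

Because the source travels with every statement, an interface could attribute each piece of information to the repository it came from without the data itself needing a restrictive attribution licence.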

I think the real potential of Linked Data is still difficult for people to grasp. I pointed to things like Tim Sherratt’s recent work, creating a narrative using the Web of Data, as this is a great way to demonstrate the possible uses of this structured data, and I also referred to established and respected institutions like the BBC leading the way with using different data sources and taking the risk of incorporating Wikipedia data on their site.

So far we have had two contributors asking to opt out of the Linked Data work, one very small archive and one large HE archive. We have also had some questions about what the work involves, questions that show a certain level of concern (as you would expect), albeit with an overall positive attitude towards open data. Maybe we need more explicit help with licensing archival data. There are a number of useful sources, such as the Licensing Open Data guide (PDF) available from the Discovery website, but it would be useful to have a document that specifically refers to opening up archival metadata, and maybe more information on the issues around data aggregations.

Several contributors have written to us to show their support, including the Universities of London, Dundee and Hull. We are very pleased that two of our biggest contributors, the University of Glasgow and John Rylands Library at the University of Manchester, have shown very strong support. We’re going to be adding their data to our triple store in the near future, as they have large collection descriptions with thousands of component items, so that will be a good test for our stylesheet. Institutions like this have some great archives, and detailed descriptions that lend themselves to strong narratives, linking up people, places, events, to create a whole host of different stories.

We are still working on exactly which licence to use for the Archives Hub data, but we are certain that it will be open, as this is vital to ensuring that we can truly connect data. As Edward L Ayers wrote, back in 1999: “Might history, which exists in symbiosis with large amounts of diverse evidence, be especially well-suited for the technology evolving around us?” (from History in Hypertext). I think that the answer is ‘yes’, and I think that Linked Data promises much if it really does become embedded in the Web.

Posted in licensing, open data | 1 Comment