In this post I offer some initial considerations on how ‘follow the data’ methods might be used in researching open data demand.
A number of ODDC studies, and other studies into open data, are interested in understanding demand for open data. By understanding demand, it is argued, providers of data can better prioritize data to release, and ensure data is shaped and provided in ways that help re-users overcome common barriers to working with the data. However, demand for information and data is very hard to ask about in the abstract for a number of reasons, including that:
Our information behaviors are often invisible, even to ourselves. For example, could you list all the information sources you used yesterday? Or say how many times you sought out information and failed to find it? Or when you made do with imperfect information rather than continuing to search until you found what you wanted? Unless we consciously audit our information use and take notes as it is going on, these can be very difficult questions to answer.
We rarely demand information we don’t know about and we often make do with the information we can get. Over time our working practices evolve around the information we can access, so if new information becomes available it might not be obvious to us how it would be relevant, even if someone starting our work from scratch might prefer this newly available information source.
Much open data will reach end users indirectly. For example, if someone is already a user of a particular database they may be interested in improvements to that database, which might come from inclusion of open datasets in it, but they may not be interested in changing their practices to draw on the open dataset directly.
The most common approaches to assessing demand for open data are survey-based, either asking people to select from pre-defined lists of data the kinds they are interested in, or inviting respondents to describe their current data use, and to articulate use cases where they need further data, or where they face barriers to accessing the data they want. Whilst survey approaches can scale, they are limited by the considerations above, and in their ability to get a deep understanding of potential demand for open data. This is where some of the ‘follow the data’ interview methods that have been developed by Christine Borgman and her team at UCLA might come in.
The simple idea in the ‘follow the data’ protocol, developed to research astronomers' use of secondary data from sky surveys, is that rather than asking about data use in the abstract, you can gain far deeper insights by getting an interviewee to talk about particular artifacts they have worked on which involve data. In the astronomy case this was an academic paper, but in the case of open data research it could be a report, a website, a presentation and so on.
Applied to open data demand, a follow the data method might look something like this:
The researcher locates someone who might have demand for open data, and arranges an interview with them. In advance they ask the interviewee to send over a report or document they have been working on that relates to the topic of data demand in focus (or they find this by looking on the organisation's website, etc.). For example, if researching demand for open aid data, they might ask for any analysis the organisation has carried out of aid spending.
The researcher reads the report and identifies the data sources it draws upon, as well as thinking about other data sources it might have drawn upon.
The researcher carries out an interview, ideally face-to-face, discussing the different data sources for the research and asking about data sources the interviewee considered using but didn’t, about the strengths and limitations of the data sources they did use, about challenges in using the data, about data they would have liked to use, about tools they used to work with the data, and about help or support they accessed in using the data. The UCLA protocol for research with astronomers shows how a semi-structured interview schedule could be constructed to support a discussion.
This is a qualitative interview, ideally recorded to be transcribed and analyzed later. The researcher might also make use of other qualitative techniques, such as asking the interviewee to show them the tools they used to work with the data on their computer, or having printed-out cards representing different source datasets, and getting the interviewee to pick from these to highlight sources they considered using and to annotate them for strengths and weaknesses.
The researcher may introduce new datasets, or discuss the opening of existing datasets, to explore how this would impact the interviewee's data use. Within the context of the concrete examples of data use that have been discussed, the researcher could then mention particular datasets that are being made available, asking the interviewee to describe how they might use this data, what challenges they envisage in using it, and how valuable they think it would be. This might require the interviewer to carefully explain open data, showing how it would be different from some of the other data sources available, and being clear about whether a realistic or an idealistic picture of open data is being offered.
After the interview, the researcher identifies particular ‘upstream’ providers of data, and seeks to interview them also. For example, if an interviewee got their data from a colleague who had already reformatted it, or from a website that collated and curated data, the researcher may want to go to these actors to understand their data use and demand.
In this way the method follows both an individual's use of, and demand for, data, and the chains of data use and intermediation behind it.
Central to this method is the discussion of data use in a concrete situation, rather than in the abstract.
Whether or not this can be easily applied is something we’re yet to explore in ODDC, and in the coming month we’ll start sketching out in more detail a draft research tool to present our own follow the data protocol. I’ll share updates through the ODDC website and our LinkedIn group as we do.