
The Guardian published a story earlier this week about a Belfast climate scientist, Prof Mike Baillie, who is disgruntled at having to make his department's decades' worth of tree ring data available to a known climate sceptic as a result of a Freedom of Information Act request. This story prompted the editor of this blog to post the above tweet. Also: "I don't see the point of curating data for the public", and "any nutter can attempt to disrupt my research".
Really? Let me turn this question round: what reason could there be for the public not to have access to publicly-funded academic research? When research is funded from the public coffers, surely it's automatically relevant to public interest?
Also, don't some of our most respected colleague science bloggers frequently campaign for increased transparency in the handling of data from, say, clinical trials? Why should that call not apply to other subjects?
Let's backtrack for a moment.
The internet has allowed communities of scientists to grow and flourish across borders imposed by geography or field of study, and to mobilise in the face of a growing irrationality in the public's perception of science. Blogs such as this one and those of its many contributors are eminent examples of this. Democratic as the internet is, unfortunately, those who feel threatened in their beliefs and values by science have gained an equal voice. Declining standards in the mainstream media have allowed an immense amount of pseudo-science to seep into our everyday media landscape. This undermines the credibility of hard-working scientists.
Last year's leaking of thousands of emails between climate scientists around the world showed just how far denialists would go to discredit the scientific evidence on climate change. The affair also highlighted the huge divide that exists between the ivory tower-dwelling scientific establishment working to understand this crucial issue, and the rest of us, who inhabit the planet and are likely to pay the price for the consequences of climate change.
A big lesson learnt from this incident was that we scientists have a battle to fight, and digging deeper into our trenches is not the way to win. While politicians and the media too have much to answer for in the current situation, this doesn't absolve us researchers from all responsibility. Only by engaging with the public can we help restore science to its rightful place in society. And being open with research results and data is an excellent place to start.
In astronomy, the advent of large publicly-funded multinational observatories like VLT, Gemini and the Hubble Space Telescope pushed data curation and access onto the agenda. Organisations like ESO and NASA have led the way in ensuring that much publicly funded astronomical data is available to the entire community and indeed to any member of the public. The success of data archiving was beautifully demonstrated in a recent paper by Richard White of the Space Telescope Science Institute and colleagues, submitted to the ongoing Astro2010 Decadal Survey, which shows how archive-based literature from the Hubble (pictured below) and Chandra Space Telescopes now outnumbers PI-led publications.

Astronomical data are not made public from day one: a proprietary period of typically one year gives the Principal Investigator of the campaign a head start in the publication process. This period is key. Let's be honest: scientists may well say they want to learn how the Universe works, etc, but really, from day to day, they just want to win (or at least not be scooped). Wide-eyed wonder doesn't drive science, competition does. Having sole access to new data is essential to this process.
Contrary to popular belief, making datasets public needn't take away credit from the scientists who gathered the data. In astronomy, surveys and even instruments are accompanied by comprehensive descriptive papers. These then become citable by other researchers using the data for their own work.
In the outside world, too, astronomical data archiving has made its mark. The huge Sloan Digital Sky Survey dataset, publicly available in its entirety, made possible the Galaxy Zoo project, where citizen scientists have helped with morphological galaxy classifications - 60 million of them.
The journal article providing the original description of the SDSS is the 4th most cited paper in astronomy since 2000.
Astronomy is not a very politicised science compared with, say, medical research or climatology. But our experience has shown that being open with data is beneficial for both progress in research and public engagement with the subject. It's not about making every byte of data instantly available to everyone. It's about being clear and open in research projects about how your data will be managed and shared.
The data policy for the European Planck satellite, launched last year to study the cosmic microwave background, is unusually draconian with all data essentially in lockdown until 2012. This is unprecedented for such a mission - but as long as there is a clear policy and a timeline from the outset, there can also be trust.
I don't want to comment on this particular FOI case. I don't know a thing about dendrochronology or the individuals involved, and not a great deal about the Freedom of Information Act either. But if we scientists are going to gather on blogs and rail against pseudo-science and pharmaceutical companies for hiding or misrepresenting data and call for absolute transparency in research, we need to hold ourselves to that same high standard. Data repositories and curation require resources - but the experience in astronomy has shown that the investment is worth it, both from a purely scientific and a public relations perspective.
Yes, the FOI Act is probably a clumsy and overly heavy-handed tool for obtaining access to research data, and many requests may be nothing but harassment of honest hard-working scientists. But if the "nutters" had known more about what you were doing with your data all along, they might not have wanted to disrupt your research in the first place.
Image: White et al (2009)








Ha, nice one, I can see I'm going to be blogging this tonight :P
Martin is the editor of layscience.net.
Totally agree. Where would the open source movement be without the sharing of all data? And as Sir Tim Hunt demonstrated perfectly last night on 'Beautiful Minds', the more you share, the more you expedite the achievement of the solution.
Sorry Martin, I've got to agree with Sarah here...at least to the basic principle at stake. I argue a lot with academics about opening up data and am trying to get parts of the university I work for (quite old, lots of pretty cotswold stone buildings) to do this following on the linked data model of http://data.gov.uk/
Why should publicly funded data be private? OK, I know there are plenty of instances, especially pre-publication, where data might be best kept under wraps. But once published, full underlying data should be made available.
I think the principles of the open knowledge definition should be applied by default to data produced by publicly-funded research grants. (I.e. in bidding for the money you'd have to explain why you weren't going to open certain data until certain dates, or at all in cases of confidentiality, etc.)
In order for our society to truly make use of the knowledge it creates, publicly funded data should be available under the following guidelines:
1) Free and open access to the material
2) Freedom to redistribute the material (with attribution, of course)
3) Freedom to reuse the material
4) No restriction of the above based on who someone is (e.g. their nationality) or their field of endeavour (e.g. commercial or non-commercial)
Most of this can be done easily by people pre-licensing the data they do make available (say with a creative commons license if appropriate).
One of the problems in the UK though is that many academics and researchers don't actually own their data. That is, the IPR resides in the institution not the employee and so they are bound by the institution's IPR policies. (My institution agrees not to assert a claim unless it is of significant financial value...ok they don't put it that way. And we have a policy of being allowed to open source things with Head of Department approval.)
One of the big and growing areas for this is in energy-use data. How much energy does the department of X or the data-centre they employ use to power their servers? How does that compare across the UK? It is only opening of data that will allow us to answer all sorts of interesting questions like these.
-James-C
I was very surprised to see Martin's comment on Twitter and am glad that he seems to be in a minority.
See also Effect Measure:
http://scienceblogs.com/effectmeasure/2010/04/making_data_available_to_o...
I'm all for it!
This other post about exactly the same topic doesn't come from the astronomy perspective - also interesting: http://www.practicalethicsnews.com/practicalethics/2010/04/im-a-taxpayer...
Martin: Ah that makes a lot more sense. I agree that FOI is a clumsy tool by which to make people release data and that the data should already be released.
Martin, James C - Yes I completely agree with that. The FOI outcome of "release it all, right now" is obviously not a workable approach when it comes to science data, where in some circumstances the researchers are reasonably within their rights to withhold them (before publication etc).
But this precedent should show that there is a need for some kind of formal requirement/policy/code of conduct or such, for allowing access to publicly funded research data. Stalling on this issue will only make things worse.
Of course some people seem more concerned with getting us to be quiet than letting us share information, http://www.dailyack.com/2010/02/how-do-i-stop-scientists-talking.html