Getting a scientist to accept your data

A volunteer for the National Park Service’s Dragonfly Mercury Project records data (Cuyahoga Valley National Park). Photo Credit: Sarah Nelson, University of Maine

Guest post by Hannah Webber, Research and Education Projects Manager, Schoodic Institute at Acadia National Park
– – –

My daughter’s school recently had a cookie exchange holiday party. When she came home, I looked over our collection of cookies and realized there were several things about them that I didn’t know—what’s in them, where did they come from, who made them? With some it’s obvious—the cookies with OREO stamped on them I bet came from OREO, but they are unwrapped, so how did they get from a cellophane wrapper to my kitchen table? Were they passed around from child to child, or did my child grab some from a communal plate (and if she did, were her hands clean)? I am not a typically suspicious person and I consumed my fair share of these cookies (because they looked awesome). But if I were a scientist and this were data, I probably would be neither able nor comfortable using any of it.

That could be a problem for citizen science data—how do you get scientists to ‘eat’ your data? First, a little about scientists like me. I have worked in the field with various and sundry citizen scientists and received samples and data from citizen scientists from afar. Unlike with the cookies of unknown provenance, what we scientists worry about is not our waistline or cholesterol but our careers, science, and professional reputation—that’s a lot to put on the line. Using data with an unknown history just isn’t possible for a scientist. So, when you are faced with 1. ‘I can’t get scientists to take my data’, or 2. ‘I am starting a project and want my data to be used in productive analysis’, or 3. ‘I want to focus on a community effort but I want my data to stand up to scientific review’, it’s going to help to develop some good data practices so your data can be used to the fullest extent possible.

Here are some things to consider:

Data provenance or life history—Scientists will want to know A LOT about where your data came from, and not just geographically speaking. Among other things this is what the scientist(s) will want to know:

What protocol did you use to collect these data—did you use a protocol that’s generally accepted by the scientific community? If not, then what parts of your protocol overlap with a scientifically accepted protocol? Departures from an accepted protocol may be acceptable if they were consistently implemented and documented—check with a scientist (see Malin Clyde’s blog about finding a scientist).


Looking for migratory seabirds as part of Schoodic Institute’s SeaWatch, Schoodic Point, Maine. Photo credit: Schoodic Institute

How did the data get to the end user? If the data started as a measurement in the field (“the field” is defined however it makes sense for your project – it could be the drain in your kitchen sink) then how did it get to the end user’s email inbox—and can you trace it back to where it started? For that it’s important to collect metadata about your data. Metadata is just information about your data—when was it collected, by whom, where, and how? You’re probably already documenting a lot of this now. But you need to be able to keep track of all of that information so that it can be shared right alongside your data. In ten years a scientist may want to use your data and will need all of the information about your data—keep it all together (for more information on metadata check out the work of the CSA’s working group on Data and Metadata)!

What equipment did you use? Most people can use a tape measure, but a Secchi disk? Or a water quality kit? This kind of information needs to go along with the data life history and metadata. Scientists will want to know that the equipment made sense for the task at hand.

How were your citizen scientists trained?
I personally learned how to use iNaturalist in the span of time it took to walk one city block (while looking at my smartphone). Now I snap pictures and share them without much thought; a lot of that background information, that metadata, is collected on and transmitted by my smartphone. But if I were asked to record data on bird behavior? Forget about it. I would need extensive training, and if a scientist were going to use data from a citizen scientist like me? That scientist would want to have some confidence that I could tell one behavior from another. Going back to the equipment for a second—knowing your citizen scientists used a tape measure is one thing, but if you wanted to share data on crab carapaces with your scientist, then knowing what, and where exactly, your citizen scientists measured is another thing altogether. The scientists want to know how your folks were trained.


Earthwatch volunteers measuring crab carapaces, Acadia National Park, Maine. Photo credit: Libby Orcutt
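
One way to keep the life history questions above (protocol, who, when, where, equipment, training) attached to each measurement is to store them in the same row as the value itself. The following is only a minimal sketch in Python: the field names, the `Observation` class, and every example value (observer ID, protocol name, measurement) are made up for illustration and are not taken from any particular project or metadata standard.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class Observation:
    # The measurement itself
    value: float
    unit: str
    # Metadata: who, when, where, and how (all field names are hypothetical)
    observer: str
    collected_at: str  # ISO 8601 timestamp
    site: str
    protocol: str      # name or version of the protocol followed
    equipment: str
    training: str      # how the observer was trained


def write_observations(observations, handle):
    """Write data and metadata together, one row per observation."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(Observation)])
    writer.writeheader()
    for obs in observations:
        writer.writerow(asdict(obs))


# Invented example record, for illustration only
record = Observation(
    value=42.5,
    unit="mm",
    observer="volunteer-017",
    collected_at="2016-07-14T09:30:00-04:00",
    site="example shoreline site",
    protocol="carapace-width-v1",
    equipment="calipers",
    training="two-hour field workshop",
)

buffer = io.StringIO()  # stands in for a real file on disk
write_observations([record], buffer)
csv_text = buffer.getvalue()
```

Because the metadata lives in the same file as the measurement, whoever opens the spreadsheet in ten years gets the life history along with the numbers.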

Data verification and validation—Large on-line projects such as eBird and iNaturalist have some data verification and validation built into them.

But I imagine yours is a small project (or it’s starting small but could grow into something massive). If the scientists know the data life history, that’s great. But if the scientists also know that the data are corroborated with other data—way better! If the scientists got not only your spreadsheet of data but also photos of the site where the samples came from, then they would know a lot more about the data. If you sent everyone out to collect invertebrates from streams and the scientists knew that everyone used the same type of collection equipment, they would feel much better about the data. If you collected data in an area where there’s already corroborating data being gathered (you’re collecting dragonfly data near a meteorological station and you or your participants are also collecting meteorological data), this is good. Two pieces of data are much better than one.
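
For a small project, a simple check like the ones described above (is there a site photo, did everyone use the same equipment) can be run before the data go to a scientist. This is a rough sketch in Python under stated assumptions: the record layout, the `find_problems` function, and the sample records are all hypothetical, and a real project would choose its own checks together with its scientist.

```python
def find_problems(records, expected_equipment):
    """Return (record id, problem) pairs for records that lack corroboration."""
    problems = []
    for rec in records:
        # A site photo is one piece of corroborating evidence
        if not rec.get("photo"):
            problems.append((rec["id"], "no site photo"))
        # Everyone should have used the same type of collection equipment
        if rec.get("equipment") != expected_equipment:
            problems.append((rec["id"], "unexpected equipment"))
    return problems


# Invented example records, for illustration only
records = [
    {"id": "s1", "equipment": "D-frame net", "photo": "s1.jpg"},
    {"id": "s2", "equipment": "kitchen sieve", "photo": "s2.jpg"},
    {"id": "s3", "equipment": "D-frame net", "photo": ""},
]

problems = find_problems(records, expected_equipment="D-frame net")
for record_id, problem in problems:
    print(f"{record_id}: {problem}")
```

Flagged records do not have to be thrown away. The point is that the scientist can see which data are corroborated and which are not.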


The Data Lifecycle. Credit: By Mushonz (Own work) [CC BY-SA 4.0], via Wikimedia Commons

Data management—Data management means you have and use a plan, a plan for every step of your data’s (and metadata’s) life. There are great resources available online to help with data management (see, for example, the DataONE Data Management Guide and the US Federal Toolkit’s steps for managing your data), but a good place to start is by making a super low-tech data lifecycle timeline with sticky notes. A few years ago at the Citizen Science 2015 Conference I got a great timeline from the Port Townsend Marine Science Center, and I have used it as a template since (Port Townsend also has a nice, evolving Data Management Plan to use as a good template).


Port Townsend’s data management timeline.

Wrap-up—By now you might be thinking “My gosh, this blogger and scientists of her ilk want the world from me. I am going to go look for leftover holiday cookies.” But chances are you are already doing most of what has been outlined above, even if you’ve not been documenting it or have not brought it all into a cohesive data management structure. The pieces are probably there; they just need to be made into a whole. Reach out to a scientist, or several, and work with them to create that whole and get the most out of your citizen scientists’ hard-earned data!

(And I will accept any leftover cardamom spice cookies.)

Posted on: December 30, 2016  |  Category: Blog, CSA Working Groups, News, Professional Development