Data analysis code

PNB 3EE3; Fink

In the age of Open Science and Reproducibility, it is becoming fairly standard practice to submit analysis code alongside papers for publication. The idea is that a paper’s claims rely on the analysis code, so others should be able to reproduce the results exactly. With this goal in mind, the code you write to analyse the data from your experiment should be well-documented and reproducible.

For the sake of this course, the easiest way to accomplish the goal of creating understandable, reproducible code is to use the ‘notebook’ format, which allows you to intersperse chunks of code with chunks of text (markdown). There are many notebooks to choose from: Jupyter Notebooks, Google Colab Notebooks, R Notebooks, etc. You are free to use whichever tool is most comfortable for you.

Below are some examples of final analysis repositories from my own research (that are hopefully reproducible on your machine!):

None of these are perfect (I am still learning too!); they are here to provide some examples/inspiration.

SIDENOTE: it is a bit beyond the scope of this course, but worth pointing out that code written to be reproducible can stop being so when function, package, language, or operating system versions change relative to when the code was originally written and run. Tools like Docker and Code Ocean provide solutions to this very real problem! They “containerize” your code in the computational environment you developed it in (e.g., with a specific operating system, Python version, package versions, etc.).
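A much lighter step than containerizing is to record the versions your notebook actually ran with. The sketch below is one possible way to do that in Python using only the standard library. It does not recreate the environment the way Docker or Code Ocean would; it only documents it. The function name `describe_environment` and the package list are illustrative assumptions, not part of the assignment.

```python
import platform
from importlib import metadata


def describe_environment(packages):
    """Return a dict with the Python version, OS, and requested package versions."""
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence rather than failing, so the report is always complete.
            info[name] = "not installed"
    return info


# Hypothetical package list: replace with whatever your analysis imports.
# The last name is deliberately fake to show how a missing package is reported.
env = describe_environment(["numpy", "pandas", "a-package-that-is-not-installed-xyz"])
for key, value in env.items():
    print(f"{key}: {value}")
```

Running a cell like this at the top or bottom of a notebook leaves a record of the environment next to the results it produced.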

What to do for this assignment

Your analysis code should follow your pre-registered data analysis plan. Be sure to write the code for all the analyses you proposed, including any exploratory analyses (as registered or now desired). Also be sure to include visualizations. In addition to visualizing your main findings, you might also want to produce visuals to ensure your data meet certain assumptions (e.g., is variable X normally distributed?) or that your experimental conditions were balanced as intended.
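As a rough illustration of the two kinds of checks mentioned above, here is a minimal Python sketch using only the standard library. The data, the condition labels (`congruent`, `incongruent`), and the helper `text_histogram` are all made up for this example. In a real notebook you would most likely replace the text histogram with a proper plot from whichever plotting library you prefer.

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so the made-up data are the same on every run

# Made-up data: 40 trials, each with a condition label and a reaction time (ms).
conditions = ["congruent", "incongruent"] * 20
random.shuffle(conditions)
reaction_times = [random.gauss(500, 50) for _ in conditions]

# Check 1: were the experimental conditions balanced as intended?
counts = Counter(conditions)
for label, n in sorted(counts.items()):
    print(f"{label}: {n} trials")


# Check 2: a rough look at the shape of the distribution.
def text_histogram(values, n_bins=8):
    """Bin values into equal-width bins; return (bin_start, count) pairs."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width if all values are equal
    bins = [0] * n_bins
    for v in values:
        # The maximum value would land one past the last bin, so clamp it.
        idx = min(int((v - lo) / width), n_bins - 1)
        bins[idx] += 1
    return [(lo + i * width, count) for i, count in enumerate(bins)]


hist = text_histogram(reaction_times)
for start, count in hist:
    print(f"{start:7.1f} | {'#' * count}")
```

A histogram only gives a visual impression of the distribution. Whether you also need a formal test depends on what your pre-registered plan says.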

You should organize your analysis notebook logically, providing headings and descriptive text, notes, and comments where necessary. Someone unfamiliar with your analysis plan should be able to look at your notebook and understand what is going on and what it means.

You will run your analysis code on data we will simulate together in class for your experiment.
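To give a feel for what simulated data might look like before we do this in class, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the two-condition within-participant design, the column names, the effect size, and the function name `simulate_experiment`. The simulation for your own experiment will be designed around your pre-registered plan.

```python
import csv
import io
import random


def simulate_experiment(n_participants, effect_ms, seed):
    """Simulate one reaction time per participant per condition (hypothetical design)."""
    rng = random.Random(seed)  # a seeded generator makes the data reproducible
    rows = []
    for pid in range(1, n_participants + 1):
        baseline = rng.gauss(500, 40)  # each participant's own typical speed
        for condition, shift in (("control", 0.0), ("treatment", effect_ms)):
            rt = baseline + shift + rng.gauss(0, 25)  # add trial-level noise
            rows.append(
                {"participant": pid, "condition": condition, "rt_ms": round(rt, 1)}
            )
    return rows


rows = simulate_experiment(n_participants=30, effect_ms=20.0, seed=2024)

# Write the rows as CSV text; in a notebook you might write to a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["participant", "condition", "rt_ms"])
writer.writeheader()
writer.writerows(rows)
print("\n".join(buffer.getvalue().splitlines()[:5]))
```

Because the random generator is seeded, anyone who reruns the cell gets the same data, which fits the reproducibility goal described at the top of this page.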

Assessment

A full rubric for the poster presentation is in the course outline.