What Is a Reproducible Project?
Open workflows for ecology and wildlife research
Topics:
- Folder structure
- Raw vs processed data
- Scripts and automation
- Documentation
- Portable workflows
Learning Goals
By the end of this session you should be able to:
- Recognise the components of a reproducible project
- Understand why project structure matters
- Separate raw and processed data safely
- Organise scripts logically
- Use relative paths
- Create a simple reproducible workflow in RStudio
Does this look familiar?
Desktop/
analysis/
final_analysis.R
final_analysis_v2.R
final_analysis_v2_FINAL.R
final_analysis_v2_FINAL_REAL.R
figure_new_final.png
temp_data.csv
old_data.csv
Common problems
- Which script is correct?
- Which data were used?
- Which figures belong to which analysis?
- Can another person rerun this?
Reproducibility Is Not Just About Publishing Code
Reproducibility means that someone else can:
- Open the project
- Understand the workflow
- Run the analysis
- Recreate the outputs
Ideally with minimal confusion
This includes:
- Data organisation
- Documentation
- Scripts
- Dependencies
- File structure
Why Reproducibility Matters in Ecology
Ecology workflows are often complex
Examples:
- Camera trap datasets
- Spatial layers
- Occupancy models
- Meta-analysis workflows
- Long-term monitoring
- Multi-country collaborations
Complexity increases risk
Small workflow issues can make projects difficult to reproduce.
A typical reproducible workflow
Raw data
↓
Cleaning scripts
↓
Processed data
↓
Analysis scripts
↓
Figures and tables
↓
Reports/manuscripts
Core components of a reproducible project
A project usually contains:
- Raw data
- Processed data
- Scripts
- Outputs
- Documentation
- Metadata
- A project file
The structure itself is part of the methodology.
Example project structure
my_project/
├── data_raw/
├── data_processed/
├── scripts/
├── outputs/
├── docs/
├── README.md
└── my_project.Rproj
Benefits
- Easier navigation
- Easier collaboration (even with your future self!)
- Easier automation
- Easier handover
Raw vs Processed Data
Never manually edit raw data
Raw data
- Original source files
- Untouched
- Read-only if possible
Processed data
- Cleaned data
- Derived variables
- Analysis-ready datasets
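One way to act on "read-only if possible" is to remove write permission from the raw files once they are in place. The sketch below is a minimal illustration using base R only. The function name `protect_raw_data` and its `raw_dir` argument are assumptions made for this example, and the default folder name follows the example project structure.

```r
# Mark every file in a raw-data folder as read-only.
# `protect_raw_data` and `raw_dir` are hypothetical names for this sketch.
protect_raw_data <- function(raw_dir = "data_raw") {
  files <- list.files(raw_dir, full.names = TRUE, recursive = TRUE)
  # Mode 444: everyone may read, nobody may write.
  # On Windows only the read-only attribute is changed.
  ok <- Sys.chmod(files, mode = "0444", use_umask = FALSE)
  # Return (invisibly) which files were changed successfully
  invisible(stats::setNames(ok, files))
}

# Usage, from the project root:
# protect_raw_data("data_raw")
```

This guards against accidental edits rather than deliberate ones: anyone with sufficient permissions can still switch the files back to writable.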
Example ecology workflow
camera_trap_data.csv
↓
clean_camera_data.R
↓
camera_data_clean.csv
↓
occupancy_model.R
↓
occupancy_results.csv
↓
figures/
The scripts explain exactly what happened.
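The chain above can also be written down as a small driver script, so that the order of the steps is recorded rather than remembered. This is a rough sketch in base R. The function name `run_pipeline` and the `scripts/` folder default are assumptions for illustration, and the script names in the usage comment are the ones from the workflow above.

```r
# Run the project's scripts in a fixed order, stopping at the first failure.
# `run_pipeline` is a hypothetical helper, not part of any package.
run_pipeline <- function(scripts, script_dir = "scripts") {
  for (script in scripts) {
    path <- file.path(script_dir, script)
    if (!file.exists(path)) stop("Script not found: ", path)
    message("Running ", path)
    # Each script runs in a fresh environment, so steps communicate
    # through files on disk rather than leftover objects in memory.
    source(path, local = new.env(parent = globalenv()))
  }
  invisible(scripts)
}

# Usage, from the project root:
# run_pipeline(c("clean_camera_data.R", "occupancy_model.R"))
```

Running each script in its own environment is a design choice: a step that only works because of objects left over from an earlier step will fail immediately instead of months later.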
Scripts Are Better Than Manual Steps
Manual workflows are fragile
Problems with manual editing:
- Difficult to track changes
- Easy to introduce errors
- Impossible to reproduce precisely
- Hard to scale
Scripts are documentation
Good scripts explain:
- What was done
- Why it was done
- In what order
Documentation Matters
Good projects explain themselves
Minimum documentation:
A README.md should explain:
- What the project is
- Where the data came from
- How to run the analysis
- Folder structure
- Required software/packages
Future-you will appreciate this.
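The "required software/packages" part of a README is easy to get wrong from memory. One option is to let R record it. The sketch below saves the output of `sessionInfo()` to a text file. The function name `record_session_info` and the default location `docs/session_info.txt` are assumptions chosen for this example.

```r
# Save the R version and loaded package versions to a text file.
# `record_session_info` and the default path are hypothetical choices.
record_session_info <- function(out_file = file.path("docs", "session_info.txt")) {
  # Create the target folder if it does not exist yet
  dir.create(dirname(out_file), recursive = TRUE, showWarnings = FALSE)
  writeLines(utils::capture.output(utils::sessionInfo()), out_file)
  invisible(out_file)
}

# Usage, at the end of an analysis script:
# record_session_info()
```

Calling it after the analysis has run means the file lists the packages that were actually loaded, which can then be copied into the README.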
Relative Paths
Avoid absolute paths!
Bad
read.csv("C:/Users/matt/Desktop/project/data.csv")
Better
read.csv("data_raw/data.csv")
Even better
read.csv(here::here("data_raw", "data.csv"))
Why Relative Paths Matter
Absolute paths break collaboration
Absolute paths:
- Only work on your computer
- Break when folders move
- Cause problems across operating systems
Relative paths:
- Improve portability
- Improve collaboration
- Improve reproducibility
My biggest bugbear
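Hard-coded absolute paths can be hunted down semi-automatically. The following is a rough heuristic, not a complete parser. It flags lines in which a quoted string starts with `/`, `~`, or a Windows drive letter. The function name `find_absolute_paths` is an assumption for this sketch, and it can produce false positives, for example on a quoted string that merely starts with a slash.

```r
# Return the line numbers that appear to contain an absolute path.
# `find_absolute_paths` is a hypothetical helper for illustration.
find_absolute_paths <- function(lines) {
  # A quote followed by "/", "~", or a drive letter such as C:/ or C:\
  pattern <- "[\"'](/|~|[A-Za-z]:[/\\\\])"
  which(grepl(pattern, lines))
}

# Usage: check a script before sharing it
# find_absolute_paths(readLines("scripts/clean_camera_data.R"))
```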
Common Mistakes
- Files on Desktop
- Manual spreadsheet editing
- Missing scripts
- Missing metadata
- Unclear filenames
- Mixing raw and processed data
- No version control
- Hard-coded paths
Hands-on exercise
Goal
Build your first reproducible research project.
You will:
- Create an RStudio project
- Build a folder structure
- Add a small dataset
- Create a cleaning script
- Generate a figure
- Write a README
Exercise workflow
Create project
↓
Add folders
↓
Import raw data
↓
Create cleaning script
↓
Save processed data
↓
Create plot
↓
Document project
Step 1 — Create a project
In RStudio:
- File → New Project
- New Directory
- New Project
- Name it:
wildlife_reproducibility_project
Step 2 — Create folders
Create these folders:
data_raw/
data_processed/
scripts/
outputs/
docs/
Or create them using R:
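# Run from the project root (the folder containing the .Rproj file)
folders <- c(
  "data_raw",
  "data_processed",
  "scripts",
  "outputs",
  "docs"
)
# Returns TRUE for each folder that was created
sapply(folders, dir.create)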
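The exercise workflow lists "Import raw data" as the next stage. If you have no dataset to hand, one option is to write a small made-up one into the raw data folder. In the sketch below, the file name `wildlife_data.csv` and the `count` column match the cleaning script in this exercise. The `site` and `species` columns, every value in the table, and the function name `write_example_data` are invented purely for illustration.

```r
# Write a tiny, invented dataset so the rest of the exercise can run.
# All values are placeholders, not real observations.
write_example_data <- function(raw_dir = "data_raw") {
  wildlife <- data.frame(
    site    = c("A", "A", "B", "B", "C"),      # hypothetical column
    species = c("fox", "badger", "fox", "deer", "deer"),  # hypothetical column
    count   = c(3, NA, 1, 4, NA)               # includes missing values to clean
  )
  dir.create(raw_dir, recursive = TRUE, showWarnings = FALSE)
  out_file <- file.path(raw_dir, "wildlife_data.csv")
  write.csv(wildlife, out_file, row.names = FALSE)
  invisible(out_file)
}

# Usage, from the project root:
# write_example_data()
```

With real data you would instead copy the original file into the raw data folder unchanged.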
Step 4 — Create a cleaning script
Create a new R script and save it in the scripts/ folder.
Example:
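library(readr)
library(dplyr)

# Read the raw data; here::here() builds the path from the project root
wildlife <- read_csv(
  here::here("data_raw", "wildlife_data.csv")
)

# Drop rows with a missing count
wildlife_clean <- wildlife %>%
  filter(!is.na(count))

# Save the cleaned data separately so the raw file stays untouched
write_csv(
  wildlife_clean,
  here::here("data_processed", "wildlife_clean.csv")
)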
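The "Create plot" stage of the exercise workflow could then read the processed file and save a figure to the outputs folder. This is a minimal sketch using base R graphics. The function name `plot_counts`, the output file name `count_histogram.png`, and the choice of a histogram are assumptions for this example. The paths assume the script is run from the project root; `file.path()` can be swapped for `here::here()` if the here package is installed.

```r
# Read the cleaned data and save a histogram of the counts as a PNG.
# `plot_counts` and the output file name are hypothetical choices.
plot_counts <- function(in_file = file.path("data_processed", "wildlife_clean.csv"),
                        out_file = file.path("outputs", "count_histogram.png")) {
  wildlife_clean <- read.csv(in_file)
  dir.create(dirname(out_file), recursive = TRUE, showWarnings = FALSE)
  grDevices::png(out_file, width = 800, height = 600)
  # Close the graphics device even if plotting fails
  on.exit(grDevices::dev.off(), add = TRUE)
  graphics::hist(wildlife_clean$count,
                 main = "Distribution of counts",
                 xlab = "Count")
  invisible(out_file)
}

# Usage, from the project root:
# plot_counts()
```

Because the figure is written by a script, it can be regenerated whenever the data or the cleaning step changes.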
Step 6 — Create a README
Create:
README.md
Include:
- Project title
- Short description
- Folder structure
- How to run the analysis
- Required packages
Keep it simple
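If a blank page is the obstacle, a skeleton README can be generated from R and then filled in by hand. The sketch below writes the headings listed above. The function name `write_readme` and the one-line folder descriptions are assumptions for this example. The package list comes from the cleaning script in this exercise.

```r
# Write a skeleton README.md with the sections suggested above.
# `write_readme` is a hypothetical helper; edit the text after creating it.
write_readme <- function(path = "README.md",
                         title = "wildlife_reproducibility_project") {
  # Refuse to overwrite an existing README
  if (file.exists(path)) stop("File already exists: ", path)
  lines <- c(
    paste("#", title),
    "",
    "## Description",
    "",
    "## Folder structure",
    "- data_raw/: original files, never edited by hand",
    "- data_processed/: cleaned, analysis-ready data",
    "- scripts/: cleaning and analysis scripts",
    "- outputs/: figures and tables",
    "- docs/: project documentation",
    "",
    "## How to run the analysis",
    "",
    "## Required packages",
    "- readr",
    "- dplyr",
    "- here"
  )
  writeLines(lines, path)
  invisible(path)
}

# Usage, from the project root:
# write_readme()
```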
What did we build?
You now have:
- A reproducible folder structure
- Raw and processed data separation
- Documented scripts
- Reproducible outputs
- Basic documentation