Introduction to R

ResBaz Victoria 2024

Introducing R

R is a language and an environment for working with data.


We will primarily work with data by writing R code.


R has a large community of users and developers,
and many specialized packages.

Why write code?


If every step of your analysis is recorded in an R script:

  • You have a complete record of what you have done.
  • Early decisions are easily changed, and changes are easily tested.
  • Today’s big project becomes tomorrow’s building block.
  • You can share your code, ensuring your results are reproducible.



R is open-source and free, so others can use your code without any barriers.

Data analysis follows a script

Diagram from “R for Data Science” book (https://r4ds.hadley.nz/)

“Model” here is intended to cover a broad range of tasks:

  • Summarize data with counts, means, etc.
  • More generally “fit a model” to the data.
    • Traditional statistical models.
    • Machine learning models.
  • Using the model, perform statistical tests.
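
As a rough sketch of these tasks in base R, the example below summarizes, models, and tests. It assumes R's built-in `mtcars` dataset, which is not the workshop data, and the variable names are chosen only for illustration.

```r
# Built-in example data: fuel economy for 32 car models (an assumption,
# used here only because it ships with R).
data(mtcars)

# Summarize with counts: how many cars have each number of cylinders?
cyl_counts <- table(mtcars$cyl)

# Summarize with means: average miles per gallon within each cylinder group.
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)

# Fit a traditional statistical model: mpg as a linear function of weight.
fit <- lm(mpg ~ wt, data = mtcars)
slope <- coef(fit)[["wt"]]

# Use the model to perform a statistical test: is the slope different from zero?
slope_p_value <- summary(fit)$coefficients["wt", "Pr(>|t|)"]

print(cyl_counts)
print(mpg_by_cyl)
print(slope)
print(slope_p_value)
```

Calling `summary(fit)` on its own prints the full table of estimates and test results.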

Modelling is enabled and informed by the other steps!

  • Visualize to identify problems and to check that you are asking the right question.

  • Load, tidy, and possibly transform your data so that you can plot and model it.

  • Finally, you should communicate your results.
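
The steps around modelling might look something like the sketch below, written in base R. The site names and temperatures are invented purely for illustration, and the data is given inline so the example runs without any external file.

```r
# Load: read a small CSV. The text is inline so the example is self-contained;
# normally you would pass a file path to read.csv(). All values are made up.
csv_text <- "site,temp_f
A,68
B,77
C,NA
D,59"
raw <- read.csv(text = csv_text)

# Tidy: drop rows with missing measurements.
tidy <- raw[!is.na(raw$temp_f), ]

# Transform: add a column converting Fahrenheit to Celsius.
tidy$temp_c <- (tidy$temp_f - 32) * 5 / 9

# Visualize: draw a bar plot into a temporary PDF file.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
barplot(tidy$temp_c, names.arg = tidy$site, ylab = "Temperature (Celsius)")
dev.off()

# Communicate: at its simplest, print the cleaned table.
print(tidy)
```

In an interactive session you would usually skip `pdf()` and `dev.off()` and let the plot appear on screen.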

(workshop)

Conclusion

We’ve had a taste of the workflow in R. We’ve covered loading, touched on tidying, done some visualization and a little modelling (or at least summarization).


You still need to communicate your results, with your colleagues or the wider world! Quarto can help with this.


Learning to program in R will super-charge your abilities: you can write your own functions, loops, packages, …
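
To give a flavour of what this looks like, here is a minimal sketch of a function and a loop. The function name `standard_error` and the choice of columns are hypothetical, and the built-in `mtcars` dataset is assumed only because it ships with R.

```r
# A function packages a calculation under a name so it can be reused.
# standard_error is a hypothetical helper written for this example.
standard_error <- function(x) {
  x <- x[!is.na(x)]            # ignore missing values
  sd(x) / sqrt(length(x))      # standard deviation over root sample size
}

# A loop applies the same steps to each item in turn.
columns <- c("mpg", "hp", "wt")
results <- numeric(length(columns))
names(results) <- columns
for (column in columns) {
  results[[column]] <- standard_error(mtcars[[column]])
}

print(results)
```

The same result can be had more compactly with `sapply(mtcars[columns], standard_error)`, but the explicit loop shows each step.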