Does this situation sound familiar? You are editing a document, data file, or analysis script. You’ve made changes that you think are good, but you’re not sure. You need to check with your supervisor or collaborator first, or you just want to know that you can go back to the old version if you change your mind. So what do you do?
Some common ad hoc solutions include
These solutions feel reassuring, but they are often less useful than we would like and can cause more problems further down the line. Some of these situations might be familiar to you:
Using a version control system, either manually or with a software tool like Git, lets us be more systematic about these changes while keeping a clean, current copy of every file we’re working on. Some advantages of version control are:
All changes are well-documented, so we know who made the change, and what change was made.
Nothing that is committed to version control is lost unless you go out of your way to delete it. Since all old versions of files are saved, it’s always possible to go back in time to see exactly who wrote what on a particular day, or which version of a program was used to generate a particular set of results.
Because we have this record of who made which changes and when, we know whom to ask if questions come up later, and we can revert to a previous version if needed, much like the “undo” feature in an editor.
When several people collaborate on the same project, it’s easy to accidentally overlook or overwrite someone’s changes. Version control software automatically notifies users whenever one person’s work conflicts with another’s.
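To make these advantages concrete, here is a minimal sketch of what they look like in Git, assuming Git is installed and the commands are run from a Unix shell as in the rest of this lesson. The file name `notes.txt`, the commit messages, and the author identity are placeholders chosen for illustration; later episodes explain each command properly.

```shell
#!/usr/bin/env bash
# Minimal demo in a throwaway repository (all names here are placeholders).
set -euo pipefail

demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet

# Record who is making changes, for this repository only (hypothetical identity).
git config user.name "Ada Example"
git config user.email "ada@example.org"

# Save a first version of the file, with a message explaining the change.
echo "Draft conclusion" > notes.txt
git add notes.txt
git commit --quiet -m "Add first draft of conclusion"

# Save a change we are not yet sure about.
echo "Revised conclusion" > notes.txt
git commit --quiet -am "Try a revised conclusion"

# Who changed what, and when: one line per commit.
git log --format="%h %an %ad %s" --date=short

# Exactly what changed between the two versions.
git diff HEAD~1 HEAD -- notes.txt

# Changed our mind: restore the file as it was in the previous commit.
git checkout HEAD~1 -- notes.txt
cat notes.txt   # prints: Draft conclusion
```

Both versions stay in the repository's history, so restoring the old text does not throw away the revised one. The temporary directory can be deleted afterwards with `rm -rf "$demo_dir"`.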
Teams are not the only ones to benefit from version control: lone researchers can benefit immensely. Keeping a record of what was changed, when, and why is extremely useful for all researchers if they ever need to come back to the project later on, once their memory has faded.
Version control is the lab notebook of the digital world: it’s what professionals use to keep track of what they’ve done and to collaborate with other people. Every large software development project relies on it, and most programmers use it for their small jobs as well. And it isn’t just for software: books, papers, small data sets, and anything that changes over time or needs to be shared can and should be stored in a version control system.
Prerequisites
In this lesson we use Git from the Unix Shell. Some previous experience with the shell is helpful but not mandatory.
Time | Episode | Questions |
Setup | Download files required for the lesson | |
00:00 | 1. Manual Version Control | What is version control and why should I use it? |
00:15 | 2. Automated Version Control | What is git and why should I use it? |
00:20 | 3. Setting Up Git | How do I get set up to use Git? |
00:25 | 4. Creating a Repository | Where does Git store information? |
00:35 | 5. Tracking Changes | How do I record changes in Git? How do I check the status of my version control repository? How do I record notes about what changes I made and why? |
00:55 | 6. Ignoring Things | How can I tell Git to ignore files I don’t want to track? |
01:00 | 7. Remotes in GitHub | How do I share my changes with others on the web? |
01:30 | 8. Exploring History | How can I access old versions of files? How do I review my changes? |
01:55 | 9. Collaborating | How can I use version control to collaborate with other people? |
02:20 | 10. Conflicts | What do I do when my changes conflict with someone else’s? |
02:35 | 11. Open Science | How can version control help me make my work more open? |
02:45 | 12. Licensing | What licensing information should I include with my work? |
02:50 | 13. Citation | How can I make my work easier to cite? |
02:52 | Finish | |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.