8 Collaborating with git and GitHub

Anyone who has ever worked on a collaborative writing project (code or otherwise) has likely run into the frustrations of version control. For example, I might be working on some revisions to a document. I make some edits and send it to my colleague. By the time they get my version, however, it’s already behind their version in other areas (e.g., maybe they revised their description of the analysis while I was editing the document as a whole). They can’t simply accept my edits without losing their other changes, but they’d like to incorporate both. In many situations like this, version control is handled through naming conventions: "full-manuscript_v1.docx" becomes "full-manuscript_v1da.docx", which then becomes "full-manuscript_v2.docx", until we eventually end up with "full-manuscript_vFinal.docx", which inevitably becomes "full-manuscript_vFinalv2.docx", and so on until we actually call the document complete.

git is an alternative approach to managing versions of a document: it tracks changes to the individual lines of each document. If two people are working on the same document but editing different lines, git can typically merge those changes together automatically. Critically, nothing is lost; the entire history of the project is stored, from its initial creation through its final stages. git was initially created for software developers, but we advocate for its use in a broader range of applications, including dynamic documents (i.e., anything produced through RMarkdown).

In our view, the learning curve for git is deceptively steep. We often see git described as simple (e.g., “learn these five commands to work with git”), with little explicit instruction devoted to the topic. Rather, it is presumed to be a skill one will “pick up” along the way. While there are exceptions (e.g., the wonderful Happy Git and GitHub for the useR book by Jenny Bryan), we felt it important to cover the topic explicitly.
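The claim that edits to different lines are merged automatically is easy to check on your own machine. The following minimal sketch assumes git is installed and works in a throwaway temporary directory; the file name, branch name, identity, and commit messages are all hypothetical. It replays the manuscript scenario above: a colleague revises one line on a separate branch, I edit a different line on the original branch, and git combines both edits.

```shell
#!/bin/sh
# Minimal sketch (assumes git is installed); all names are hypothetical.
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Ada Example"        # identity for this demo repo only
git config user.email "ada@example.com"

# Shared starting point: a five-line manuscript, committed once.
printf 'Introduction\nBackground\nMethods\nAnalysis\nDiscussion\n' > manuscript.txt
git add manuscript.txt
git commit -q -m "Initial draft"
base=$(git rev-parse --abbrev-ref HEAD)   # "main" or "master", depending on config

# My colleague revises the Analysis line on their own branch...
git checkout -q -b colleague
sed 's/^Analysis$/Analysis (revised)/' manuscript.txt > tmp.txt
mv tmp.txt manuscript.txt
git commit -q -am "Revise analysis description"

# ...while I edit the Introduction line on the original branch.
git checkout -q "$base"
sed 's/^Introduction$/Introduction (edited)/' manuscript.txt > tmp.txt
mv tmp.txt manuscript.txt
git commit -q -am "Edit introduction"

# The edits touch different lines, so git combines them without a conflict.
git merge -q --no-edit colleague
cat manuscript.txt
```

The final file should contain both "Introduction (edited)" and "Analysis (revised)". Edits that overlap (or sit on directly adjacent lines) would typically produce a merge conflict instead, which is a topic of its own later in the chapter.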

This chapter aims to cover:

  • Creating repositories
    • GitHub first or GitHub second
  • Making commits
    • Merge conflicts
  • Working with existing repositories
  • Branching, forking and stashing
  • Pull requests

This chapter is not intended to be all-encompassing. For example, we will not cover more advanced topics like rebasing (which can be important if your branch is several commits behind the branch you want to merge with). Additionally, this chapter emphasizes the use of a GUI (graphical user interface) for working with git, specifically the GitKraken software, which offers both free and Pro licensing. The primary drawback of the free tier is that you cannot work with private repositories. If all of your repositories are public, however, the free tier is likely sufficient.

Repo: Shorthand for repository, a collection of files that are tracked by git, along with the history of those files.

GitKraken is to git as RStudio is to R. In other words, GitKraken does not replace git; rather, it is an interface for working with git, just as RStudio is an integrated development environment (IDE) for working with R. GitHub is a remote (online) hosting service for git repositories. As we will see later, you (and each of your collaborators) will have the entire git repository stored on your local machine, and GitHub allows you to collaborate on that repo over a network. You should always think of git in terms of both your local repository (on your computer) and your remote repository (on GitHub). Many beginners conflate git with GitHub in particular, and sometimes with GitKraken as well. It is important to realize that these are three separate things: git is the engine behind it all, GitKraken provides us with a nicer user interface, and GitHub allows us to host our repos online. Note that there are other online git hosting services, such as https://bitbucket.org and https://about.gitlab.com. We have chosen GitHub because it is the most popular among R users (at the time of this writing).
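The local/remote distinction can be sketched entirely on one computer. In the following minimal sketch, which assumes git is installed, a bare repository named hub.git (a hypothetical stand-in for a repo hosted on GitHub) plays the role of the remote. "Alice" pushes a commit to it, and "Bob's" fresh clone then contains the full history, not just the latest files. Both collaborators, their identities, and all file names are hypothetical; with a real GitHub repo, the address you clone from would be the one GitHub provides.

```shell
#!/bin/sh
# Minimal sketch (assumes git is installed). hub.git is a hypothetical
# stand-in for a GitHub-hosted repo; alice/ and bob/ are two local clones.
set -e
cd "$(mktemp -d)"

git init -q --bare hub.git               # the "remote": history only, no working files
git clone -q hub.git alice               # Alice's local copy of the (empty) repo

cd alice
git config user.name "Alice Example"     # identity for this demo repo only
git config user.email "alice@example.com"
echo "First draft of the paper" > paper.md
git add paper.md
git commit -q -m "Add first draft"
git push -q origin HEAD                  # send local commits to the remote
cd ..

git clone -q hub.git bob                 # Bob's clone receives the full history
git -C bob log --oneline                 # lists Alice's commit
```

Note that Alice's commit existed only on her machine until she pushed it; the remote is simply the shared place both local repositories synchronize with.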

We have chosen to illustrate the topics throughout this chapter using GitKraken, rather than the command line interface, because in our experience teaching this content, the visual interface of GitKraken helps reinforce the concepts, particularly branching and stashing. However, all of the concepts discussed in this chapter can also be accomplished by typing commands in a terminal window, using git [command]. Additionally, some of the more advanced operations (e.g., cherry-picking) cannot (to our knowledge) be completed through GitKraken, and must be completed through the command line interface.
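As a rough map from the GUI to the terminal, the following minimal sketch (assuming git is installed; all file names, branch names, and the identity are hypothetical) runs several of the operations this chapter performs in GitKraken: committing, branching, stashing, and merging. It ends by printing a text version of the commit graph that GitKraken draws. It is meant only to show which commands correspond to those actions, not to replace the GitKraken walkthrough.

```shell
#!/bin/sh
# Minimal sketch (assumes git is installed); all names are hypothetical.
set -e
cd "$(mktemp -d)"
git init -q                               # create a repository
git config user.name "Ada Example"        # identity for this demo repo only
git config user.email "ada@example.com"

echo "# Analysis notes" > notes.md
echo "- check model fit" > todo.md
git add notes.md todo.md                  # stage files
git commit -q -m "Add notes and to-do list"
base=$(git rev-parse --abbrev-ref HEAD)   # "main" or "master", depending on config

git checkout -q -b new-figure             # create a branch and switch to it
echo "Figure 1: draft caption" >> notes.md
git commit -q -am "Draft figure caption"

git checkout -q "$base"                   # back to the original branch
echo "- rerun with new data" >> todo.md   # uncommitted work in progress
git stash                                 # set it aside; working tree is clean again
git merge -q new-figure                   # fast-forward: base has no new commits
git stash pop                             # bring the work in progress back

git log --oneline --graph --all           # text version of GitKraken's commit graph
```

The stashed change touches a different file than the merged branch, so git stash pop reapplies it cleanly; had both modified the same lines, git would report a conflict instead.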

The fundamental goals of this chapter, however, are to (a) help you understand the structure of git, so that you are better equipped to solve unexpected problems, and (b) provide you with the tools for roughly 95% of the (typical) work you will need to do with git.

The big picture

The primary reason we use git is to track changes to documents and to collaborate with colleagues on those documents in a way that allows everyone to contribute. GitHub is the online host for our git repository, while GitKraken is a tool we use to make git more visual and easier to use and understand.

In what follows, it may not always be immediately clear why we’re doing each step, but we hope this will become clearer as you progress through the chapter.