Subversion and LyX for project writing
Ok, so a while ago I mentioned that I was going to try a new way of working on my thesis using Subversion (a version control system predominantly used by computer programmers) and LyX (a document processor that uses LaTeX in an almost-WYSIWYG style to allow the author to focus on content rather than constantly worrying about typesetting issues). And the result of the experiment? It’s been a resounding success. In this article I’m going to explain how to set up a working system so that you can work seamlessly across several computers, or collaborate with other authors, without having to resort to fiddling around with USB disks or emailing yourself the latest versions of documents. All using completely free software! How’s that?
The problem
Do you work on the same writing project using several different computers, or collaborate on articles with others via email? Do you find yourself copying latest versions onto USB flash drives, only to discover that the version you’d called “final draft draft” had since been superseded by the version called “final draft final” and now the new version you’d called “final final draft” actually still had some old work in it and now somehow you’ve got to cobble together the “final final final draft” version from the two other versions?
Another commonly used trick is to email yourself work-in-progress at the end of the day so that you can simply find it sitting in your email next time you’re at any computer and continue your work. With free webmail clients such as Gmail with massive storage capabilities, this is a very tempting thing to do, and works fairly well most of the time. However, what happens when this fails? As Ben Goldacre tweeted about a week ago, Gmail had locked him out of his account for 24 hours due to suspected fraudulent use. This is kind of understandable, as Gmail is frequently used by spammers for unscrupulous activity, and of course their monitoring systems are going to throw up a few false positives. But this doesn’t stop them from being infuriating (not to mention quite nerve-wracking) when they do happen. And besides, this still doesn’t get round the problem of whether “final draft final” or “final final draft” is the latest version…
So email (Gmail in particular) is not really a useful solution for this problem (don’t get me wrong, I still think it’s a fantastic and innovative tool for doing its job – sending and receiving emails). Google Docs is quite a nice facility, but unfortunately the editing software is still (for the moment, at least) quite basic, and pretty much requires you to be online at all times. But what if you want to work offline (on a train, say, as I do frequently)?
This is the problem that we’re trying to solve here.
Components
The solution pulls together several different software packages and uses them in such a way that they work seamlessly together. There are quite a few components, mind. Here’s the list:
For your computer
- A version control system. I use Subversion. That’s not to say I think it’s the best, mind, it’s just the one that I use. Subversion is one of the de facto standards among collaborating software developers for automatically managing the ongoing development of software projects. The software keeps track of versions of files, changes made, who made those changes, whether changes made by one person conflict with someone else, and so on.
- A document processor. I use LyX, and this time, I strongly recommend it over word processors like Microsoft Word or OpenOffice.org. It’s based on LaTeX (pronounced lay-tech), which has become the de facto standard for document writing among programmers, statisticians, mathematicians, computer scientists, and, well, computer-boffins. However, whereas LaTeX requires the author effectively to learn a whole new coded language to mark-up documents so that they’re properly formatted, LyX does all that for you, and does it very well. It’s still something of a departure from Microsoft Word, but requires only a conceptual shift in your way of thinking about typing rather than the learning of a completely new language, and has a very good tutorial to smooth the transition.
- A BibTeX-based reference manager. JabRef is a particularly good one, and works on Mac, Windows and Linux. BibTeX (pronounced bib-tech) is the method of managing bibliographies developed specifically for LaTeX, and as such, is used in LyX. If you already have all your references stored in a proprietary reference manager such as EndNote (*disapproving look*) then you should be able to simply export the whole lot to BibTeX format and manage them in JabRef instead. Don’t be scared. It’s remarkably easy to use. And what’s more, if it breaks, it’s actually not that difficult to open your reference library in a text editor and sort out the problems. But I’ve never had to do that yet, touch wood.
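To give a feel for how readable a BibTeX library is, here is a minimal Python sketch. The entry is a made-up placeholder (not a real reference), and `citation_keys` is a hypothetical helper written for this illustration rather than part of JabRef or LyX. It simply lists the citation keys, which are the labels you would cite from your document.

```python
import re

# A made-up placeholder entry, in the plain-text form that BibTeX uses.
EXAMPLE_LIBRARY = """
@article{placeholder2009,
  author  = {Example, Anne},
  title   = {A Placeholder Title},
  journal = {Journal of Examples},
  year    = {2009}
}
"""

def citation_keys(bibtex_text):
    """List the citation keys (the label right after '@type{') in BibTeX text."""
    # Rough pattern: '@', an entry type, '{', then the key up to the first comma.
    return re.findall(r"@\w+\s*\{\s*([^,\s]+)\s*,", bibtex_text)

print(citation_keys(EXAMPLE_LIBRARY))
```

Because the whole library is just text like this, a broken entry can usually be spotted and repaired by eye in any text editor.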
On-line
- A Subversion repository. Project Locker is a website which offers free Subversion repositories for up to 5 users, with 500MB of storage. This sounds like loads, but remember that all versions of your files get saved, so it can get used up quite quickly. It should however be plenty for an article or book, provided it’s not completely loaded with graphics. (If you don’t know what a Subversion repository is, don’t worry, all will be explained, though you can probably guess from the name).
- An online reference manager (optional). I know I’ve waxed lyrical about the joys of Zotero and CiteULike in the past, and I still think they’re fantastic services, but in this set-up, they’re surplus to requirements, for reasons that will be explained soon.
How to set it up
The first thing to do is to install LyX and JabRef and get yourself familiar with those before you even start messing about with version control. Work through the LyX tutorial. Convert your existing reference library to BibTeX and open it in JabRef to familiarise yourself with how it works. Then see if you can get one of your own references into a LyX document. If you can do this then you’re ready to get going.
The next thing to do is to set up your Subversion repository. The repository is the centrally-maintained “master copy” of your documents, and it’s here that you’ll send your edits as you make them. Not only does the repository contain the latest version of files, it also contains all the differences between versions, so that you can revert to old ones if necessary, and compare edits made by two different authors (or the same author on two different computers) to see if they can be merged or if they conflict with each other.
If you’ve got your own server or access to a Subversion-enabled server owned by a mate, then follow their instructions to get it set up. This might involve some playing around with the host’s admin tools, or entering some commands into a Linux terminal. If you don’t have access to such a server, or can’t make head or tail of the instructions you’ve been given, then you can get yourself a free account at Project Locker. I should warn you here that Project Locker is very much marketed towards computer programmers rather than authors of literature, so its language reflects that, but don’t let that put you off! To sign up, you’ll need to enter the following information:
- Account name: this is your user name.
- Initial project name: to get started with Project Locker, you actually have to tell it the name of the first project you want to work on. Examples might be “dissertation”, “guardian_article” or “diabetes_paper”.
- Initial repository type: Choose “Subversion” rather than “Git”. Ok, I’m sure Git repositories would work just as well, but if you want to go down that route, you’re on your own.
- Email address and password: fairly self-explanatory.
Once you’ve logged into ProjectLocker, you’ll be greeted with a (rather spartan) homepage with details of your empty project, including the URL for your repository. Leave this window open while you install Subversion on your computer.
Installing Subversion (on your local computer)
Subversion was originally written as a command-line tool. While this is ok for the programmers for whom it was written, it is not terribly nice for ordinary collaborative writers. Fortunately, there are graphical interfaces available, and even better, these integrate nicely with the standard Windows, Mac and Linux file explorers. For Windows, there is TortoiseSVN. For Mac, there are (I’m told) SCPlugin and svnX. For Linux, there are plenty of ongoing projects developing graphical interfaces for Subversion. RapidSVN and NautilusSVN are two that I’ve encountered.
Working with Subversion
Once you’ve got your flavour of Subversion installed, all that’s left now is to understand how Subversion works, and develop some good working habits.
How it works (in a nutshell)
As explained before, the master copy of your work is stored remotely (on Project Locker or some other server). However, all your editing tools are stored locally on your computer. What you (and the other authors, if you’re working collaboratively) have to do is check out a copy of the project onto your local hard drive. This is called a working copy. The beauty of this set-up is that you can check out the whole project without locking it for anyone else. Many different working copies can exist on different computers at the same time. In fact, the way I’ve used the system, the only time I check out the project is when I first create the working copy on a given computer.
After this, there are really two commands you have to remember:
- At the beginning of any session, you should update your working copy to take on board any changes you or anyone else has made elsewhere.
- At the end of any session, you should commit your working copy to submit your edits to the repository so that the master copy changes.
That (apart from merging and conflict resolution, which I won’t go into here) is pretty much it. No faffing about with emailing yourself copies or copying things to an easily losable USB stick. No more trying to remember which version is the latest because that information is all stored in the repository. What’s more, if you want to work offline, you can simply update your working copy (whilst you’re still online), take your laptop onto the train or to the middle of a field or wherever you want, write to your heart’s content, and then commit your changes when you get back to an internet connection.
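If you ever want to see what the graphical tools are doing underneath, the update-then-commit routine can be sketched in Python. This is only a rough illustration: it assumes the `svn` command-line client is installed and on your path, and the function names and the `thesis` folder are hypothetical.

```python
import subprocess

def svn_update_command(working_copy):
    """Build the command that pulls the latest changes into a working copy."""
    return ["svn", "update", working_copy]

def svn_commit_command(working_copy, message):
    """Build the command that sends local edits to the repository."""
    return ["svn", "commit", working_copy, "-m", message]

def run(command):
    """Run a command and raise an error if it reports a failure."""
    subprocess.run(command, check=True)

# Show the two commands for a hypothetical working copy called "thesis".
print(svn_update_command("thesis"))
print(svn_commit_command("thesis", "Draft of chapter 2"))
```

Calling `run(svn_update_command("thesis"))` at the start of a session and `run(svn_commit_command("thesis", "..."))` at the end would mirror the two habits described above. The commit message is a short note to your future self about what changed.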
How to do it with ProjectLocker
My suggested method is as follows:
- Create a folder in My Documents (or its equivalent) called “working-copy” or somesuch.
- Check out your project into that folder. To do this using TortoiseSVN in Windows, right-click on the folder you just created and select “TortoiseSVN -> Checkout” from the menu. Enter the URL of your repository as shown on your Project Locker home page. When prompted, enter the email address and password you used to sign up.
- You should now have a working copy of the repository in the folder you just made. An emblem on the folder icon showing a green tick indicates that that folder is now under version control. However, at the moment, the project is empty.
- We can now add files and folders to the project. However, this is not a simple case of dragging-and-dropping or making new folders, because Subversion doesn’t automatically assume that those files belong to the project. A couple of extra steps need to be taken to tell Subversion that these files and folders need to be put under version control as part of the project. Firstly, right-click on your new files and select “TortoiseSVN -> Add…” from the menu. This lists the files and folders as “to be added”. Next time you commit your edits, these will be added to the repository.
- Now commit these edits and check out your repository on a different computer. You should, simply by performing the check-out, see the files and folders you just added, and be able to edit them on the second computer.
And that, largely, is it. TortoiseSVN (and other Subversion interfaces) include all sorts of other tools, which are predominantly intended for programmers. For the time being at least, these can be safely ignored.
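For the curious, the check-out, add and commit steps above correspond to three Subversion commands. Here is a rough Python sketch that builds them, assuming the `svn` command-line client is installed. The function name, the repository URL (`https://example.com/svn/dissertation`) and the file names are all hypothetical placeholders, so substitute the URL shown on your own repository’s home page.

```python
import shlex
from pathlib import Path

def first_time_setup_commands(repository_url, working_copy, new_files, message):
    """Build the three commands: check out, mark new files, then commit."""
    # New files live inside the working copy folder.
    paths = [str(Path(working_copy) / name) for name in new_files]
    return [
        ["svn", "checkout", repository_url, working_copy],
        ["svn", "add", *paths],
        ["svn", "commit", working_copy, "-m", message],
    ]

# Hypothetical values, for illustration only.
commands = first_time_setup_commands(
    "https://example.com/svn/dissertation",
    "working-copy",
    ["thesis.lyx", "references.bib"],
    "Add thesis and reference library",
)
for command in commands:
    # Print each command as it would be typed in a terminal.
    print(shlex.join(command))
```

On the second computer only the first command is needed, since the check-out brings down everything that has already been committed.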
Why use LyX and BibTeX with this set-up?
Finally, a word about the advantages of using LyX and BibTeX in this case, rather than using a standard word processor such as Microsoft Word or OpenOffice.
As mentioned before, LyX and BibTeX files are completely text-based, and if you really wanted to, you could edit your documents or reference library by hand in a text editor. While there’s no obvious advantage to a human in doing so, the benefits of this approach are reaped by the Subversion system itself. For one thing, it is more obvious to the program that looks for changes where those changes have been made. This enables the repository to store just the parts that have been changed, rather than a completely new file each time it is updated. This saves enormously on server space (which, as mentioned before, is not infinite). Secondly, it means that merging changes from two authors has a greater chance of success, and if the system still comes up with a conflict, it is actually possible for the human user to look at the text files and resolve the conflict – something that would be impossible if the document were stored in a closed format.
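To see why text files make changes easy to find, here is a small Python sketch using the standard `difflib` module. It is an illustration of line-by-line comparison in general, not of Subversion’s own internals, and the `changed_lines` helper and the sample sentences are invented for the example.

```python
import difflib

def changed_lines(old_text, new_text):
    """Return only the lines that were removed ('- ') or added ('+ ')."""
    comparison = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    return [line for line in comparison if line.startswith(("+ ", "- "))]

# Two invented versions of a three-line document; only the middle line differs.
previous = "Introduction\nDiabetes is common.\nConclusion\n"
current = "Introduction\nDiabetes is increasingly common.\nConclusion\n"

for line in changed_lines(previous, current):
    print(line)
```

Only the edited sentence shows up in the result, while the untouched lines are skipped. That is the property that lets two authors’ edits to different parts of a text file be merged, and lets a human read the conflict when they touch the same line.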
Also, keeping a copy of the BibTeX file in the Subversion repository performs exactly the same function as CiteULike or Zotero do in syncing your library to a central server, so these online services become redundant. However, I still heartily recommend them as reference managers for a more “normal” set-up. Having said that, I hope that this guide will encourage you to try using version control for your writing project, thus making a Subversion-based system the “norm” – not just for computer programmers!
I would also like to acknowledge this article by Rob Oakes which also gives a good explanation of Subversion to non-tech people and a guide to setting up a repository on your local computer, rather than on a server that can be accessed from anywhere. It also has some very helpful screenshots of TortoiseSVN in action.
I hope that this article, and Rob’s, will help you to save your time and sanity and end your days of manually naming files with version numbers, copying files to USB sticks or emailing them to yourself so that you can work on them at home, only to find out that you’ve copied the wrong version or your Gmail account’s been falsely frozen. You also get automatic back-ups of your project, both in the repository and any working copies you create. Good luck!