Writing a thesis on open-source software, across several platforms. An experiment.
Writing a thesis is a particularly difficult job. Writing a PhD thesis in Microsoft Word, particularly, is a difficult job. Writing a thesis in Microsoft Word, across several computers, running forms of Windows, Mac OS and Linux is an excruciatingly difficult job.
For one thing, standards vary across all platforms. Linux, for example, doesn’t have a native version of Microsoft Word – ok, so it is possible to get it to run, but a) it’s so bonkily unstable, and b) if you’ve taken the plunge and chosen to use Linux, you’ve probably done so with a long-term view to freeing yourself of Microsoft software completely, so trying to run it defeats the object of installing Linux in the first place. Different platforms also have different fonts – how many presentations have you seen where some poor sod has prepared their PowerPoint presentation using the gorgeous looking Helveticus Roman Sans Grotesque font, only to discover that the font doesn’t exist on the projector computer, and the replacement font is some awful blocky thing resembling Ceefax that’s far larger than the original, pushing all the text either off the edge of the screen or over the top of a painstakingly created graph? Thought that might ring some bells.
The other problem is that of transferring documents from computer to computer – and remembering which version is the newest. This can get incredibly confusing. Say you have the latest version stored on a laptop, but transfer it over to your home PC (because the screen’s bigger and the keyboard is more comfortable to use) to make changes to Section 3.4. The next day you forget to copy the new version back to your laptop, and whilst on the train to the office you make changes to Section 5.6. Which version is then the “newest”? Of course, both are, and yet neither is, as one version has the new Section 3.4 but the old Section 5.6, and the other has it the other way around. Furthermore, if there’s no access to the internet or a network, the whole transfer process involves a Great Elaborate Faff with a USB flash drive – which (knowing my luck) I’ll either leave on the train or put in a bag for “safekeeping” and never remember which bag it was.
There are open source programs available, designed to address both problems. For the first, there’s LaTeX (pronounced lay-tech – watch it!). It’s a text markup language: you write your document in plain text, with a few special instructions to say things like “this is a section” and “this text should be emphasised”, and run it through a processor to produce an immaculate and professional looking PDF. This represents another advantage over word processors like Word or OpenOffice – no farting about making sure all the fonts are the right size or that the margins are consistent or that the section numbers are in the right order. Using LaTeX frees you up from worrying about all that stuff, as it does it itself. What’s more, LaTeX documents are loved by publishers (well, the less luddite ones anyway) as they can, to a large extent at least, take your LaTeX file, process it according to their own house style, and come up with nicely presented pages, complete with such things as page numbers, journal titles at the tops of pages, and so on, without having to graft through it by hand.
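To give a flavour of what those “special instructions” look like, here is a minimal sketch of a LaTeX file. It assumes the standard article class, and the section titles and sentences are made up purely for illustration:

```latex
% A made-up two-section document using the standard "article" class.
\documentclass{article}

\begin{document}

% "This is a section": LaTeX works out the number (1, 2, ...) itself.
\section{Introduction}

% "This text should be emphasised": \emph usually comes out in italics.
Writing a thesis is a \emph{particularly} difficult job.

\section{Method}

Nothing in this file says which font, size or margins to use.
The document class decides all of that when the file is processed.

\end{document}
```

Running a file like this through a processor such as pdflatex should give a PDF with two numbered sections. Note that no section number is typed anywhere in the file.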
However, LaTeX has its disadvantages – so much so that when I tried it, I got very fed up with it very quickly. For a start, you have to learn its syntax. This is fairly straightforward for the basic stuff, but can get quite overwhelming. Secondly, and this really was the key reason I gave up on it, editing pure LaTeX just looks so depressing. All the text is the same size and the same font, if you use syntax highlighting then there are spots of garish pink and blue all over the place, and it looks nothing like a nicely presented article.
The solution that I’ve found, however, is to use LyX. It looks like a word processor – like a really old version of Word – but uses LaTeX as its basis. In fact, when you use it for the first time, you could be forgiven for thinking it was a crappy version of Word, with its inability to let you choose fonts, type double spaces after full stops or double carriage returns after paragraphs. It doesn’t even let you change the margins! What’s with that, eh? You soon find out why this is actually a really good thing when one of two things happens. Either you press the pdf preview button, or you have to enter (or remove) a section somewhere in the middle of your text.
Pressing the pdf preview button shows you what you were missing while typing in the crappy word processor – although your text looked sort-of-okay-ish (i.e. the section titles were in big text and the emphasised text was in italics), it still looked pretty rubbish. The pdf preview, however, is stunning. Neatly laid out paragraphs, figures and tables automatically placed in convenient gaps without having to put them in manually, nice smooth fonts, page numbers all dealt with. What’s more, if the formatting style of the pdf is not to your tastes, you simply choose a different style, re-generate the pdf and the same content is displayed equally stylishly, but formatted completely differently.
The other bane of the word processor user’s existence is section re-numbering. If you have a new section to insert in the middle of your text, it becomes an absolute nightmare to go through the rest of the document and adjust the section numbers. In LyX, you insert a new section and all the other sections renumber themselves. Just like that. Right in front of your eyes.
In conclusion, LyX takes the pain out of dealing with the cosmetic appearance of your work, and instead lets you concentrate all your efforts on generating the content. Fantastic stuff.
The problem of getting versions all muddled up is one of version control. There are a number of version control programs available for large publishing projects – indeed, I remember using one called SigmaLink at a former job. The project is split up into lots of files, which are all stored in a central repository. Editors have to check a file out of the repository to edit it. This copies the file to their own computer, and stops anyone else from checking it out. Once they’ve finished editing, they check the file back in to the repository. The service also backs up previous versions of the file so that any major cock-ups can be undone by reverting to a previous version.
Open source version control systems have existed for some time, for example Subversion (also known as SVN). However, despite the documentation for SVN saying that it can be used for just about any sort of project, it’s not been widely taken up outside the world of computer programming. It seems that once something is associated with technical wizardry and geekiness, it is merely assumed by anyone outside that field that they won’t understand it and so they have this Pavlovian response not to want to touch it. Largely, the same can be said of LaTeX, which is why LyX is such a fantastic tool, as it crosses that bridge from “L337 geeks only!!1!one!” to “Hey guys, everyone uses it, I can’t imagine what it was like when we all used Word”…
So anyway. This post marks the start of a project. I am going to attempt to write a humanities thesis using LyX, and manage it using an SVN repository.
I work mainly on three computers: my Linux computer at home (currently running Linux Mint 7), my Linux laptop (currently running Fedora 11), and my office computer (running Windows XP Professional). LyX is thankfully cross-platform, so I have already installed it on all three.
My SVN repository is going to be set up on www.yoyo.org (thanks Matt!). Installing SVN clients on the two Linux computers should be a relative doddle, but installing one on the Windows machine will prove a little trickier. This is because a) it’s Windows, and SVN is more commonly used on Linux and Unix machines (though I am told that TortoiseSVN will do the job), and b) I don’t have administrator rights to this computer (which TortoiseSVN needs in order to be installed).
The office and home machines are connected to the internet on a permanent basis, and the laptop usually is, though I also frequently work in places where no internet access is available, such as the train, or a café where they try to charge extortionate amounts to use their not-very-secure WiFi network. I don’t like to be stuck in an office 24/7; being able to take my work outdoors and have a change of scenery is very important to me.
So that’s my project brief. The aim is to set up a system enabling a very wordy thesis to be written, across several computers, without needing to remember which copy of which file is the most up-to-date, and to document this project so that even those who think of themselves as completely computer illiterate can set it up and reap the benefits from it. It’s got to be better than dragging around indent markings on the ruler in Word and faffing about with USB sticks any day.
If you have experience in doing this, please feel free to add your advice as a comment below. But please, remember that an important part of this project is to keep it manageable for people who underrate their computing skills. For too long, open source software has been confined to fields reflecting its scientific origins, and this project aims to widen the user base.