Need for Subversion

Subversion, with the command line tool svn, is a revision control system, also known as a source code management (SCM) system. The Subversion server maintains a repository, where files are stored in a hierarchy of folders, just as in a traditional file system. Files can be checked out and modified. The changes can then be committed, that is, copied back into the repository, creating a new version of the file. Versioning maintains the history of each file, so that you can recover any version or revisit changes by looking at the differences between versions.

There are three reasons we want to use a source code management system:

  1. Where and how you want to work on your code might not be where and how you want your code stored. Code should be stored where it will have reliable backups, where it will be accessible from a variety of locations and platforms, and where it can reside for years.
  2. Source code in an SCM system allows for collaboration. Using an SCM system, the TA can sync to your source code: download it, compile it, and analyze the errors. The TA can then commit corrections and comments back into the repository.
  3. Versioning. A source control system controls the source, and this is a good thing. It allows repeated cycles of code submission, and it documents your progress and each submission.

Subversion basics

Subversion is organized as revisions of a repository. A repository looks like a directory in a file system. Every commit that changes, adds, or deletes files, or adds or deletes directories, increases the revision number of the repository by one, however many items that commit touches. A revision number therefore names a complete state of the repository.

Attaching the revision number to the totality of the repository might seem odd, but the alternatives are way more complicated. Perhaps each file should have a revision number, and the state of a repository at a point in time is the collection of all files and revision numbers? Except files that were deleted. Unless they were put back. You see, it's getting complicated.
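To see that the revision number belongs to the whole repository, here is a minimal sketch that assumes the svn, svnadmin, and svnlook tools are installed. It uses a throwaway local repository reached through a file:// URL rather than the course server, and the file names are made up for illustration. The second commit carries two changes yet produces only one new revision, and a deleted file can still be read back from an older revision, because that revision is a complete snapshot.

```shell
#!/bin/sh
# Sketch only: a scratch repository in a temporary directory (not the
# course server). Assumes svn, svnadmin and svnlook are on the PATH.
set -e
TMP=$(mktemp -d)
svnadmin create "$TMP/repo"
svn checkout -q "file://$TMP/repo" "$TMP/wc"   # working copy at revision 0
cd "$TMP/wc"

echo "hello" > a.txt
svn add -q a.txt
svn commit -q -m "add a.txt"                  # creates revision 1

echo "hello again" >> a.txt
mkdir docs
svn add -q docs
svn commit -q -m "edit a.txt and add docs/"   # two changes, one revision: 2

svn delete -q a.txt
svn commit -q -m "delete a.txt"               # creates revision 3

# svnlook reads the repository directly and prints its newest revision.
YOUNGEST=$(svnlook youngest "$TMP/repo")
echo "repository is at revision $YOUNGEST"

# a.txt is gone from the newest revision, but revision 1 still contains it;
# the @1 suffix asks for the path as it existed in revision 1.
OLD=$(svn cat "file://$TMP/repo/a.txt@1")
echo "a.txt in revision 1: $OLD"
```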

The basic work cycle of Subversion is copy-modify-merge-commit.

Copy
A copy of a subtree of the repository is made onto the local file system. There are two sorts of copies: checkout and update. Checkout is the initial copy; along with copying the files, it builds the required Subversion infrastructure. Later copies use update, since update relies on the existing infrastructure to guide its operation.
Modify
Changes are made in the local copy. This includes adding, deleting, or moving files and directories.
Merge
The modifications are to be copied back to the repository, but the repository may have changed in the meantime because other developers have committed their own work. Those changes are first merged into the local copy.
Commit
The changes, including the modifications made by the merge, are then copied into the repository.

After every commit that actually changes something, the revision number is incremented by one.
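The copy-modify-merge-commit cycle can be sketched with two working copies of one scratch repository standing in for two developers. This is a minimal sketch assuming a local svn installation; the repository, the developers alice and bob, and notes.txt are all hypothetical. Bob's first commit is refused because his copy of the file is out of date, and svn update merges Alice's change into his copy before he commits again.

```shell
#!/bin/sh
# Sketch only: a throwaway repository stands in for the real one, and
# "alice" and "bob" are two hypothetical working copies of it.
set -e
TMP=$(mktemp -d)
svnadmin create "$TMP/repo"
URL="file://$TMP/repo"

# Copy: alice checks out, adds a five-line file, commits (revision 1).
svn checkout -q "$URL" "$TMP/alice"
printf 'one\ntwo\nthree\nfour\nfive\n' > "$TMP/alice/notes.txt"
svn add -q "$TMP/alice/notes.txt"
svn commit -q -m "add notes" "$TMP/alice"

# Copy: bob checks out the same tree.
svn checkout -q "$URL" "$TMP/bob"

# Modify: each edits a different line of the same file.
printf 'ONE\ntwo\nthree\nfour\nfive\n' > "$TMP/alice/notes.txt"
printf 'one\ntwo\nthree\nfour\nFIVE\n' > "$TMP/bob/notes.txt"
svn commit -q -m "alice: capitalize line 1" "$TMP/alice"   # revision 2

# Bob's commit is refused: his base copy of notes.txt is out of date.
if svn commit -q -m "bob: capitalize line 5" "$TMP/bob" 2>/dev/null; then
  BOB_FIRST_TRY=accepted
else
  BOB_FIRST_TRY=refused
fi

# Merge: update folds alice's change into bob's edited copy.
svn update -q "$TMP/bob"

# Commit: now bob's commit goes through (revision 3).
svn commit -q -m "bob: capitalize line 5" "$TMP/bob"

svn cat "$URL/notes.txt"   # both edits are now in the repository
```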

Initial local copy: checkout

In Subversion clients before version 1.7, each directory of the local copy has a special subdirectory named .svn; newer clients keep a single .svn directory at the top of the working copy. The contents of this directory guide Subversion. Most source control actions require going to a directory inside the checked-out tree and issuing the simple command "svn action", where action might be update, commit, add, or delete.

To create the initial source tree, cd to your working directory and check out against the subversion repository using the full web URL of the repository. As it is set up, there is one repository for all of csc524, and its URL is:

    http://web.cs.miami.edu/svn/repos/csc524

Since this directory allows anonymous read access, as does the subdirectory csc524/class, these will check out without the need of a password. Each csc524 student has a subdirectory of csc524 with restricted permissions. These are not checked out by the anonymous read. You will need to name the subdirectory explicitly and present your username and password to check it out. This directory is named by your username. Supposing you are user pikachu, the process looks like:

    cd ~/mywork
    svn checkout --username pikachu http://web.cs.miami.edu/svn/repos/csc524
    cd csc524
    svn checkout --username pikachu http://web.cs.miami.edu/svn/repos/csc524/pikachu

Nota bene: The name checkout does not make sense to me. What checkout really does is create the proper .svn directories. It also copies files to the local machine, but so do other Subversion commands, notably svn update. Checkout does not obligate you to check in. You can, if you wish, immediately rm -rf your checked-out directory and Subversion is none the wiser. You can check it out again and be back where you started. The commit action has the alias ci, short for checkin; however, this isn't a checkin in the sense of something after which you are free to walk away when before you weren't. Still, checkout is crucial, and things won't work without it, because checkout creates the .svn directories on your local machine.

Adding files and directories: add, commit

Now that you have your csc524/pikachu directory, you can begin creating directories and files in it. Creating a file or directory is a three-step process.

  1. Create the file or directory locally in your checked-out file tree.
  2. Inform subversion locally of the existence of the file or directory, using svn add.
  3. Contact the subversion server and push the new file or directory into the repository, using svn commit.

In your local copy you can create a file or directory and subversion won't care. It won't have anything to do with the contents of the repository. Once created, you have to inform subversion of the existence of the file or directory by running svn add. Although subversion now knows of your intent to add this file or directory, and it exists in the local copy, to make the change in the repository, you then must run svn commit.

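There's a fourth step, actually. I have found that to get everything to work sensibly, I follow svn commit with svn update. The reason is that a commit moves only the items it sent to the server to the new revision; the directories around them stay at their old revision until the next update.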

In summary, picking up our example from above (the checkout of pikachu accomplished):

    cd pikachu
    touch helloworld.txt
    svn add helloworld.txt
    svn commit --username pikachu -m 'change for the sake of change'
    svn update --username pikachu

Nota bene: The commit action requires a message, a notation on the commit. The -m option followed by a string notating the change supplies it; without -m, svn opens an editor (named by SVN_EDITOR or EDITOR) to ask for the message. Also, I am not quite clear on the use of the --username option. I think it is unnecessary here, since Subversion remembers the credentials from your earlier checkout and uses them as the default. F.Y.I., those credentials are cached under the ~/.subversion/auth directory, one entry per server authentication realm, which is why I only see a single entry that would seem to apply. The Subversion book describes this caching scheme.
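The habit of following svn commit with svn update has a concrete reason: after a commit, only the committed items move to the new revision, while the directory containing them stays at its old revision, so the working copy is a mix of revisions. A minimal sketch, assuming a local svn installation and a scratch repository instead of the course server, makes this visible with svnversion, which prints a range (MIN:MAX) for a mixed-revision working copy and a single number once everything is at one revision.

```shell
#!/bin/sh
# Sketch only: scratch repository; mirrors the helloworld.txt example.
set -e
TMP=$(mktemp -d)
svnadmin create "$TMP/repo"
svn checkout -q "file://$TMP/repo" "$TMP/wc"   # everything at revision 0
cd "$TMP/wc"

touch helloworld.txt
svn add -q helloworld.txt
svn commit -q -m 'change for the sake of change'   # creates revision 1

# helloworld.txt is now at revision 1, but the directory itself was not
# committed and is still at revision 0: a mixed-revision working copy.
BEFORE=$(svnversion .)

svn update -q        # the fourth step brings the whole tree to revision 1
AFTER=$(svnversion .)

echo "after commit: $BEFORE  after update: $AFTER"
```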

Revert: how to undo all sorts of things

I have added stuff to a directory that I don't have permission to modify. There was no problem during svn add: add is an offline action that doesn't contact the server. However, problems arise when you try to commit. Then the server is contacted and the commit fails. Moreover, the add is still "scheduled", and you will get this commit failure as long as the add is scheduled. To de-schedule the add, run svn revert filename.

Revert also works to undo edits to a file which has been changed but not committed.
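Both uses of svn revert can be tried safely in a scratch repository. This minimal sketch assumes a local svn installation; the repository and the file names extra.txt and notes.txt are hypothetical. Reverting a scheduled add only un-schedules it: the file stays on disk, now unversioned.

```shell
#!/bin/sh
# Sketch only: scratch repository, hypothetical file names.
set -e
TMP=$(mktemp -d)
svnadmin create "$TMP/repo"
svn checkout -q "file://$TMP/repo" "$TMP/wc"
cd "$TMP/wc"

# 1. Un-schedule an add that was never committed.
echo "draft" > extra.txt
svn add -q extra.txt
svn status extra.txt              # status letter A: scheduled for addition
svn revert extra.txt
ADD_STATUS=$(svn status extra.txt)
echo "$ADD_STATUS"                # status letter ?: on disk but unversioned

# 2. Throw away uncommitted edits to a committed file.
echo "original" > notes.txt
svn add -q notes.txt
svn commit -q -m "add notes"
echo "scribble" >> notes.txt
svn revert notes.txt              # restores the last committed text
CONTENT=$(cat notes.txt)
echo "notes.txt now reads: $CONTENT"
```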

Modifying, committing and deleting

To modify existing files inside an existing directory, edit them and run svn commit if you can. If someone else has committed a newer version of a file you changed, the commit fails as out of date, and you will have to merge first. Run svn update to bring in the repository's changes and see what merges need to be done. Update takes care of uncontroversial changes, such as files that exist in the repository but not in the local copy (these get copied down) or edits by others to parts of a file you did not touch. Changes that cannot be handled blindly are conflicts, which you must resolve. Conflicts occur when two developers change the same part of a file and each tries to commit their separate modifications.

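Once the update is done, you are on an honor system to consider the conflicts found (update marks them with a C) and resolve them: edit each conflicted file into the correct merged text, then run svn resolve --accept working filename (svn resolved filename on clients older than 1.5). Subversion does not check that your merge is sensible, only that you have marked the conflict resolved, and it refuses to commit a file still marked as conflicted. You then commit as usual.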
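A conflict and its resolution can be reproduced with two working copies of a scratch repository. This is a minimal sketch assuming a local svn installation (1.5 or later, for svn resolve and the --accept option); the repository, the developers alice and bob, and hello.txt are hypothetical. It shows update leaving the file marked C, commit refusing to proceed, and the commit going through only after the merged text is written and marked resolved.

```shell
#!/bin/sh
# Sketch only: scratch repository; "alice" and "bob" are hypothetical
# working copies. Requires svn 1.5+ for --accept and svn resolve.
set -e
TMP=$(mktemp -d)
svnadmin create "$TMP/repo"
URL="file://$TMP/repo"

svn checkout -q "$URL" "$TMP/alice"
echo "version one" > "$TMP/alice/hello.txt"
svn add -q "$TMP/alice/hello.txt"
svn commit -q -m "add hello" "$TMP/alice"            # revision 1
svn checkout -q "$URL" "$TMP/bob"

# Both change the same line; alice commits first.
echo "alice was here" > "$TMP/alice/hello.txt"
svn commit -q -m "alice's edit" "$TMP/alice"         # revision 2
echo "bob was here" > "$TMP/bob/hello.txt"

# Update cannot merge this blindly. "postpone" leaves the conflict for bob
# to handle instead of prompting. Don't stop on update's exit status; the
# status check below is what matters.
svn update -q --accept postpone "$TMP/bob" || :
STATUS=$(svn status "$TMP/bob/hello.txt" | head -n 1)   # first column: C

# No forcing: commit is refused while hello.txt is marked conflicted.
if svn commit -q -m "try anyway" "$TMP/bob" 2>/dev/null; then
  FORCED=yes
else
  FORCED=no
fi

# Honor system: write the merged text by hand, then mark it resolved.
echo "alice and bob were here" > "$TMP/bob/hello.txt"
svn resolve --accept working "$TMP/bob/hello.txt"
svn commit -q -m "merge alice's and bob's edits" "$TMP/bob"   # revision 3
```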

To delete, run svn del file-or-directory-name, svn commit and (optionally) svn update.

WebDAV and browsing of repositories

The HTTP protocol defines a few "verbs" (methods), that is, actions requested of the web server. The most used is GET, to get a page. There is also POST, to send information to the web server and get a response. There are a few others. However, the verb collection was slanted towards serving pages rather than collaboration or file hosting. WebDAV is an extension of HTTP that adds new verbs, such as PROPFIND, MKCOL, and LOCK, to support a more interactive use of HTTP. Subversion can make use of WebDAV so that standard HTTP can be the transport mechanism between Subversion clients and servers.

Since our installation of subversion is using webDAV, and webDAV is an extension of HTTP, you can browse the files in the repository using your web browser and the URL,

    http://web.cs.miami.edu/svn/repos/csc524
For subdirectories that need a username and password, the browser will invoke its simple http authorization mechanism, HTTP Basic Auth, and as subversion is using exactly this mechanism for authentication, the username and password will work to browse the subdirectories.

Security concerns

The type of authentication used by our WebDAV setup is the simple browser authentication protocol called HTTP Basic Auth. There are a number of reasons why this is very weak authentication. The most obvious is that the protocol sends your password over the wire in the clear: it is merely base64-encoded, which anyone can decode.
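To see how little protection Basic Auth gives, here is a minimal sketch of what a client puts on the wire. It assumes a POSIX shell with the base64 utility from GNU coreutils (other systems may spell the decode flag differently), and the username and password are made up. The credentials are only base64-encoded, so anyone who captures the header can decode them without any key.

```shell
#!/bin/sh
# Sketch only: hypothetical credentials, nothing is sent anywhere.
set -e
SVN_USER=pikachu
SVN_PASS=not-my-banking-password

# HTTP Basic Auth joins "username:password" with a colon and base64-encodes
# it; the result travels with every request in this header.
TOKEN=$(printf '%s:%s' "$SVN_USER" "$SVN_PASS" | base64)
HEADER="Authorization: Basic $TOKEN"
echo "$HEADER"

# Decoding needs no secret at all: base64 is an encoding, not encryption.
RECOVERED=$(printf '%s' "$TOKEN" | base64 -d)
echo "recovered: $RECOVERED"
```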

At one end of the spectrum, browsers support signed X.509 certificates, with root of trust certificates shipped in the browser. Great for amazon.com but useless for us, because we don't have signed X.509 certificates. At the other end is HTTP Basic Auth, where a box pops up and asks for username and password. For various reasons, HTTP Basic Auth security is the equivalent of a box of valuables in the hallway with a sign on it, "Please take no more than one item". It is security under the assumption that everyone is honest.

The short of it: use an isolated password for svn. Definitely not the one you use for on-line banking. And worse, you have to ask the administrator of the svn server to change your password.

Soapbox: There are other reasons besides the lack of link privacy that HTTP Basic Auth is not a trustworthy protocol. Microsoft's SharePoint, at least as I have seen it used for UM's "UMShare", wraps HTTP Basic Auth inside link-level encryption (SSL) and seems to consider the protocol's defects addressed. I do not believe this is so, and I warn our computer science students against trusting UMShare too much. Do not use an important password on UMShare, and do not rely on its security to protect something you can't stand to have made public.

The reason for this distrust is that, since HTTP has no native notion of a session, the credentials you give your browser to access UMShare using HTTP Basic Auth will be released to the UMShare server from then on for every page on UMShare, subject to some rules that err on the side of convenience, not security. (For instance, I have found that even after "Browser Reset" and "Empty Cache", Camino still remembers HTTP Basic Auth credentials.)

On a proper webapp, a session entity is created. At the initiation of the session, your credentials are presented to an authentication subsystem and exchanged for an opaque session key. This key is presented in a regulated fashion to, and only to, the webapp requiring it, during that session. For example, for many of UM's other web services, the Central Authentication Service, a.k.a. CAS, is the subsystem which helps webapps exchange credentials for session keys. Under this scenario, you have to come to the judgement that CAS is trustworthy and that the specific webapp you have authenticated to is trustworthy, to conclude that the system is trustworthy. Under those two clear and limited assumptions, you can move ahead with confidence.

Under HTTP Basic Auth, such as UMShare, your raw credentials are being sprayed across the UMShare webservice, and you need to trust every page, and every developer of every page, with your password, raw and pure and unencrypted, before you can feel confident in the service. I just don't think this is a reasonable assumption.

History of Subversion

Although source control is important, there are few open source source-control systems available. The first source control system was SCCS, written in 1972 by Marc J. Rochkind at Bell Labs. It was ported to unix and is still part of Unix today. I have never used it. I have used its successor, RCS, which was written for unix in 1982 by Walter Tichy. That was supplanted by CVS, written in 1986 by Dick Grune. CVS is used by many important open source projects such as FreeBSD. To compile FreeBSD I sync my sources to the original files on a remote CVS server, build and install.

People complain about CVS. Subversion was a replacement for CVS intended to make fewer people unhappy.