toofishes.net

What's wrong with SVN

Subversion was the first serious open-source (and free) version control system to be a worthy rival to CVS. Anyone who has used CVS and has since moved on to better tools can understand where the founders of the Subversion project were coming from. CVS had no atomic commits, no easy way to rename files, and many other fun limitations I have since forgotten.

Fast forward 10 or so years. We now have a huge selection of version control systems, many of which have adopted a distributed model. SVN still fills a niche (especially in the corporate world): it offers a centralized repository without being nearly as encumbered with restrictions as CVS. You’d think that after 10 years of steady development, SVN would have worked most of the kinks out. Compare this to git, which has only been around for five years. However, I’ve found SVN to have some of the worst performance I’ve ever seen when it comes to doing absolutely nothing, which is a fairly uncomplimentary thing to say.

I decided tonight to gather some basic numbers on Subversion’s performance characteristics. For comparison, I’ve run similar tests with git and show those here as well. Every timing and trace below was taken after the respective update operation (svn update, git pull) had already refreshed the local copy. As a result, the commands timed and traced with strace below turn out to be “no-ops”.
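If you want to repeat this kind of measurement on your own working copy, the sketch below times several back-to-back runs of a command from Python. The helper name `time_command` is hypothetical. It captures only wall-clock (“real”) time, not the user/sys split the shell’s `time` prints. It also assumes you have already run one real update beforehand, as I did here, so that every timed run is a no-op.

```python
import subprocess
import time


def time_command(argv, repeats=3):
    """Run argv `repeats` times and return the wall-clock seconds of each run.

    Hypothetical helper: it measures only elapsed ("real") time, like the
    first line of the shell's `time` output, not user or sys CPU time.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command fails,
        # so a broken run is never mistaken for a fast one.
        subprocess.run(
            argv,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return timings
```

For example, calling `time_command(["svn", "update"])` inside ~/projects/arch-repos returns one timing per run. Looking at the minimum tends to filter out noise from a cold cache or a slow network round trip.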

Subversion timing and tracing

dmcgee@galway ~/projects/arch-repos
$ time svn up
At revision 62267.
Killed by signal 15.

real    0m13.375s
user    0m1.160s
sys     0m0.600s

dmcgee@galway ~/projects/arch-repos
$ strace -c svn update
At revision 62267.
Killed by signal 15.
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 65.46    0.019177           2     11598           unlink
 21.88    0.006409           0     46516        52 open
  3.74    0.001095           0     46468           close
  3.43    0.001005           0     23198           getdents
  3.24    0.000948           0     69601           fcntl
  1.79    0.000523           0     46478           read
  0.32    0.000094           1       176           brk
  0.08    0.000024          12         2           wait4
  0.07    0.000021           0       152           mmap
  0.00    0.000000           0        37           write
  0.00    0.000000           0        13         9 stat
<snip>
------ ----------- ----------- --------- --------- ----------------
100.00    0.029296                244451        64 total

Git timing and tracing

dmcgee@galway ~/projects/linux-2.6 (master)
$ time git pull
Already up-to-date.

real    0m0.636s
user    0m0.100s
sys     0m0.033s

dmcgee@galway ~/projects/linux-2.6 (master)
$ strace -cf git pull
Already up-to-date.
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.30    0.135659        2886        47        21 wait4
  0.36    0.000488          24        20           execve
  0.15    0.000208           8        26           clone
  0.13    0.000181           2       115           munmap
  0.02    0.000032           0       394         6 close
  0.02    0.000027           0       226        79 stat
  0.02    0.000026           4         7         2 connect
  0.00    0.000000           0      1216           read
  0.00    0.000000           0        21           write
  0.00    0.000000           0       438       162 open
  0.00    0.000000           0       233           fstat
  0.00    0.000000           0        55        27 lstat
<snip>
------ ----------- ----------- --------- --------- ----------------
100.00    0.136621                  4672       371 total
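The strace -c summaries above are easy enough to compare by eye. If you want to tally them programmatically, here is a minimal parser. It assumes the column layout shown in these traces: % time, seconds, usecs/call, calls, an optional errors count, and then the syscall name. It also assumes the total row omits the usecs/call column. `parse_strace_summary` is a hypothetical name and not part of strace.

```python
def parse_strace_summary(text):
    """Parse an `strace -c` summary into ({syscall: (calls, errors)}, total).

    Sketch assuming the column layout in the traces above. Header lines,
    dashed rules and "<snip>" markers are skipped because they do not
    start with a numeric "% time" value.
    """
    rows = {}
    total = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            float(fields[0])  # data rows start with the "% time" value
        except ValueError:
            continue
        name = fields[-1]
        if name == "total":
            # Total row: % time, seconds, calls, [errors], "total".
            calls = int(fields[2])
            errors = int(fields[3]) if len(fields) == 5 else 0
            total = (calls, errors)
        else:
            # Syscall row: % time, seconds, usecs/call, calls, [errors], name.
            calls = int(fields[3])
            errors = int(fields[4]) if len(fields) == 6 else 0
            rows[name] = (calls, errors)
    return rows, total
```

Feed it the summary text (strace prints it on stderr). For the Subversion trace, `rows["unlink"]` comes back as `(11598, 0)` and the total as `(244451, 64)`.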

Looking for Answers

Let’s sum up the tests, since the raw data above may not mean much just yet.

VCS   Repository      Files    Directories   Syscalls   Time
SVN   Arch Packages   14,351   12,649        244,451    13.375 secs
git   Linux Kernel    31,504    1,794          4,672     0.636 secs
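To put the table in relative terms, here is the arithmetic as a short Python snippet, using only the figures in the table. The per-entry metric (syscalls divided by files plus directories) is a rough normalization chosen for illustration, not a standard measure. Note also that the git trace used `strace -cf`, which counts syscalls from child processes too, while the SVN trace used plain `strace -c`.

```python
# Figures copied from the summary table above.
figures = {
    "svn": {"files": 14351, "dirs": 12649, "syscalls": 244451, "secs": 13.375},
    "git": {"files": 31504, "dirs": 1794, "syscalls": 4672, "secs": 0.636},
}


def per_entry(vcs):
    """Syscalls issued per tracked file or directory (illustrative metric)."""
    f = figures[vcs]
    return f["syscalls"] / (f["files"] + f["dirs"])


syscall_ratio = figures["svn"]["syscalls"] / figures["git"]["syscalls"]
time_ratio = figures["svn"]["secs"] / figures["git"]["secs"]

for vcs in figures:
    print(f"{vcs}: {per_entry(vcs):.2f} syscalls per file or directory")
print(f"svn: {syscall_ratio:.1f}x the syscalls, {time_ratio:.1f}x the time")
```

Run as-is, it reports that SVN issues roughly 52 times as many syscalls as git and takes roughly 21 times as long. That holds even though the git repository has more files and directories in total (33,298 versus 27,000).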

I’ve highlighted the two figures in the above table that I find astounding. Yes, I know these two repositories aren’t identical. One has more files and the other has more directories. SVN definitely seems to struggle as you add directories, since it sticks its own .svn metadirectory in each one. But that doesn’t excuse its awful performance. Why on earth is it making over 11,000 unlink calls to do absolutely nothing? Then there are the other 230,000 or so syscalls that I haven’t even begun to think about.
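To see how much metadata SVN has to walk, you can count the .svn directories in a working copy. This sketch assumes the older working-copy layout described here, in which every versioned directory gets its own .svn. Subversion 1.7 and later keep a single .svn at the root of the working copy instead. The function name `count_svn_metadirs` is hypothetical.

```python
import os


def count_svn_metadirs(root):
    """Return (directories, directories_with_.svn) for the tree under root.

    Assumes the pre-1.7 layout with one .svn metadirectory per versioned
    directory. The root directory itself is included in the count.
    """
    dirs = with_meta = 0
    for _dirpath, dirnames, _filenames in os.walk(root):
        if ".svn" in dirnames:
            with_meta += 1
            # Prune .svn so the metadata's own subdirectories are not counted.
            dirnames.remove(".svn")
        dirs += 1
    return dirs, with_meta
```

Under the old layout, pointing it at ~/projects/arch-repos should report close to one .svn for every versioned directory.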

I wish I cared enough about Subversion to help make it better, but I no longer use it in my personal projects because git is so quick and easy. It looks like it is time to move the Arch Linux package repositories to something that sucks less.

Bonus material

I ran this test too, but git has no really comparable operation, so it didn’t fit above. I’ll put it here and let you draw your own conclusions.

dmcgee@galway ~/projects/arch-repos
$ strace -c svn cleanup
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 36.39    0.038920           1     46388           rmdir
 28.92    0.030931           1     46388           mkdir
 17.87    0.019107           2     11597           unlink
  8.17    0.008734           0    104492     11647 open
  3.26    0.003482           0    115970           getdents
  1.70    0.001821           0     92845           close
  1.69    0.001806           0     72307     11597 lstat
  0.95    0.001014           0     92799           fcntl
  0.91    0.000972           0     58076           read
  0.14    0.000155           0     11598           lseek
