What's wrong with SVN
Subversion was the first serious open-source (and free) version control system to be a worthy rival to CVS. Anyone who has used CVS in the past and has since moved on to better tools will understand where those who started the Subversion project were coming from. With CVS came no atomic commits, no easy way to rename files, and many other fun things I have since forgotten.
Fast forward 10 or so years. We now have a huge selection of version control systems, many of which have adopted a distributed model. SVN still fills the niche (especially in the corporate world) of a centralized repository without being nearly as encumbered with restrictions as CVS. You’d think that after 10 years of steady development, SVN would have done a pretty good job getting the kinks worked out, especially compared to git, which has only been around for five years. However, I’ve found SVN to have some of the worst performance ever when it comes to doing absolutely nothing, which is a fairly uncomplimentary thing to say.
I decided tonight to gather some basic numbers and performance characteristics of Subversion. As a comparison, I’ve done some similar tests with git and will show those here as well. I should note that both the timing and tracing runs here were done after the respective update operation (svn update, git pull) had actually refreshed the local copy; the timings and strace traces you see below are of what turn out to be “no-ops”.
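The "refresh first, then measure the no-op" procedure described above can be sketched in a few lines of Python. This is only an illustration of the method, not the commands used for the numbers below: the `time_noop` helper is a hypothetical name, and the stand-in command is there purely so the sketch runs on any machine.

```python
import subprocess
import sys
import time


def time_noop(argv, warmup=True):
    """Return wall-clock seconds for one run of argv, after an optional warm-up run."""
    quiet = {"stdout": subprocess.DEVNULL, "stderr": subprocess.DEVNULL}
    if warmup:
        # The first run does any real work (fetching changes, refreshing the copy).
        subprocess.run(argv, check=True, **quiet)
    start = time.perf_counter()
    # The second run should have nothing left to do, so this times the no-op.
    subprocess.run(argv, check=True, **quiet)
    return time.perf_counter() - start


# Stand-in command (an assumption, so the sketch runs anywhere).
# Swap in ["svn", "update"] or ["git", "pull"] inside a working copy
# to repeat the measurement from this post.
elapsed = time_noop([sys.executable, "-c", "pass"])
print(f"no-op took {elapsed:.3f}s")
```

Unlike the shell's `time`, this reports only wall-clock time, which corresponds to the `real` line in the transcripts below.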
Subversion timing and tracing
dmcgee@galway ~/projects/arch-repos
$ time svn up
At revision 62267.
Killed by signal 15.
real 0m13.375s
user 0m1.160s
sys 0m0.600s
dmcgee@galway ~/projects/arch-repos
$ strace -c svn update
At revision 62267.
Killed by signal 15.
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
65.46 0.019177 2 11598 unlink
21.88 0.006409 0 46516 52 open
3.74 0.001095 0 46468 close
3.43 0.001005 0 23198 getdents
3.24 0.000948 0 69601 fcntl
1.79 0.000523 0 46478 read
0.32 0.000094 1 176 brk
0.08 0.000024 12 2 wait4
0.07 0.000021 0 152 mmap
0.00 0.000000 0 37 write
0.00 0.000000 0 13 9 stat
<snip>
------ ----------- ----------- --------- --------- ----------------
100.00 0.029296 244451 64 total
Git timing and tracing
dmcgee@galway ~/projects/linux-2.6 (master)
$ time git pull
Already up-to-date.
real 0m0.636s
user 0m0.100s
sys 0m0.033s
dmcgee@galway ~/projects/linux-2.6 (master)
$ strace -cf git pull
Already up-to-date.
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
99.30 0.135659 2886 47 21 wait4
0.36 0.000488 24 20 execve
0.15 0.000208 8 26 clone
0.13 0.000181 2 115 munmap
0.02 0.000032 0 394 6 close
0.02 0.000027 0 226 79 stat
0.02 0.000026 4 7 2 connect
0.00 0.000000 0 1216 read
0.00 0.000000 0 21 write
0.00 0.000000 0 438 162 open
0.00 0.000000 0 233 fstat
0.00 0.000000 0 55 27 lstat
<snip>
------ ----------- ----------- --------- --------- ----------------
100.00 0.136621 4672 371 total
Looking for Answers
Let’s sum up the tests, as the raw data above may not mean much just yet.
VCS | Repository | Files | Directories | Syscalls | Time |
---|---|---|---|---|---|
SVN | Arch Packages | 14,351 | 12,649 | 244,451 | 13.375 secs |
git | Linux Kernel | 31,504 | 1,794 | 4,672 | 0.636 secs |
I’ve highlighted the two figures in the above table that I find astounding. Yes, I know these two repositories aren’t identical. One has more files, the other more directories, and SVN definitely seems to struggle as you add directories, since it sticks its own .svn metadirectory in each one. But that doesn’t excuse its awful performance. Why on earth is it making over 11,000 unlink calls to do absolutely nothing? Then there are the other 230,000 syscalls that I haven’t even begun to think about.
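To put the gap in proportion, here is a small Python sketch that does the arithmetic on the summary table above. The numbers are copied straight from the table; the `per_entry` helper and the idea of normalizing by files plus directories are my own framing, and since the two repositories differ, the results are only a rough guide.

```python
# Figures copied from the summary table above.
runs = {
    "svn": {"files": 14351, "dirs": 12649, "syscalls": 244451, "secs": 13.375},
    "git": {"files": 31504, "dirs": 1794, "syscalls": 4672, "secs": 0.636},
}


def per_entry(run):
    """Syscalls per tracked entry (files plus directories)."""
    return run["syscalls"] / (run["files"] + run["dirs"])


# How many times more syscalls and wall-clock time SVN needed than git.
syscall_ratio = runs["svn"]["syscalls"] / runs["git"]["syscalls"]
time_ratio = runs["svn"]["secs"] / runs["git"]["secs"]

print(f"syscalls: {syscall_ratio:.1f}x, time: {time_ratio:.1f}x")
for name, run in runs.items():
    print(f"{name}: {per_entry(run):.2f} syscalls per file or directory")
```

That works out to roughly 52 times as many syscalls and about 21 times the wall-clock time, or about 9 syscalls per tracked entry for SVN against well under one for git.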
I wish I cared enough about Subversion to help make it better, but I don’t use it in my personal projects anymore because git is so quick and easy. It looks like it is time to move the Arch Linux package repositories to something that sucks less.
Bonus material
I ran this test too, but there isn’t really a comparable operation in git, so it didn’t fit in above. I’ll put it here and let you draw your own conclusions.
dmcgee@galway ~/projects/arch-repos
$ strace -c svn cleanup
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
36.39 0.038920 1 46388 rmdir
28.92 0.030931 1 46388 mkdir
17.87 0.019107 2 11597 unlink
8.17 0.008734 0 104492 11647 open
3.26 0.003482 0 115970 getdents
1.70 0.001821 0 92845 close
1.69 0.001806 0 72307 11597 lstat
0.95 0.001014 0 92799 fcntl
0.91 0.000972 0 58076 read
0.14 0.000155 0 11598 lseek
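If you want a starting point for drawing those conclusions, the Python sketch below parses the `strace -c` rows above and divides every call count by the number of unlink calls. The `parse_strace_summary` helper is hypothetical and assumes the column layout shown in this post (five fields per row, or six when the errors column is filled in). Treating one unlink as "one unit of work" is my assumption, not something the trace states.

```python
SAMPLE = """\
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
36.39 0.038920 1 46388 rmdir
28.92 0.030931 1 46388 mkdir
17.87 0.019107 2 11597 unlink
8.17 0.008734 0 104492 11647 open
3.26 0.003482 0 115970 getdents
1.70 0.001821 0 92845 close
1.69 0.001806 0 72307 11597 lstat
0.95 0.001014 0 92799 fcntl
0.91 0.000972 0 58076 read
0.14 0.000155 0 11598 lseek
"""


def parse_strace_summary(text):
    """Map syscall name -> call count from `strace -c` summary rows."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 fields, or 6 when the errors column is filled in.
        if len(fields) not in (5, 6):
            continue
        name = fields[-1]
        # Skip the dashed rulers and the trailing "total" row.
        if name == "total" or not fields[3].isdigit():
            continue
        counts[name] = int(fields[3])
    return counts


counts = parse_strace_summary(SAMPLE)
# Normalize by the unlink count (assumed here to mark one unit of work).
per_unlink = {name: calls / counts["unlink"] for name, calls in counts.items()}

for name, ratio in sorted(per_unlink.items(), key=lambda item: -item[1]):
    print(f"{name:10s} {ratio:6.2f} calls per unlink")
```

Several of the ratios land on or very near whole numbers (mkdir and rmdir at exactly 4, getdents at exactly 10), which suggests a fixed amount of work repeated once per unit, plausibly once per .svn directory. That last part is an inference from the counts rather than something the trace proves.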