Archive for September, 2010

They Say Yum Is Not Slow, But it Feels That Way

September 23, 2010

When the usual RPM/dpkg flame fest comes around, Debian/Ubuntu advocates will not hesitate to point out that Yum is slower than APT, and RPM is slower than dpkg. That may be true, but it’s usually repeated without any suggestions for a fix, or even any supporting data.

The flame fest resulted in a link to “Lies, damn lies, and benchmarks”, by James Antill. His point is that benchmarkers rarely standardize their variables, and often compare numbers that have nothing to do with one another, drawing the wrong conclusions in the process. So I guess Yum/RPM is just as fast as APT/dpkg.

Yet as I type this I’m trying to get a simple source RPM for libvirt over a mobile “broadband” connection, and I’ve got:

updates-source/primary_db 77% [============    ] 9.0 kB/s | 610 kB     00:19 ETA

Yet if I claimed that Yum was slower than APT I would be falling into the exact trap that Antill was describing, so I won’t do that. It still sucks that it’s going to take me 10 minutes to get a simple RPM though, so what can we do? Well, first of all, I have 3 repos enabled by default, so I should have used --disablerepo=* --enablerepo="updates". But even that would have taken a while.

The problem is my cache was invalidated, which, as I understand it, is not a problem when using APT, due to what the APT guys would call a design flaw in Yum. However, it is the Fedora users that have to live with this, so how do we mitigate the problem?

I have two potential solutions. One is that we could have a cron job download all your enabled repo metadata whenever you’re on a high-bandwidth link, so that if you’re on the road you can still update necessary packages but won’t need to spend forever downloading metadata. The other starts from the possibility that primary_db is a database of everything, and isn’t just new data (presumably I already have the metadata for the “old” updates in my cache). This was a design flaw in up2date, yet Yum chose to repeat it. But I have no idea how Yellowdog Updater worked and what the history is, so I’ll hold my tongue.
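The cron job idea could look something like the sketch below. It is only an illustration: how the link speed gets measured is left out entirely, and the threshold and function names are made up for the example. The only real command it relies on is yum makecache, which downloads and caches metadata for the enabled repos.

```python
import subprocess

# Hypothetical threshold: treat anything at or above this as a "high bandwidth" link.
FAST_LINK_KBPS = 1000


def should_prefetch(link_kbps, threshold_kbps=FAST_LINK_KBPS):
    """Return True when the measured link speed is good enough to refresh metadata."""
    return link_kbps >= threshold_kbps


def prefetch_metadata(link_kbps, run=subprocess.call):
    """Refresh the metadata cache for all enabled repos, but only on a fast link.

    `run` is injectable so the decision logic can be exercised without invoking yum.
    Returns True only if the refresh was attempted and yum exited with status 0.
    """
    if not should_prefetch(link_kbps):
        return False
    # `yum makecache` downloads and caches metadata for the enabled repos.
    return run(["yum", "-q", "makecache"]) == 0
```

Run from cron every hour or so, this would do nothing on a 9 kB/s mobile link and quietly refresh everything once you are back on a decent connection.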

We could solve this in a manner similar to delta RPMs: call it deltaMetadata or deltaYum. Diff primary_db every time, and have the server recompose whatever is necessary. Again, I’m not positive primary_db is the entire database, so this may be a non-solution to a problem that doesn’t exist.
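To make the delta idea concrete, here is a toy sketch. It assumes the metadata can be boiled down to a mapping of package name to version, which is a big simplification of whatever primary_db really holds, and the function names are invented for the example. The point is only that the client needs the changes, not the whole database.

```python
def make_delta(old, new):
    """Compute what changed between two metadata snapshots.

    Each snapshot is a dict mapping package name -> version string, a
    deliberately simplified stand-in for the real primary_db contents.
    """
    # Packages that are new or whose version differs from the old snapshot.
    changed = {name: ver for name, ver in new.items() if old.get(name) != ver}
    # Packages that existed before but are gone now.
    removed = sorted(name for name in old if name not in new)
    return {"changed": changed, "removed": removed}


def apply_delta(old, delta):
    """Rebuild the new snapshot on the client from its cached copy plus a delta."""
    result = dict(old)
    for name in delta["removed"]:
        result.pop(name, None)
    result.update(delta["changed"])
    return result
```

If only a handful of packages changed since the last refresh, the delta is a handful of entries, and apply_delta(old, make_delta(old, new)) gives back the same snapshot a full download would have.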

The APT people may be right about the design flaw, however. As far as I know, it is against policy to pull packages from a Yum repo, or it’s very rare in any case. If true, then the links for the old packages will still be there, but Yum will go ahead and get the metadata anyway on the off chance that libvirt has been updated in the interim. I haven’t checked, but I doubt it was in this specific case, and even if it was, I would have been fine with the old source RPM. Even if the package had been pulled from the repo, we would get an HTTP 404 and could decide to re-download the metadata then and there.
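That "try the cache first, refresh on a 404" behavior is easy to sketch. To be clear, this is not how Yum works; it is a hypothetical illustration, and the NotFound exception, the fetch and refresh callables, and the dict-shaped cache are all assumptions made for the example.

```python
class NotFound(Exception):
    """Raised by the hypothetical fetcher when the server answers HTTP 404."""


def fetch_package(name, cache, fetch, refresh):
    """Try the cached URL first; only re-download metadata if it is gone.

    cache   -- dict of package name -> URL from the (possibly stale) metadata
    fetch   -- callable(url) returning the package bytes, raising NotFound on 404
    refresh -- callable() returning a fresh name -> URL dict (the slow download)
    """
    url = cache.get(name)
    if url is not None:
        try:
            return fetch(url)
        except NotFound:
            pass  # stale entry: the package was pulled or replaced
    # Cache miss or 404: pay for the metadata download now, then retry once.
    cache.clear()
    cache.update(refresh())
    # Raises KeyError if the package is unknown even to the fresh metadata.
    return fetch(cache[name])
```

In the common case the slow refresh never runs at all, and in the rare case where the package really was pulled, the cost is one failed request before doing what Yum does up front today.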

Possibly there is a way to tell Yum to use only the cache, but it isn’t the default and I don’t know how to do it. And let us leave the package churn discussion for another time; that’s hard to solve and has a lot of people smarter than me thinking about it.

In any case, Antill and I are saying the same thing. Don’t benchmark the app, benchmark the primitives. If you do this with Yum and APT, you probably won’t notice a difference, or it’ll be on the order of seconds. However, in Fedora we don’t address the design issues that make Yum feel slower, and so the perception (perhaps rightly) persists. And perception isn’t benchmarked, so looking for technical solutions and benchmarks for feelings is the wrong thing to do anyway.