Any ideas on how to do a local mirror for this situation?

I'm starting a project to allow ArchLinux to be used on a Cluster environment (autoinstallation of nodes and such). I'm going to implement this where I'm working right now (~25 node cluster). Currently they're using RocksClusters.
The problem is that the connection to internet from work is generally really bad during the day. There's a HTTP proxy in the middle. The other day I tried installing archlinux using the FTP image and I took more than 5 hours just to do an upgrade + installing subversion and other packages, right after an FTP installation (which wasn't fast either).
The idea is that the frontend (the main node of the cluster) would hold a local mirror of packages so that when nodes install use that mirror (the frontend would use this also, because of the bad speed).
As I think it should be better to only update the mirror and perform an upgrade not very often (if something breaks I would leave users stranded until I fix it), I thought I should download a snapshot of extra/ and current/ only once. But the best speed I get from rsync (even at night, where an HTTP transfer from kernel.org goes at 200KB/s) is ~13KB/s this would take days (and when it's done I would have to resync because of any newer package that could have been released in the meantime).
I could download extra/ and current/ at home (I have 250KB/s downstream but I get like ~100KB/s from rsync), record several CDs (6!... ~(3GB + 700MB)/700MB) but that's not very nice. I think that maybe this would be just for the first time. Afterwards an rsync would take a lot less, but I don't know how much less.
Obiously I could speed things a little If I download the full ISO and rsync current using that as a base. But for extra/ I don't have a ISOs.
I think this is a little impractical (to download everything) as I wouldn't need whole extra/ anyways. But it's hard to know all packages needed and their dependencies to download only those.
So... I would like to know if anyone has any ideas on how to make this practical. I wouldn't wan't my whole project to crumble because of this detail.
It's annoying because using pacman at home, always works at max speed.
BTW, I've read that HOWTO that explains how to mount pacman's cache on the nodes to have a shared cache. But I'm not very sure if that's a good option. Anyway, that would imply to download everything at work, which would take years.

V01D wrote:After installation the packages that are in cache are the ones from current. All the stuff from extra/ won't be there until I install something from there.
Anyway, if I installl from a full CD I get old packages which I have to pacman -Syu after installation (that takes long time).
Oh, so that's how is it.
V01D wrote:
I think I'm going to try out this:
* rsync at home (already got current last night)
* burn a DVD
* go to work and then update the packages on DVD using rsync again (this should be fast, if I don't wait long time after recording it)
And to optimize further rsync's:
* Do a first install on all nodes an try it out for a few days (so I install all packages needed)
* Construct a list of packages used by all nodes and frontend
* Remove them from my mirror
* Do further rsync updates only updating the files I already have
This would be the manual approach of the shared cache idea I think.
Hmm... but why do you want to use rsync? You'll need to download the whole repo, which is quite large (current + extra + testing + community > 5.1GB, extra is the largest). I suggest you to download only those packages and their dependencies that you use.
I have similar situation. At work I have unlimited traffic (48kbps at day and 128kbps at night), at home - fast connection (up to 256kbps) but I pay for every megabyte (a little, but after 100-500 megabytes it becomes very noticeable). So I do
yes | pacman -Syuw
or
yes | pacman -Syw pkg1 pkg2 ... pkgN
at work (especially when packages are big), then put new downloaded files on my flash drive, then put them into /var/cache/pacman/pkg/ at home, and then I only need to do pacman -Sy before installing which takes less than a minute.
I have 1GB flashdrive so I can always keep the whole cache on it. Synchronizing work cache <-> flash drive <-> home cache is very easy.
P.S.: Recently I decided to make complete mirror of all i686 packages from archlinux.org with rsync. Not for myself but for my friends that wanted to install Linux. Anyway I don't pay for every megabyte at my work. However it took almost a week to download 5.1 GB of packages.
IMHO for most local mirror solutions using rsync is overkill. How many users are there that use more than 30% of packages from repos? So why to make full mirror with rsync when you can cache only installed packages?

Similar Messages

Maybe you are looking for