Problem: Several gigabytes of unnecessary duplicates are fetched and cached.

Scenario: the clients are both i586 and x86_64. urpmi-proxy then fetches and stores the .noarch files separately for each arch. So far that is OK.

Now, suppose that for a client of either arch it could *look for* already downloaded .noarch files in both the i586 and the x86_64 repo trees of the urpmi-proxy cache, and serve them from there. Then bandwidth, time, and storage could be saved :)

Side notes: the .noarch files for i586 and x86_64 are identical. I learned on the [discuss] mailing list that the mirrors etc. actually hard-link the .noarch files to save bandwidth and storage there, and that doing the same is recommended for an rsync-based cache. I guess using hard links is probably not a good solution for urpmi-proxy. But instead of looking in both places, it could keep the file/link in both places. It could make a hard-linked copy of any fetched .noarch file in the repo for the other arch, and create that path (the folder tree for the hard-linked copy) if it does not exist. The dirs and links use very little storage, and whenever the first client of the other arch uses urpmi-proxy, the content is already there for it.

Manual workaround: for now I use a very simple script to hard-link all *.noarch files. First I update all computers of one arch, then I run that script, and then I update the clients of the other arch.
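The manual hard-link workaround described above could look roughly like the following Python sketch. It is not the reporter's actual script, and it assumes a cache layout that is not confirmed here: the arch must appear as a whole path component below the cache root (as on Mageia mirrors, e.g. `distrib/9/i586/media/...`). The function names `other_arch_path` and `link_noarch` are hypothetical.

```python
import os
import sys
from pathlib import Path
from typing import Optional

# Assumption: the arch appears as a whole path component below the cache root,
# e.g. <root>/distrib/9/i586/media/core/release/foo-1.0-1.mga9.noarch.rpm
ARCH_SWAP = {"i586": "x86_64", "x86_64": "i586"}


def other_arch_path(rel: Path) -> Optional[Path]:
    """Swap i586 <-> x86_64 components of a relative path; None if neither occurs."""
    if not any(part in ARCH_SWAP for part in rel.parts):
        return None
    return Path(*(ARCH_SWAP.get(part, part) for part in rel.parts))


def link_noarch(cache_root: Path) -> int:
    """Hard-link each *.noarch.rpm into the other arch's tree; return the number of new links."""
    # Collect the list first so links created below are not walked again.
    sources = [p for p in cache_root.rglob("*.noarch.rpm") if p.is_file()]
    created = 0
    for src in sources:
        alt = other_arch_path(src.relative_to(cache_root))
        if alt is None:
            continue
        dst = cache_root / alt
        if dst.exists():  # already cached (or linked) for the other arch
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)  # create the missing folder tree
        os.link(src, dst)  # new name for the same inode: no extra space for file data
        created += 1
    return created


if __name__ == "__main__" and len(sys.argv) == 2:
    print(f"created {link_noarch(Path(sys.argv[1]))} hard links")
```

You would run it between updating the two arches, for example as `python3 link_noarch.py /path/to/cache`. The script name and the path are placeholders. Hard links only work within a single filesystem, so both arch trees must live on the same one.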
Related: Bug 14588 - Urpmi-proxy could clean out old file versions to save cache size
Assignee: bugsquad => alien
hmm, it's not easy to do something that's flexible, doesn't bite you in the ass later on, and isn't urpmi-specific... I could maybe try swapping i586 and x86_64 in the full path names to see if such a file exists... but won't that just delay all downloads? Perhaps it's better to use a filesystem with dedup to save space? The HTTP transport doesn't actually carry the hard-link info...
OK, make the default setting not do this trick, so by default it is not urpmi-specific :) Without the hard-link info we have to go by naming. That is: whenever a *.noarch* file is requested, check whether it exists under the given path, and also under the alternate location where i586/x86_64 in the given path is substituted with x86_64/i586. In those cases, the delay of checking two dirs instead of one is very marginal compared to downloading the file. I've never tried it, but I guess deduplication uses much more CPU and possibly RAM: https://btrfs.wiki.kernel.org/index.php/Deduplication
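A minimal Python sketch of the lookup proposed above, assuming a proxy that maps the requested URL path directly onto a local cache directory. urpmi-proxy's real code and options are not shown in this report, so the option `share_noarch` and the functions `alternate_path` and `find_cached` are hypothetical. The option is off by default, as suggested.

```python
from pathlib import Path
from typing import Optional

ARCH_SWAP = {"i586": "x86_64", "x86_64": "i586"}


def alternate_path(request_path: str) -> Optional[str]:
    """Swap i586 <-> x86_64 components of a URL path; None if neither occurs."""
    parts = request_path.split("/")
    swapped = [ARCH_SWAP.get(p, p) for p in parts]
    return "/".join(swapped) if swapped != parts else None


def find_cached(cache_root: Path, request_path: str,
                share_noarch: bool = False) -> Optional[Path]:
    """Return the cached file to serve, or None on a cache miss (then download).

    Path sanitising (e.g. rejecting "..") is omitted for brevity.
    """
    rel = request_path.lstrip("/")
    candidates = [rel]
    # Only .noarch packages are arch-independent, and the trick is opt-in.
    if share_noarch and ".noarch." in rel.rsplit("/", 1)[-1]:
        alt = alternate_path(rel)
        if alt is not None:
            candidates.append(alt)
    for cand in candidates:
        local = cache_root / cand
        if local.is_file():  # one extra stat() call, cheap next to a download
            return local
    return None
```

On a miss, the proxy would download the file to the path that was actually requested, so each arch tree still fills normally when the option is off.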
CC: (none) => writing.my.life4ever
CC: (none) => marja11
CC: writing.my.life4ever => mageiatools