Bug 20351 - Migrate from Subversion to Git for packaging sources
Summary: Migrate from Subversion to Git for packaging sources
Status: NEW
Alias: None
Product: Infrastructure
Classification: Unclassified
Component: Others
Version: unspecified
Hardware: All Linux
Priority: Normal normal
Target Milestone: Mageia 7
Assignee: Augier
Reported: 2017-02-25 17:59 CET by Neal Gompa
Modified: 2020-06-23 10:45 CEST

Description Neal Gompa 2017-02-25 17:59:03 CET
Description of problem:

We're still using Subversion to maintain the sources of our packaging. We previously migrated our software sources from SVN to Git in 2013[1], and a migration of packaging to Git has been planned for a while.

Other distributions have already done this migration (c.f. Fedora[2][3], CentOS, OpenMandriva).

SVN causes a number of problems: it is difficult to preserve history when we synchronize packages with Fedora, and difficult to pick apart changes locally in a Mageia package or between the Fedora and Mageia counterparts of the same package.

Also, now that there are efforts to plan a migration from YOURI to Koji[4] for building packages (possibly for Mageia 7 onward), we should move to Git ahead of that, so that the infrastructure changes can be synchronized.

Proposed solution:

The current proposal is to implement the modern form of Fedora's Dist-Git system[3]. Variations of the Dist-Git system are used in Fedora, CentOS, OpenMandriva, and RDO[5]. 

An example Dist-Git repository: https://src.fedoraproject.org/cgit/rpms/lugaru.git/

A Dist-Git repository is made up of two components:

* A Git repository containing the spec, patches, and a "sources" file containing tagged checksums for each binary file that's "part" of the packaging

* A binary data repository where tarballs and other binary data are stored (equivalent to our binrepo).

The example repository is a "proper" example in that it has a spec file, patches, and a "sources" file with the pointer and branches representing each target release that the package can be built against.

For our migration, we will need to recombine our various SVN branches of packages so that they fit back into a single Git repository representing a package.

That is, we will need to fit back in all the update repos as branches of a package's git repo. Take "nss" for example. We've shipped updates for it for every release of Mageia, and we continue to ship it in Cauldron. Our dist-git structure would look like this (mappings to SVN mentioned)

nss
|- cauldron <- SVN packages/cauldron/
|- mga6 <- SVN packages/updates/6/ (doesn't exist yet!)
|- mga5 <- SVN packages/updates/5/
|- mga4 <- SVN packages/updates/4/
|- mga3 <- SVN packages/updates/3/
|- mga2 <- SVN packages/updates/2/
|- mga1 <- SVN packages/updates/1/
|- misc <- SVN packages/misc/nss

The branching history should obviously be preserved, meaning that "cauldron" is the master branch. Infra branches should be suffixed with "-infra" and backport branches should be suffixed with "-backport".

The "misc" branch is special, as it's going to have a divergent base from "cauldron" because of how we structured things. So that will require some care.

So the maximal configuration that could exist at the time we start doing the conversion would be something like this:

<pkg>
|- cauldron <- SVN packages/cauldron/
|- mga6 <- SVN packages/updates/6/ (doesn't exist yet!)
|- mga6-backport <- SVN packages/backports/6/ (doesn't exist yet!)
|- mga6-infra <- SVN packages/updates/infra_6/ (doesn't exist yet!)
|- mga5 <- SVN packages/updates/5/
|- mga5-backport <- SVN packages/backports/5/
|- mga5-infra <- SVN packages/updates/infra_5/
|- mga4 <- SVN packages/updates/4/
|- mga4-backport <- SVN packages/backports/4/
|- mga4-infra <- SVN packages/updates/infra_4/
|- mga3 <- SVN packages/updates/3/
|- mga3-backport <- SVN packages/backports/3/
|- mga3-infra <- SVN packages/updates/infra_3/
|- mga2 <- SVN packages/updates/2/
|- mga2-infra <- SVN packages/updates/infra_2/
|- mga1 <- SVN packages/updates/1/
|- mga1-infra <- SVN packages/updates/infra_1/
|- misc <- SVN packages/misc/<pkg>
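
The branch naming rule above can be sketched as a small mapping function. This is only an illustration: the function name `svn_path_to_branch` and the regular expressions are hypothetical and not part of any existing Mageia tool, and it assumes the package name directly follows each SVN prefix listed in the layout.

```python
import re

# Hypothetical rules: SVN path prefix -> git branch name template.
# The infra rule is listed before the plain updates rule for readability;
# the patterns do not overlap because "infra_5" is not purely numeric.
_RULES = [
    (re.compile(r"^packages/cauldron/(?P<pkg>[^/]+)"), "cauldron"),
    (re.compile(r"^packages/updates/infra_(?P<rel>\d+)/(?P<pkg>[^/]+)"), "mga{rel}-infra"),
    (re.compile(r"^packages/updates/(?P<rel>\d+)/(?P<pkg>[^/]+)"), "mga{rel}"),
    (re.compile(r"^packages/backports/(?P<rel>\d+)/(?P<pkg>[^/]+)"), "mga{rel}-backport"),
    (re.compile(r"^packages/misc/(?P<pkg>[^/]+)"), "misc"),
]


def svn_path_to_branch(path):
    """Return (package, branch) for an SVN path, or None if no rule matches."""
    for pattern, template in _RULES:
        match = pattern.match(path)
        if match:
            return match.group("pkg"), template.format(**match.groupdict())
    return None
```

For example, `svn_path_to_branch("packages/updates/infra_5/nss/SPECS/nss.spec")` yields the package `nss` on branch `mga5-infra`.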

After the conversion of the raw SVN repository to this Git structure is complete, a Dist-Git conversion commit will need to be applied to all branches to convert the sources to match the correct Dist-Git style. This should be applied to *all* branches, including obsolete branches, as it is possible for packages to be revived.

Even obsolete packages need to be migrated to this form. We're not going to have an obsolete branch. Instead, packages that moved from cauldron to obsolete will get marked as a dead package. 

There are two potential approaches:

* Apply a commit that deletes all the files from the cauldron branch and adds a "dead.package" file containing the date and reason for marking it dead. This is what Fedora does. The main downside is that package revivals create a history break. That may not matter, though: the approach is obvious to anyone looking at the repository, and the history is not actually gone. Here's an example of it: https://src.fedoraproject.org/cgit/rpms/bandwidthd.git/commit/?id=ed5f6573ba29ca197ac5a2f5251eaffb7251f15f

* Just block the package git repo from being used (read-only through gitolite, disabled for builds/rebuilds in build system, etc.). This might be cleaner for the purposes of being able to revive packages, but it makes it far less obvious when a package has been removed from the distribution.
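
As a rough illustration of the first approach, the working-tree side of a "dead package" commit could look like the Python sketch below. The helper name `mark_dead` and the one-line "date: reason" content of the marker file are assumptions made for this example; the commit itself would still be created with git afterwards.

```python
import shutil
from datetime import date
from pathlib import Path


def mark_dead(checkout, reason, when=None):
    """Empty a package checkout and leave only a dead.package marker file."""
    checkout = Path(checkout)
    when = when or date.today()
    for entry in checkout.iterdir():
        if entry.name == ".git":
            continue  # never touch the repository metadata
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    marker = checkout / "dead.package"
    # Hypothetical marker format: ISO date, then the reason.
    marker.write_text(f"{when.isoformat()}: {reason}\n", encoding="utf-8")
    return marker
```

Reviving a package would then be a matter of restoring the files from the commit preceding the marker commit, which is where the history break mentioned above comes from.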

This will also involve developing tooling for mapping users to what git expects as well as future management of packages by contributors. Per Øyvind Karlsen (proyvind) has developed a mechanism for automated conversion with history preservation for one branch with his repsys[6], which we can use as a base to develop a more sophisticated tool for SVN->Dist-Git conversion. A mass commit for all resultant Git repositories for all branches to reformat them to Dist-Git style would be next, then packages that are "obsolete" need another commit to appropriately mark them so that they don't get built again.

For next-generation packager tooling, we have three choices:

* Extend repsys[6] to handle speaking the new buildsystem parlance
* Use pyrpkg and fork the fedpkg frontend[7] to create an mgapkg frontend
* Use rdopkg[8] and create an mgapkg frontend

Once the conversion to Dist-Git is complete, we'll evaluate which of the tooling options to go with. If we simultaneously move to Koji for mga7 onwards, then the tooling change will need to be factored in as part of the tooling work.

For the Git web frontend, I'm leaning towards us using Pagure[9] due to ease of packaging/maintenance and providing the feature set we want (PRs, CI hookup capabilities, nesting for groups, etc.). GitLab CE[10] is another option, though considerably more complex to package and maintain.

[1]: https://wiki.mageia.org/en/Git_Migration
[2]: https://fedoraproject.org/wiki/Dist_Git_Proposal
[3]: https://fedoraproject.org/wiki/Dist_Git_Project
[4]: https://koji.build/
[5]: https://www.rdoproject.org/
[6]: https://github.com/DrakXtools/repsys
[7]: https://pagure.io/fedpkg
[8]: https://github.com/openstack-packages/rdopkg
[9]: https://pagure.io/pagure
[10]: https://about.gitlab.com/
Neal Gompa 2017-02-25 17:59:43 CET

Target Milestone: --- => Mageia 7

Comment 1 Augier 2017-02-25 20:29:49 CET
There's also a third step of porting Mageia's packagers tooling from SVN to git. Like `mgarepo`.
Comment 2 Neal Gompa 2017-02-26 00:05:51 CET
(In reply to Augier from comment #1)
> There's also a third step of porting Mageia's packagers tooling from SVN to
> git. Like `mgarepo`.

That's the part about "packager tooling". :)
Comment 3 Augier 2017-02-26 00:09:08 CET
> That's the part about "packager tooling". :)

Oh ! Well... Héhé... 
*Quickly disappears*
Comment 4 Augier 2017-02-26 17:34:37 CET
Looking at previous work, in particular `svn-git-migration`[1], I found an interesting resource explaining how to migrate an SVN repo to a git one[2]. It seems `svn-git-migration` was used to migrate the software repos. I could reuse some of these bash scripts, but since they are not hard to develop, and since there is more work involved than this repo migration[3], I prefer to rewrite them in Python.

I was also pointed to previous work named `svn2git`[4], which is based on KDE's work but is written in C++. Considering my level in C++ and the fact that this code seems to perform the migration by hand instead of using `git svn`, it does not seem wise for me to use it.

[1]: http://gitweb.mageia.org/software/infrastructure/svn-git-migration/
[2]: http://john.albin.net/git/convert-subversion-to-git
[3]: In particular, there is the work to comply with the dist-git structure.
[4]: http://gitweb.mageia.org/software/infrastructure/svn2git/about/
Comment 5 Neal Gompa 2017-02-26 19:04:35 CET
Just to ensure the dist-git structure and its sources are understood, here's the template layout:

<pkg>
|- cauldron <- SVN packages/cauldron/<pkg>
|- mga6 <- SVN packages/updates/6/<pkg> (doesn't exist yet!)
|- mga6-backport <- SVN packages/backports/6/<pkg> (doesn't exist yet!)
|- mga6-infra <- SVN packages/updates/infra_6/<pkg> (doesn't exist yet!)
|- mga5 <- SVN packages/updates/5/<pkg>
|- mga5-backport <- SVN packages/backports/5/<pkg>
|- mga5-infra <- SVN packages/updates/infra_5/<pkg>
|- mga4 <- SVN packages/updates/4/<pkg>
|- mga4-backport <- SVN packages/backports/4/<pkg>
|- mga4-infra <- SVN packages/updates/infra_4/<pkg>
|- mga3 <- SVN packages/updates/3/<pkg>
|- mga3-backport <- SVN packages/backports/3/<pkg>
|- mga3-infra <- SVN packages/updates/infra_3/<pkg>
|- mga2 <- SVN packages/updates/2/<pkg>
|- mga2-infra <- SVN packages/updates/infra_2/<pkg>
|- mga1 <- SVN packages/updates/1/<pkg>
|- mga1-infra <- SVN packages/updates/infra_1/<pkg>
|- misc <- SVN packages/misc/<pkg>
Comment 6 Neal Gompa 2017-02-26 19:06:06 CET
Also of note, our dist-git system's fallback checksum will be sha1 rather than md5, so we can just rename sha1.lst to "sources" as we will have our tools Do The Right Thing(TM) here. Going forward, we may choose to move to sha512, as Fedora did.
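
To illustrate what "Do The Right Thing" could mean for the tools, here is a minimal Python sketch of a "sources" parser that accepts tagged lines and falls back to sha1 for untagged ones. It assumes tagged lines use the BSD-style `ALGO (file) = digest` form and that sha1.lst lines look like `sha1sum` output (`digest  file`); the function name `parse_sources` is made up for the example.

```python
import re

# BSD-style tagged checksum line, e.g. "SHA512 (foo.tar.xz) = abcd..."
_TAGGED = re.compile(
    r"^(?P<algo>[A-Za-z0-9]+) \((?P<name>.+)\) = (?P<digest>[0-9a-fA-F]+)$"
)


def parse_sources(text, fallback="sha1"):
    """Return a list of (algorithm, digest, filename) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = _TAGGED.match(line)
        if match:
            entries.append(
                (match.group("algo").lower(), match.group("digest").lower(), match.group("name"))
            )
        else:
            # Untagged line: assume "<digest>  <filename>" and the fallback algorithm.
            digest, name = line.split(None, 1)
            entries.append((fallback, digest.lower(), name.strip()))
    return entries
```

With a parser of this shape, a renamed sha1.lst and a later sha512-tagged file can coexist in the history of the same package.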
Comment 7 Thierry Vignaud 2017-03-01 10:37:05 CET
Colin has already worked on git migration.
You should ask him what he's done.

CC: (none) => mageia, thierry.vignaud

Comment 8 Colin Guthrie 2017-03-02 10:57:33 CET
Yeah I started this a few years back.

The migration is a massive headfuck as it involved quite a lot of history. I think I didn't bother with the Mandriva history as I did with the tools merge as that would just be a total nightmare.

Sadly the packages repo is a different beast to the software repos. The tools used to convert the software repo don't really scale to the size of the packages repo (the git-svn stuff is pretty much horribly inefficient).

Fortunately, the KDE guys wrote a tool to migrate their svn to git some time ago. I forked this and made some changes to make it work for our repos:

http://gitweb.mageia.org/software/infrastructure/svn2git/

I left some notes here in my poorly written/parsed Markdown :D
http://gitweb.mageia.org/software/infrastructure/svn2git/about/

I also spoke to Fedora folks about the sha1 vs md5 vs sha512 a while back. I think we can/should switch to sha512. We're in the fortunate position that we've never removed anything from our binrepo.

We could quite easily do the following:

1. Take the current binrepo and code in the ability to generate sha512 sums on upload and store both sha1 and sha512 (using hardlinks), but also store a mapping from sha1 -> sha512 in some form of DB (easier than finding matching hardlinks in a filesystem).

2. As part of the conversion above (perhaps as a final filter-branch), we do a lookup of sha1->sha512 and thus "history will be rewritten" as sha512 sums instead.


That's one option. The other is just to slowly migrate after git conversion.
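
The first option (steps 1 and 2 above) can be sketched in Python as follows. This is only a sketch under assumptions: SQLite stands in for "some form of DB", and the table name `digest_map` and the helper names are invented for the example; the hardlink handling in the binrepo itself is not shown.

```python
import hashlib
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the hypothetical sha1 -> sha512 mapping database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS digest_map "
        "(sha1 TEXT PRIMARY KEY, sha512 TEXT NOT NULL)"
    )
    return db


def digests(path, chunk_size=1 << 20):
    """Compute sha1 and sha512 of a file in a single pass."""
    sha1, sha512 = hashlib.sha1(), hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha1.update(chunk)
            sha512.update(chunk)
    return sha1.hexdigest(), sha512.hexdigest()


def record(db, path):
    """Step 1: store the mapping for an uploaded file."""
    sha1, sha512 = digests(path)
    db.execute("INSERT OR REPLACE INTO digest_map VALUES (?, ?)", (sha1, sha512))
    return sha1, sha512


def lookup(db, sha1):
    """Step 2: translate a historical sha1 into its sha512, or None if unknown."""
    row = db.execute("SELECT sha512 FROM digest_map WHERE sha1 = ?", (sha1,)).fetchone()
    return row[0] if row else None
```

A history-rewriting filter would call `lookup` for every sha1 it finds in an old checksum file and fail loudly on `None`, since nothing has ever been removed from the binrepo.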


Either way, perhaps if further discussions are needed I can arrange to be around for interactive chats. I tend not to be on IRC much these days and my mail is often filtered so I don't look too often, but I'll try and keep this bug in mind for further chats.

FWIW, I have a (now rather outdated) faked version of our packages SVN repo to run tests on. i.e. it only contains a few packages and has some renames and branches etc. to test some of the corner cases. I can supply this if it helps Neal's testing? Running the script on the real SVN repo is not something you want to do regularly when testing - ideally we'd only do it once!
Comment 9 Colin Guthrie 2017-03-02 11:00:41 CET
Oh, for the avoidance of doubt, the "skipping revisions" feature I added was to tidy up the mistake made when all of cauldron in svn was accidentally svn rm'ed, then restored again in the next commit. That wouldn't look good in git if migrated!
Comment 10 Colin Guthrie 2017-03-02 11:44:21 CET
(In reply to Augier from comment #4)
> By looking at the previous works, in particular `svn-git-migration`[1], I
> found an  interresting resource explaining how to migrate an SVN repo to a
> git one[2]. It seems like `svn-git-migration` was used to migrate the
> software repos. I could totally use some of these bash scripts, but as this
> is not a hard to develop, and that there are more work than this repo
> migration[3], I prefer to rewrite them using Python.
> 
> I was also pointed a previous work named `sv2git`[4] which is based on a
> KDE's work but is written in C++. Considering my level in C++ and the fact
> that this code seems to perform the migration by hand instead of using `git
> svn`, it does not seem wise for me to use this previous work.
> 
> [1]: http://gitweb.mageia.org/software/infrastructure/svn-git-migration/
> [2]: http://john.albin.net/git/convert-subversion-to-git
> [3]: In particular, there is the work to comply to dist-git structure.
> [4]: http://gitweb.mageia.org/software/infrastructure/svn2git/about/

Just to reply to this specifically, I'd strongly suggest NOT using git-svn in any way for this migration. It is horribly inefficient and would likely take months of computation time to make even a dent in our packages repo. You really do have to go lower level and parse each revision one at a time and split it into packages, rather than taking one package and looking for its changes across all the revisions (this is the main difference between svn2git and git-svn).

Yes it's written in C++ and I'm sure you could rewrite it in python, but I suspect that's not needed, nor really worth the effort. The tool is mostly working, it just doesn't handle all corner cases well. The main issue is it doesn't handle renames very well, e.g. when a package is renamed with an svn mv. I think adding support for this would be fairly straightforward. The other issue is that it doesn't handle copying specific revisions from svn and preserving history nicely, e.g. when resurrecting a package that was obsolete.

So all that's really needed is for someone to take the code and run with it with some fairly small changes/modification. I suspect rewriting it in python would take considerably longer (and again, you cannot rely on shelling out to git-svn here - it's just not scalable - you pretty much have to take the same approach as the C++ code, but just reimplement it).

There are certain post-conversion tasks that could certainly be automated in python scripts, e.g. the filter-branch that does the final conversion to the dist-git layout on each repo, and the import of the final (static) changelog. The final verification and comparison against the final svn state for each package (possibly done before the filter-branch so the layouts are easier to compare) could also be automated via a nice python wrapper (although, as with the conversion itself, particular care will have to be taken with scalability).

FWIW, I encoded the layouts for dist-git here:
http://gitweb.mageia.org/software/infrastructure/svn2git/tree/rules/mga-pkgs.rules?h=distro/mga

This was before I learned of the term "dist-git" - I was just copying the general fedora layout :D The layout suggested in this bug report is similar but not identical. The rules could be updated and I have no strong opinion of which is best, but I would suggest we try to stick to the distsuffix we use in the rpms (e.g. mga5 and mga5.infra etc. - only difference to above is dots rather than hyphens)

FWIW I still think we want to have the commit messages as our changelogs (cf. fedora which encodes them in the .spec). This requires some changes to our SRPM generation, which currently uses svn log (and any legacy changelog). This will need to be changed to work with git log. We currently use svn revprop to "edit" incorrect svn log messages. This isn't possible with git. We can however use git notes which could provide an alternative commit message for any given commit if an error was detected. The generation of the srpm changelog will therefore be a bit more involved, but still totally possible if switched to git log+notes.
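
The "git log+notes" idea can be sketched in Python as below: each changelog entry uses the corrective note when one exists and the original commit message otherwise. The function names and the input shapes (tuples of sha, author, date, message, plus a dict of notes) are assumptions for this example, and reading them out of git is deliberately left out.

```python
from datetime import date


def changelog_entry(author, when, message, note=None):
    """Format one RPM-style changelog entry, preferring the corrective note."""
    text = (note or message).strip()
    bullets = "\n".join(
        f"- {line.strip()}" for line in text.splitlines() if line.strip()
    )
    return f"* {when.strftime('%a %b %d %Y')} {author}\n{bullets}"


def build_changelog(commits, notes):
    """commits: (sha, author, date, message) tuples, newest first.
    notes: dict mapping a sha to its corrected message."""
    return "\n\n".join(
        changelog_entry(author, when, message, notes.get(sha))
        for sha, author, when, message in commits
    )


# Made-up data: the note replaces a commit message that contained a typo.
commits = [("abc123", "Jane Doe <jane@example.org>", date(2017, 3, 2), "Fix bulid on armv7")]
notes = {"abc123": "Fix build on armv7"}
print(build_changelog(commits, notes))
```

The static legacy changelog would simply be appended after the generated entries, as is done today with svn log.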


Hope all this is useful.
Comment 11 Colin Guthrie 2017-03-02 11:51:31 CET
(In reply to Neal Gompa from comment #5)
> Just to ensure the dist-git structure and its sources are understood, here's
> the template layout:
> 
> <pkg>
> |- cauldron <- SVN packages/cauldron/<pkg>
> |- mga6 <- SVN packages/updates/6/<pkg> (doesn't exist yet!)
> |- mga6-backport <- SVN packages/backports/6/<pkg> (doesn't exist yet!)
> |- mga6-infra <- SVN packages/updates/infra_6/<pkg> (doesn't exist yet!)
> |- mga5 <- SVN packages/updates/5/<pkg>
> |- mga5-backport <- SVN packages/backports/5/<pkg>
> |- mga5-infra <- SVN packages/updates/infra_5/<pkg>
> |- mga4 <- SVN packages/updates/4/<pkg>
> |- mga4-backport <- SVN packages/backports/4/<pkg>
> |- mga4-infra <- SVN packages/updates/infra_4/<pkg>
> |- mga3 <- SVN packages/updates/3/<pkg>
> |- mga3-backport <- SVN packages/backports/3/<pkg>
> |- mga3-infra <- SVN packages/updates/infra_3/<pkg>
> |- mga2 <- SVN packages/updates/2/<pkg>
> |- mga2-infra <- SVN packages/updates/infra_2/<pkg>
> |- mga1 <- SVN packages/updates/1/<pkg>
> |- mga1-infra <- SVN packages/updates/infra_1/<pkg>

All fine IMO, but I would use dots in the git branch names to match the distsuffix used in the generated RPMs (very minor nitpick)

> |- misc <- SVN packages/misc/<pkg>

I don't think we should import this "branch" at all. It's only used to store static changelogs about each package AFAIK. These should IMO be incorporated as part of each package repo's final filtering to inject just the latest version of this changelog file (as it generally doesn't change after initial SRPM import) into each and every branch we have in the git repo as a changelog.txt file. Then the SRPM generation for each can include it (combining it with git log+notes as we do currently with svn log and as mentioned above).
Comment 12 Augier 2017-03-02 18:55:11 CET
> Yes it's written in C++ and I'm sure you could rewrite it in python, but I suspect that's not needed, nor really worth the effort. The tool is mostly working, it just doesn't handle all corner cases well. The main issue is it doesn't handle renames very well. e.g. when a package is renamed with an svn mv. I think adding support for this would be fairly straight forward. The other issue is that it doesn't handle copying specific revisions from svn and preserving history nicely - e.g. when resurrecting a package that was obsolete.

You really overestimate my C++ skills. It could take ages before I understand the code and how to use it.
Yes, git-svn takes a lot of time. But we can perfectly well mitigate the problem by parallelising the task, which is pretty straightforward using Python 3.6.
Comment 13 Augier 2017-03-02 18:58:11 CET
Furthermore, I know very few about SVN (which is why I'm *really* motivated to port the stuff to git).
Comment 14 Neal Gompa 2017-03-02 19:02:42 CET
(In reply to Colin Guthrie from comment #11)
> (In reply to Neal Gompa from comment #5)
> > Just to ensure the dist-git structure and its sources are understood, here's
> > the template layout:
> > 
> > <pkg>
> > |- cauldron <- SVN packages/cauldron/<pkg>
> > |- mga6 <- SVN packages/updates/6/<pkg> (doesn't exist yet!)
> > |- mga6-backport <- SVN packages/backports/6/<pkg> (doesn't exist yet!)
> > |- mga6-infra <- SVN packages/updates/infra_6/<pkg> (doesn't exist yet!)
> > |- mga5 <- SVN packages/updates/5/<pkg>
> > |- mga5-backport <- SVN packages/backports/5/<pkg>
> > |- mga5-infra <- SVN packages/updates/infra_5/<pkg>
> > |- mga4 <- SVN packages/updates/4/<pkg>
> > |- mga4-backport <- SVN packages/backports/4/<pkg>
> > |- mga4-infra <- SVN packages/updates/infra_4/<pkg>
> > |- mga3 <- SVN packages/updates/3/<pkg>
> > |- mga3-backport <- SVN packages/backports/3/<pkg>
> > |- mga3-infra <- SVN packages/updates/infra_3/<pkg>
> > |- mga2 <- SVN packages/updates/2/<pkg>
> > |- mga2-infra <- SVN packages/updates/infra_2/<pkg>
> > |- mga1 <- SVN packages/updates/1/<pkg>
> > |- mga1-infra <- SVN packages/updates/infra_1/<pkg>
> 
> All fine IMO, but I would use dots in the git branch names to match the
> distsuffix used in the generate RPMs (very minor nitpick)
> 

I don't think backports and infra have distsuffixes that differ, so I don't think it matters from that point of view. If they *do have different disttags*, then I agree, and I'd go for that.

And as for core/tainted/nonfree, it doesn't really exist as a separate entity from the VCS point of view anyway. We'll probably wire up some fancy magic for double-submit for YOURI/Koji to handle this when certain conditionals exist in the spec file.

> > |- misc <- SVN packages/misc/<pkg>
> 
> I don't think we should import this "branch" at all. It's only used to store
> static changelogs about each package AFIAK. These should IMO be incorporated
> as part of each package repo's final filtering to inject just the latest
> version of this changelog file (as it generally doesn't change after initial
> SRPM import) into each and every branch we have in the git repo as a
> changelog.txt file). Then the SRPM generation for each can include it
> (combining it with git log+notes as we do currently with svn log and as
> mentioned above).

That is certainly another approach. The main thing I want to avoid is unnecessary duplication of content, especially for something that's (generally) frozen on import. Though, it occurs to me that with how git checkouts work, it might be tricky to simultaneously check out the content of both for the required re-merge of changelogs at spec file rebuild time.

I'd also just call it <pkg>.rpmchangelog or something like that, just to keep it unique and obvious.
Comment 15 Thierry Vignaud 2017-03-02 20:41:34 CET
(In reply to Augier from comment #12)
> You really overestimate my C++ skills. It could take ages before I
> understand the code and how to use it.
> Yes, git-svn takes a lot of time. But we perfectly can leverage the problem
> by parallelising the task, which is pretty straightforward using Python 3.6.

That won't help as much as you think:
1) the SVN server is a slowness contention point
2) svn2git already runs fastimport processes concurrently

Lots of people who do know svn & git spent time on speeding up the conversion.
I don't remember the details nor can I find a link but I do remember that svn2git did make a _HUGE_ difference back in the days when KDE did the big migration to git

So if you "know very few about SVN", I doubt you'll be able to beat them...
Please don't reinvent the wheel.
There's a working tool, let's just use it.
Comment 16 Augier 2017-03-02 20:45:49 CET
> So if you "know very few about SVN", I doubt you'll be able to beat them...
> Please don't reinvent the wheel.
> There's a working tool, let's just use it.

As far as I understand, the tool needs polishing. That's what's bothering me a bit.
Comment 17 Augier 2017-03-04 15:47:16 CET
@Thierry Vignaud: do you know where the rules language of the svn2git is documented?
Comment 18 Olav Vitters 2017-03-06 09:52:14 CET
(In reply to Neal Gompa from comment #6)
> Also of note, our dist-git system's fallback checksum will be sha1 rather
> than md5, so we can just rename sha1.lst to "sources" as we will have our
> tools Do The Right Thing(TM) here. Going forward, we may choose to move to
> sha512, as Fedora did.

Recommend the following:
- Upgrade Python to 3.6, so you gain sha3 functions
- Use SHA3
- Store and verify file length together with the hash

SHA1 has been known to be insecure for a pretty long time. SHA2 is also weakened at this stage.

See http://valerieaurora.org/hash.html
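
A minimal Python sketch of the recommendation above, using the `hashlib.sha3_512` function available from Python 3.6 and recording the file length next to the digest. The helper names are made up for the example, and the choice of the 512-bit variant is an assumption, since the comment only says "SHA3".

```python
import hashlib


def sha3_with_length(path, chunk_size=1 << 20):
    """Return (hex digest, size in bytes) for a file, read in one pass."""
    hasher = hashlib.sha3_512()
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return hasher.hexdigest(), size


def verify(path, expected_digest, expected_size):
    """A file is accepted only if both the length and the hash match."""
    digest, size = sha3_with_length(path)
    return size == expected_size and digest == expected_digest
```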

CC: (none) => olav

Comment 19 Colin Guthrie 2017-03-07 15:37:27 CET
(In reply to Augier from comment #16)
> > So if you "know very few about SVN", I doubt you'll be able to beat them...
> > Please don't reinvent the wheel.
> > There's a working tool, let's just use it.
> 
> As far as I understand, the tool needs polishing. That's what bothering me a
> bit.

(In reply to Augier from comment #17)
> @Thierry Vignaud: do you know where the rules language of the svn2git is
> documented?

I can completely understand where you're coming from, but now that you know the svn repo is ~700Gb I think you're appreciating the scale of the issue. FYI, I did also do a git-svn of our packages repo some time ago but it only included the specs, not anything else. After several days of running it, I had to give up. Even once it had been converted, the performance of the git repo was itself horrendous. A git log took >30s to start and that was with a reasonably fast machine and an SSD!

So all things told, git-svn is definitely not the right tool for the job. As I said, you could rewrite svn2git in python but I think that would be a pretty time consuming task.

I won't be able to do much in the short term, but after mid-April I might be able to help polish the tool a bit in terms of C++ work, provided you're able to do the testing and verification stuff? To be honest, the code itself is quite simple and I'm sure even with moderate python skills you should be able to dip in and do some tweaks even if larger tweaks are tricky.


I'll look out a partial svn repo dump for you that I used during my testing.
Comment 20 Neal Gompa 2017-03-08 13:30:44 CET
So apparently there's an actual "dist-git" project[1] that provides an implementation of the Dist-Git backend used in Fedora (both the main project and for COPR). Might be worth leveraging?

[1]: https://github.com/release-engineering/dist-git
Comment 21 Augier 2017-04-09 18:06:07 CEST
Ok, I was able to test svn2git thanks to Colin's instructions. It seems to work quite well and it is awesomely fast. For those who want to see what it's like, I've made up a test repo. Just do the following:

    $ git clone --recursive https://github.com/christophehenry/svn-2-dist-git.git
    $ cd svn-2-dist-git/svn2git
    $ qmake && make
    $ cd ../test/mga-packages-git
    $ ./test.sh

Next I'm going to dig into the tool a bit to generate git repos in the proper dist-git format. It should not be difficult as Colin already wrote rules that are close to it.

BTW, Colin, can you tell me where that rules language is documented?
Comment 22 Colin Guthrie 2017-04-17 12:51:19 CEST
(In reply to Colin Guthrie from comment #11)
> (In reply to Neal Gompa from comment #5)
> > Just to ensure the dist-git structure and its sources are understood, here's
> > the template layout:
> > 
> > <pkg>
> > |- cauldron <- SVN packages/cauldron/<pkg>
> > |- mga6 <- SVN packages/updates/6/<pkg> (doesn't exist yet!)
> > |- mga6-backport <- SVN packages/backports/6/<pkg> (doesn't exist yet!)
> > |- mga6-infra <- SVN packages/updates/infra_6/<pkg> (doesn't exist yet!)
> > |- mga5 <- SVN packages/updates/5/<pkg>
> > |- mga5-backport <- SVN packages/backports/5/<pkg>
> > |- mga5-infra <- SVN packages/updates/infra_5/<pkg>
> > |- mga4 <- SVN packages/updates/4/<pkg>
> > |- mga4-backport <- SVN packages/backports/4/<pkg>
> > |- mga4-infra <- SVN packages/updates/infra_4/<pkg>
> > |- mga3 <- SVN packages/updates/3/<pkg>
> > |- mga3-backport <- SVN packages/backports/3/<pkg>
> > |- mga3-infra <- SVN packages/updates/infra_3/<pkg>
> > |- mga2 <- SVN packages/updates/2/<pkg>
> > |- mga2-infra <- SVN packages/updates/infra_2/<pkg>
> > |- mga1 <- SVN packages/updates/1/<pkg>
> > |- mga1-infra <- SVN packages/updates/infra_1/<pkg>

Just to reply to my own comment about this. I realised I missed a rather important detail here. This layout does not specify a master branch. It would be pretty strange for a git repo to not have such a branch. Lots of git tools assume this is the case (e.g. git itself on clone, cgit, and likely a lot of others - sure they can be configured and you can manually work around this, but I think this just creates unnecessary work for ourselves).

I can't think of any reason to deviate from the fedora approach and would therefore strongly suggest that the "cauldron" branch above is actually just "master".
Comment 23 Neal Gompa 2017-05-26 17:03:49 CEST
(In reply to Colin Guthrie from comment #22)
> 
> I can't think of any reason to deviate from the fedora approach and would
> therefore strongly suggest that the "cauldron" branch above is actually just
> "master".

I'm okay with the default branch being called "master".
Comment 24 Neal Gompa 2020-06-23 10:45:19 CEST
(In reply to Neal Gompa from comment #23)
> (In reply to Colin Guthrie from comment #22)
> > 
> > I can't think of any reason to deviate from the fedora approach and would
> > therefore strongly suggest that the "cauldron" branch above is actually just
> > "master".
> 
> I'm okay with the default branch being called "master".

Fedora is considering undergoing the process of migrating all "master" branches to "rawhide"[1]. I'd rather us be ahead of the game here and have our development branch be called "cauldron" for similar reasons.

[1]: https://pagure.io/fesco/issue/2410
