attempt to fix merge ff with multiple remotes and multiple lfs commits #6171
fwag wants to merge 1 commit
…s with LFS object changes
Hey, thanks for this proposed change, and I'm sorry you're having trouble. I'll try to take a closer look over the holidays, if at all possible.
This option enables the automatic selection of a Git remote from which to derive a Git LFS remote URL based on the output of a command like
As a consequence, to maintain backwards compatibility, a new configuration option had to be introduced and the additional logic made conditional on that option being explicitly set, rather than being the default.
When and if we release another major version such as v4.0.0, we would then have an opportunity to revisit the existing defaults for our configuration options and remote selection logic and break with the previously established behaviour of the Git LFS client, although we'd still want to try to minimize the consequences for our users. For instance, we might only try to determine the remote automatically from the current commit if no
All of which is to say that we are unfortunately not able to make
chrisd8088 left a comment
Hey, thanks again for this PR, and for clearly documenting the issue you encountered! It's taken me a while to think through some of the implications of the changes you've proposed; my apologies for the delay.
In the conditions you describe, I believe our existing tools at least provide a manual step users can take to fetch the necessary Git LFS objects, which is to run a git lfs fetch --all <remote> <remote>/<ref> command. (I'll put a longer example in a separate comment with some other notes.)
However, I acknowledge that this is less than ideal as it does require a separate step on the part of users.
As I've noted already, we can't change the default setting of our lfs.remote.autoDetect or lfs.remote.searchAll configuration options, at least not at this time.
As for the changes to our post-merge hook program, though, those could be valuable for some users. However, I think they (a) need to respect the existing remote selection logic, and (b) need to run only when a new lfs.fetchPostMerge configuration option is also enabled.
In other words, if a user has lfs.fetchPostMerge enabled, our post-merge hook program would try to fetch any Git LFS objects found between ORIG_HEAD and HEAD, using our existing logic to select a remote. That means respecting lfs.remote.autoDetect if it's enabled, and otherwise using the remote selected by remote.lfsDefault and lfs.url, etc., but also respecting the lfs.remote.searchAll fallback mechanism if it's enabled.
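The behaviour outlined above could be sketched roughly as follows. This is only an illustration under stated assumptions: `hookConfig` and `remotesToTry` are hypothetical names, not part of the Git LFS codebase, and the client's real remote selection logic covers more cases than this simplification does.

```go
package main

import "fmt"

// hookConfig is a hypothetical stand-in for the Git LFS configuration.
// The fields mirror the options discussed in this review, not a real struct.
type hookConfig struct {
	FetchPostMerge bool     // proposed lfs.fetchPostMerge option, default false
	AutoDetect     bool     // lfs.remote.autoDetect
	SearchAll      bool     // lfs.remote.searchAll
	DefaultRemote  string   // remote chosen by the existing logic (remote.lfsDefault, lfs.url, etc.)
	DetectedRemote string   // remote found by auto-detection, "" if none
	AllRemotes     []string // every configured remote
}

// remotesToTry returns the ordered list of remotes a post-merge fetch would
// consult, or nil when the hook's fetch step is not enabled.
func remotesToTry(c hookConfig) []string {
	if !c.FetchPostMerge {
		return nil // opt-in only: a merge stays a purely local operation
	}
	primary := c.DefaultRemote
	if c.AutoDetect && c.DetectedRemote != "" {
		primary = c.DetectedRemote
	}
	remotes := []string{primary}
	if c.SearchAll {
		// Fallback: consult the remaining remotes after the primary one.
		for _, r := range c.AllRemotes {
			if r != primary {
				remotes = append(remotes, r)
			}
		}
	}
	return remotes
}

func main() {
	c := hookConfig{
		FetchPostMerge: true,
		AutoDetect:     true,
		SearchAll:      true,
		DefaultRemote:  "origin",
		DetectedRemote: "oss",
		AllRemotes:     []string{"origin", "oss"},
	}
	fmt.Println(remotesToTry(c))
}
```

With `FetchPostMerge` left at its zero value the function returns nil, which models the requirement that existing users see no change unless they opt in.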
It's a bit unfortunate that Git doesn't (yet) provide a post-fetch hook, which would be a great place for us to register a program that would automatically fetch any Git LFS objects corresponding to the Git objects that were fetched. (If you feel like proposing such a hook on the Git mailing list, please do so!)
We'll also need this PR to include an appropriate set of new tests to verify the expanded behaviour of the post-merge hook, likely in our t/t-post-merge.sh test script.
As well, we'll want this PR to document the new operations of the post-merge hook in our git-lfs-post-merge(1) manual page, and we'll need an additional entry in our git-lfs-config(5) manual page to document the new lfs.fetchPostMerge option.
Thank you again for taking an interest in trying to improve the experience of our users, and for writing up this proposal!
    os.Exit(1)
}

fetchMissingLfsObjects("ORIG_HEAD", "HEAD")
If we want to introduce this type of behaviour to our post-merge hook program, I think we should make it optional, and also not the default. Perhaps we could introduce an lfs.fetchPostMerge configuration option for this purpose, whose default value is false.
My rationale for this suggestion is based on two concerns. First, I suspect that users may be surprised if a git merge operation, which typically requires only local access, now requires connectivity to a remote in order to retrieve Git LFS objects, and possibly even new credentials.
Second, while some users may wish to retrieve all the Git LFS objects referenced in intermediate commits in the histories of the merged branches (for instance, if they wish to then push the merged result to another remote, as in your case), other users may want to keep as few Git LFS objects as possible in local storage. These users typically deal with large repositories where the Git LFS client's duplicative local storage is a burden, and so they may not want to automatically fetch a full history's worth of objects when they perform a merge.
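An opt-in option of this kind might be read in the same style as the existing `AutoDetectRemoteEnabled()` accessor. The sketch below is a minimal illustration only: `gitConfig` is a hypothetical map-backed stand-in for the real configuration type, `fetchPostMergeEnabled` is an assumed name, and the boolean parsing is narrower than Git's own (which also accepts values such as `yes` and `on`).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// gitConfig is a hypothetical stand-in for the Git configuration store.
// Keys are assumed to be stored in lower case.
type gitConfig map[string]string

// Bool returns the boolean value of key, or def when the key is unset or
// cannot be parsed by strconv.ParseBool.
func (g gitConfig) Bool(key string, def bool) bool {
	v, ok := g[strings.ToLower(key)]
	if !ok {
		return def
	}
	b, err := strconv.ParseBool(v)
	if err != nil {
		return def
	}
	return b
}

// fetchPostMergeEnabled reports whether the proposed lfs.fetchPostMerge
// option is set. The default is false so that merges stay local-only.
func fetchPostMergeEnabled(g gitConfig) bool {
	return g.Bool("lfs.fetchPostMerge", false)
}

func main() {
	fmt.Println(fetchPostMergeEnabled(gitConfig{}))
	fmt.Println(fetchPostMergeEnabled(gitConfig{"lfs.fetchpostmerge": "true"}))
}
```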
What if we move it to the pre-push hook?
Interesting question! I think my comments would largely all still apply. In part that's because it might be surprising to a user if they push to remote A and are asked for credentials for remote B in order to download Git LFS objects, especially if remote A is just another copy of the repository on their local disk.
I skimmed through the Git mailing list archives for discussions of a post-fetch hook, and the topic has come up a few times. One relevant early message from the Git maintainer outlines conditions for a new hook, one of which would seem to broadly apply to the Git LFS use case:
There are five valid reasons you might want a hook to a git operation:
...
(2) A hook that operates on data generated after the command starts to run. The ability to munge the commit log message by the commit-msg hook is an example.
In another message from a thread about a potential "tweak-fetch" hook the Git maintainer points out that a typical git pull operation invokes git fetch and then git merge. So if we have to work with just the existing hooks provided by Git, I think the post-merge hook is (somewhat) more appropriate than the pre-push hook.
In practice, of course, we primarily rely on the "smudge" operation to download the Git LFS objects the user requires in order to perform the final updates of the working tree, both when git pull is run but also when the user runs git checkout or git merge.
The challenge with this, as you've pointed out, is that we expect Git LFS objects to be available from the default Git LFS remote the user has configured, and we only "see" the objects in the current checkout, not any intermediate objects in the Git history (with the caveat that not all Git LFS users want those intermediate objects to be automatically retrieved).
A post-fetch hook, if one existed, would presumably inform the hook program about the remote from which Git objects were fetched, as well as the IDs of those objects. Hypothetically, a Git LFS post-fetch hook might then be able to download any corresponding Git LFS objects before any subsequent checkout stage, so our "smudge" filter would find that all the Git LFS objects necessary for the working tree checkout already exist locally and thus wouldn't need to download anything at all.
We'd want such a hook program to respect all the lfs.fetch* configuration options, like lfs.fetchExclude.
I'm slightly uncertain, though, as to how our existing remote server discovery process could or should apply if the remote provided to the post-fetch hook differs from, say, an explicitly configured lfs.url setting. I think we'd need to work through the full range of potential conditions here and make sure we weren't doing anything too surprising for users.
But that's all hypothetical, of course, since Git doesn't have a post-fetch hook right now.
if r := git.FirstRemoteForTreeish(post); r != "" {
    remote = r
}
Our only other use of the FirstRemoteForTreeish() function is governed by the lfs.remote.autoDetect configuration option, and so I think this one should be as well.
The FirstRemoteForTreeish() function will run a git branch -r --contains HEAD command and then select the first reference from that command's output to select a remote. While this works, it's not the established behaviour of the Git LFS client, and as I've noted already, we shouldn't change that behaviour (at least not for the present).
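To make the selection step concrete, here is a rough sketch of picking the first remote from output shaped like that of `git branch -r --contains HEAD`. It is not the actual `FirstRemoteForTreeish()` implementation: `firstRemoteFromBranchOutput` is a hypothetical helper, and it assumes remote names contain no slash, which Git does not guarantee.

```go
package main

import (
	"fmt"
	"strings"
)

// firstRemoteFromBranchOutput returns the remote name of the first
// remote-tracking ref listed in out, or "" if none is found.
// Assumption: the remote name is everything before the first "/".
func firstRemoteFromBranchOutput(out string) string {
	for _, line := range strings.Split(out, "\n") {
		ref := strings.TrimSpace(line)
		if ref == "" {
			continue
		}
		// Symbolic refs are listed as "origin/HEAD -> origin/main"; keep the left side.
		if i := strings.Index(ref, " -> "); i >= 0 {
			ref = ref[:i]
		}
		if i := strings.Index(ref, "/"); i > 0 {
			return ref[:i]
		}
	}
	return ""
}

func main() {
	// Sample output for a commit reachable from branches on two remotes.
	out := "  origin/main\n  oss/main\n"
	fmt.Println(firstRemoteFromBranchOutput(out))
}
```

Because the result depends only on which ref happens to be listed first, the chosen remote may differ from the one the client would otherwise select, which is why gating this on `lfs.remote.autoDetect` matters.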
All of which is to say that I think if we are going to fetch Git LFS objects in situations where we did not previously do so, we should select the remote(s) from which to fetch following the same logic we do now when "smudging".
For one thing, as was pointed out during the review of PR #5066 when the lfs.remote.autoDetect and lfs.remote.searchAll options were introduced, if a user has multiple remotes and we choose one other than what we've selected previously, the user may be prompted for credentials. We should therefore make sure they've opted into this behaviour by enabling the relevant options and that we don't otherwise try to use remotes different than the one we would normally select.
 func (c *Configuration) AutoDetectRemoteEnabled() bool {
-    return c.Git.Bool("lfs.remote.autodetect", false)
+    return c.Git.Bool("lfs.remote.autodetect", true)
Alas, as I noted in my initial comments on this PR, we can't change this option's default setting at this time.
In general, we try to avoid altering the defaults for our configuration options unless there's a critical bug which we need to correct. Barring that type of problem, if we do change the default settings for our options, that change should be part of a new major version of the Git LFS client and should be accompanied by a suitably lengthy period of advance notice to our users.
For my own future reference, here's a reproduction case of the issue described in this PR, using only local repositories:
We can resolve the immediate issue by using
To fix that issue, we need to run
This code is meant to fix the issue shown by the following reproducer.
Clone the same repo from two different Git instances
Add new LFS objects to OSS
Now we have two new commits with 2 LFS objects to push
Move to the repo hosted on another instance with a branch in common
lfs.remote.autodetect fixes the issue, so why is the default setting not true?
OK, so things look like they merged… Try to push
With a fast-forward merge, only the HEAD LFS object is fetched, leaving the LFS objects from intermediate commits unfetched.
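The gap can be illustrated with a small model. This is a sketch under assumptions, not the code in this PR: the `commit` type and `missingObjects` function are hypothetical, and each commit is reduced to the list of Git LFS object IDs its tree references.

```go
package main

import "fmt"

// commit is a hypothetical, simplified model of a Git commit: it records
// only the OIDs of the Git LFS objects referenced by the commit's tree.
type commit struct {
	ID   string
	OIDs []string
}

// missingObjects returns, in first-seen order, the OIDs referenced by the
// given commits that are absent from the local object store.
func missingObjects(commits []commit, local map[string]bool) []string {
	seen := map[string]bool{}
	var missing []string
	for _, c := range commits {
		for _, oid := range c.OIDs {
			if !local[oid] && !seen[oid] {
				seen[oid] = true
				missing = append(missing, oid)
			}
		}
	}
	return missing
}

func main() {
	// Two commits between ORIG_HEAD and HEAD, each rewriting the same tracked file.
	merged := []commit{
		{ID: "c1", OIDs: []string{"oid-v1"}},
		{ID: "c2", OIDs: []string{"oid-v2"}}, // HEAD after the fast-forward
	}
	// The checkout only downloads the version present at HEAD.
	local := map[string]bool{"oid-v2": true}
	fmt.Println(missingObjects(merged, local))
}
```

In this model the intermediate version `oid-v1` is reported as missing, so a later push of the full history to a second remote would have nothing local to upload for it.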