.. SPDX-License-Identifier: GPL-2.0

.. Blame: Jonathan Corbet, d95ea1a, 2019-05-28 17:12:51 -0600

====================
Rebasing and merging
====================

Maintaining a subsystem, as a general rule, requires a familiarity with the
Git source-code management system. Git is a powerful tool with a lot of
features; as is often the case with such tools, there are right and wrong
ways to use those features. This document looks in particular at the use
of rebasing and merging. Maintainers often get in trouble when they use
those tools incorrectly, but avoiding problems is not actually all that
hard.

One thing to be aware of in general is that, unlike many other projects,
the kernel community is not scared by seeing merge commits in its
development history. Indeed, given the scale of the project, avoiding
merges would be nearly impossible. Some problems encountered by
maintainers result from a desire to avoid merges, while others come from
merging a little too often.

Rebasing
========

"Rebasing" is the process of changing the history of a series of commits
within a repository. There are two different types of operations that are
referred to as rebasing since both are done with the ``git rebase``
command, but there are significant differences between them:

- Changing the parent (starting) commit upon which a series of patches is
  built. For example, a rebase operation could take a patch set built on
  the previous kernel release and base it, instead, on the current
  release. We'll call this operation "reparenting" in the discussion
  below.

- Changing the history of a set of patches by fixing (or deleting) broken
  commits, adding patches, adding tags to commit changelogs, or changing
  the order in which commits are applied. In the following text, this
  type of operation will be referred to as "history modification".

The term "rebasing" will be used to refer to both of the above operations.
Used properly, rebasing can yield a cleaner and clearer development
history; used improperly, it can obscure that history and introduce bugs.

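The two kinds of rebasing described above can be sketched in a scratch
repository. This is only an illustration under stated assumptions: the
branch names (``mainline``, ``topic``), the tags (``old-release``,
``new-release``), the file names, and the reviewer identity are all
hypothetical stand-ins, and the script works in a temporary directory that
it leaves behind for inspection.

```shell
set -eu
# Throwaway identity and config isolation so the sketch runs anywhere.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid
export GIT_CONFIG_GLOBAL=/dev/null GIT_CONFIG_NOSYSTEM=1

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/mainline

# Two stand-in "releases" on the mainline (hypothetical tag names).
echo one > release.txt
git add release.txt
git commit -q -m "stand-in for the previous release"
git tag old-release
echo two > release.txt
git commit -q -a -m "stand-in for the current release"
git tag new-release

# A private two-patch series built on the previous release.
git checkout -q -b topic old-release
echo a > feature-a.txt
git add feature-a.txt
git commit -q -m "feature: first patch"
echo b > feature-b.txt
git add feature-b.txt
git commit -q -m "feature: second patch"

# Reparenting: replay the commits in old-release..topic onto new-release.
git rebase -q --onto new-release old-release topic

# History modification: the base stays put, but the last changelog
# gains a tag, which rewrites that commit.
git commit -q --amend -m "feature: second patch" \
    -m "Reviewed-by: Example Reviewer <reviewer@example.invalid>"
```

Both steps replace existing commits with new ones, which is why the rules
that follow restrict them to history that nobody else has built upon.
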
There are a few rules of thumb that can help developers to avoid the worst
perils of rebasing:

- History that has been exposed to the world beyond your private system
  should usually not be changed. Others may have pulled a copy of your
  tree and built on it; modifying your tree will create pain for them. If
  work is in need of rebasing, that is usually a sign that it is not yet
  ready to be committed to a public repository.

  That said, there are always exceptions. Some trees (linux-next being
  a significant example) are frequently rebased by their nature, and
  developers know not to base work on them. Developers will sometimes
  expose an unstable branch for others to test with or for automated
  testing services. If you do expose a branch that may be unstable in
  this way, be sure that prospective users know not to base work on it.

- Do not rebase a branch that contains history created by others. If you
  have pulled changes from another developer's repository, you are now a
  custodian of their history. You should not change it. With few
  exceptions, for example, a broken commit in a tree like this should be
  explicitly reverted rather than disappeared via history modification.

- Do not reparent a tree without a good reason to do so. Just being on a
  newer base or avoiding a merge with an upstream repository is not
  generally a good reason.

- If you must reparent a repository, do not pick some random kernel commit
  as the new base. The kernel is often in a relatively unstable state
  between release points; basing development on one of those points
  increases the chances of running into surprising bugs. When a patch
  series must move to a new base, pick a stable point (such as one of
  the -rc releases) to move to.

- Realize that reparenting a patch series (or making significant history
  modifications) changes the environment in which it was developed and,
  likely, invalidates much of the testing that was done. A reparented
  patch series should, as a general rule, be treated like new code and
  retested from the beginning.

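As a sketch of the rule above about history created by others, the
following scratch-repository example backs out a broken commit with
``git revert`` instead of rewriting it away. The branch name, file names,
and commit messages are hypothetical; the point is only that the existing
commits stay untouched and the fix is recorded as a new commit on top.

```shell
set -eu
# Throwaway identity and config isolation so the sketch runs anywhere.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid
export GIT_CONFIG_GLOBAL=/dev/null GIT_CONFIG_NOSYSTEM=1

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/subsystem

# Stand-in for history that arrived from another developer's tree.
echo good > a.txt
git add a.txt
git commit -q -m "good change"
echo broken > b.txt
git add b.txt
git commit -q -m "broken change"
broken=$(git rev-parse HEAD)
echo later > c.txt
git add c.txt
git commit -q -m "later change"
before=$(git rev-parse HEAD)

# Undo the broken commit with a new commit; nothing already in the
# branch is rewritten, so anyone who pulled it is unaffected.
git revert --no-edit "$broken" >/dev/null
```

After the revert, the broken commit is still reachable in the history, and
the old branch tip is simply the parent of the new revert commit.
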
A frequent cause of merge-window trouble is when Linus is presented with a
patch series that has clearly been reparented, often to a random commit,
shortly before the pull request was sent. The chances of such a series
having been adequately tested are relatively low - as are the chances of
the pull request being acted upon.

If, instead, rebasing is limited to private trees, commits are based on a
well-known starting point, and they are well tested, the potential for
trouble is low.

Merging
=======

Merging is a common operation in the kernel development process; the 5.1
development cycle included 1,126 merge commits - nearly 9% of the total.
Kernel work is accumulated in over 100 different subsystem trees, each of
which may contain multiple topic branches; each branch is usually developed
independently of the others. So naturally, at least one merge will be
required before any given branch finds its way into an upstream repository.

Many projects require that branches in pull requests be based on the
current trunk so that no merge commits appear in the history. The kernel
is not such a project; any rebasing of branches to avoid merges will, most
likely, lead to trouble.

Subsystem maintainers find themselves having to do two types of merges:
from lower-level subsystem trees and from others, either sibling trees or
the mainline. The best practices to follow differ in those two situations.

Merging from lower-level trees
------------------------------

Larger subsystems tend to have multiple levels of maintainers, with the
lower-level maintainers sending pull requests to the higher levels. Acting
on such a pull request will almost certainly generate a merge commit; that
is as it should be. In fact, subsystem maintainers may want to use
the --no-ff flag to force the addition of a merge commit in the rare cases
where one would not normally be created so that the reasons for the merge
can be recorded. The changelog for the merge should, for any kind of
merge, say *why* the merge is being done. For a lower-level tree, "why" is
usually a summary of the changes that will come with that pull.

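A minimal sketch of the --no-ff case described above, in a scratch
repository: the ``subsystem`` and ``lower-level`` branch names, the file
names, and the merge changelog text are hypothetical. Because the
subsystem branch has not moved since the lower-level branch was created,
a plain merge would fast-forward and leave nowhere to record the "why".

```shell
set -eu
# Throwaway identity and config isolation so the sketch runs anywhere.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid
export GIT_CONFIG_GLOBAL=/dev/null GIT_CONFIG_NOSYSTEM=1

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/subsystem

echo base > core.txt
git add core.txt
git commit -q -m "subsystem: base"

# Work arriving from a lower-level maintainer (stand-in branch).
git checkout -q -b lower-level
echo driver > driver.txt
git add driver.txt
git commit -q -m "driver: add new feature"

# The changelog says why the merge is being done: here, a summary of
# what the pull brings in.
msg="Merge branch 'lower-level' into subsystem

Pull the new driver feature for the next merge window."

git checkout -q subsystem
git merge -q --no-ff -m "$msg" lower-level
```
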
Maintainers at all levels should be using signed tags on their pull
requests, and upstream maintainers should verify the tags when pulling
branches. Failure to do so threatens the security of the development
process as a whole.

As per the rules outlined above, once you have merged somebody else's
history into your tree, you cannot rebase that branch, even if you
otherwise would be able to.

Merging from sibling or upstream trees
--------------------------------------

While merges from downstream are common and unremarkable, merges from other
trees tend to be a red flag when it comes time to push a branch upstream.
Such merges need to be carefully thought about and well justified, or
there's a good chance that a subsequent pull request will be rejected.

It is natural to want to merge the master branch into a repository; this
type of merge is often called a "back merge". Back merges can help to make
sure that there are no conflicts with parallel development and generally
give a warm, fuzzy feeling of being up-to-date. But this temptation
should be avoided almost all of the time.

Why is that? Back merges will muddy the development history of your own
branch. They will significantly increase your chances of encountering bugs
from elsewhere in the community and make it hard to ensure that the work
you are managing is stable and ready for upstream. Frequent merges can
also obscure problems with the development process in your tree; they can
hide interactions with other trees that should not be happening (often) in
a well-managed branch.

That said, back merges are occasionally required; when that happens, be
sure to document *why* it was required in the commit message. As always,
merge to a well-known stable point, rather than to some random commit.
Even then, you should not back merge a tree above your immediate upstream
tree; if a higher-level back merge is really required, the upstream tree
should do it first.

One of the most frequent causes of merge-related trouble is when a
maintainer merges with the upstream in order to resolve merge conflicts
before sending a pull request. Again, this temptation is easy enough to
understand, but it should absolutely be avoided. This is especially true
for the final pull request: Linus is adamant that he would much rather see
merge conflicts than unnecessary back merges. Seeing the conflicts lets
him know where potential problem areas are. He does a lot of merges (382
in the 5.1 development cycle) and has gotten quite good at conflict
resolution - often better than the developers involved.

So what should a maintainer do when there is a conflict between their
subsystem branch and the mainline? The most important step is to warn
Linus in the pull request that the conflict will happen; if nothing else,
that demonstrates an awareness of how your branch fits into the whole. For
especially difficult conflicts, create and push a *separate* branch to show
how you would resolve things. Mention that branch in your pull request,
but the pull request itself should be for the unmerged branch.

Even in the absence of known conflicts, doing a test merge before sending a
pull request is a good idea. It may alert you to problems that you somehow
didn't see from linux-next and helps you to understand exactly what you are
asking upstream to do.

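One possible way to do such a test merge without touching the branch that
will be sent upstream is sketched below. It assumes a scratch repository
with hypothetical ``mainline`` and ``for-upstream`` branches that both
change the same file; the merge is tried on a detached HEAD and then
aborted, so ``for-upstream`` itself never gains a back merge.

```shell
set -eu
# Throwaway identity and config isolation so the sketch runs anywhere.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid
export GIT_CONFIG_GLOBAL=/dev/null GIT_CONFIG_NOSYSTEM=1

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/mainline

echo base > shared.txt
git add shared.txt
git commit -q -m "base"

# The subsystem branch and the mainline both change the same line.
git checkout -q -b for-upstream
echo "subsystem version" > shared.txt
git commit -q -a -m "subsystem: change shared file"
tip=$(git rev-parse for-upstream)
git checkout -q mainline
echo "mainline version" > shared.txt
git commit -q -a -m "mainline: change shared file"

# Test merge on a detached HEAD so the real branch is never modified.
git checkout -q --detach for-upstream
if git merge --no-commit --no-ff mainline >/dev/null 2>&1; then
    result=clean
else
    result=conflict
fi
# Paths left unmerged are what the pull request should warn about.
conflicted=$(git diff --name-only --diff-filter=U)

# Throw the test merge away and return to the unmerged branch.
git merge --abort
git checkout -q for-upstream
echo "test merge: $result${conflicted:+ ($conflicted)}"
```
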
Another reason for doing merges of upstream or another subsystem tree is to
resolve dependencies. These dependency issues do happen at times, and
sometimes a cross-merge with another tree is the best way to resolve them;
as always, in such situations, the merge commit should explain why the
merge has been done. Take a moment to do it right; people will read those
changelogs.

Often, though, dependency issues indicate that a change of approach is
needed. Merging another subsystem tree to resolve a dependency risks
bringing in other bugs and should almost never be done. If that subsystem
tree fails to be pulled upstream, whatever problems it had will block the
merging of your tree as well. Preferable alternatives include agreeing
with the maintainer to carry both sets of changes in one of the trees or
creating a topic branch dedicated to the prerequisite commits that can be
merged into both trees. If the dependency is related to major
infrastructural changes, the right solution might be to hold the dependent
commits for one development cycle so that those changes have time to
stabilize in the mainline.

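The shared topic branch alternative mentioned above might look like the
following sketch. Everything named here is hypothetical: two subsystem
branches (``tree-a``, ``tree-b``) in one scratch repository stand in for
two separate trees, and ``prereq-topic`` carries only the prerequisite
commit. Each tree merges the topic branch, so neither pulls in the other
tree's unrelated work.

```shell
set -eu
# Throwaway identity and config isolation so the sketch runs anywhere.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid
export GIT_CONFIG_GLOBAL=/dev/null GIT_CONFIG_NOSYSTEM=1

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/mainline

echo base > base.txt
git add base.txt
git commit -q -m "stand-in for a release point"
git tag base-release
git branch tree-a
git branch tree-b

# Topic branch holding only the prerequisite, based on the release point.
git checkout -q -b prereq-topic base-release
echo helper > helper.txt
git add helper.txt
git commit -q -m "infra: add helper needed by both trees"

# Each tree does its own work, then merges the same topic branch and
# records why in the merge changelog.
for tree in tree-a tree-b; do
    git checkout -q "$tree"
    echo "$tree work" > "$tree.txt"
    git add "$tree.txt"
    git commit -q -m "$tree: own work"
    git merge -q --no-ff \
        -m "Merge branch 'prereq-topic': helper needed by $tree" \
        prereq-topic
done
```

Since both trees contain the identical prerequisite commit, whichever one
is pulled upstream first brings it along, and the other merges cleanly on
top of it.
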
Finally
=======

It is relatively common to merge with the mainline toward the beginning of
the development cycle in order to pick up changes and fixes done elsewhere
in the tree. As always, such a merge should pick a well-known release
point rather than some random spot. If your upstream-bound branch has
emptied entirely into the mainline during the merge window, you can pull it
forward with a command like::

  git merge v5.2-rc1^0

The "^0" tells Git to merge the commit that the tag points to rather than
the tag itself; Git can then do a fast-forward merge (which should be
possible in this situation), thus avoiding the addition of a spurious merge
commit.

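The fast-forward case can be tried out in a scratch repository, as in the
sketch below. The ``for-next`` branch name and the locally created
annotated tag are assumptions standing in for a real subsystem branch and
the real v5.2-rc1 release tag; ``--ff-only`` is added here only so that the
sketch fails loudly if a fast-forward turns out not to be possible.

```shell
set -eu
# Throwaway identity and config isolation so the sketch runs anywhere.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid
export GIT_CONFIG_GLOBAL=/dev/null GIT_CONFIG_NOSYSTEM=1

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/mainline

# Subsystem work that has already been pulled into the mainline.
echo start > file.txt
git add file.txt
git commit -q -m "subsystem work, now upstream"
git branch for-next

# The mainline moves on and is tagged (stand-in for the release tag).
echo more >> file.txt
git commit -q -a -m "later mainline work"
git tag -a -m "stand-in for the v5.2-rc1 release tag" v5.2-rc1

# Pull the emptied branch forward to the release point; "^0" names the
# tagged commit rather than the tag object.
git checkout -q for-next
git merge -q --ff-only "v5.2-rc1^0"
```
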
The guidelines laid out above are just that: guidelines. There will always
be situations that call out for a different solution, and these guidelines
should not prevent developers from doing the right thing when the need
arises. But one should always think about whether the need has truly
arisen and be prepared to explain why something abnormal needs to be done.