Tags: rextge/git

pr-497/garimasi514/coreGit-bloomFilters-v2

Changed Paths Bloom Filters

Hey!

The commit-graph feature brought performance improvements across multiple
commands. However, file-based history (git log -- <path>) continues to be a
performance pain point, especially in large repositories.

Adopting changed-path Bloom filters has been discussed on the list before,
and a prototype was worked on by SZEDER Gábor, Jonathan Tan and Dr.
Derrick Stolee [1]. This series is based on Dr. Stolee's proof of concept
[2].
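
As a rough illustration of the idea, here is a minimal changed-path Bloom filter in C. It is a sketch under stated assumptions, not the series' bloom.c: the fixed filter size (`FILTER_BITS`), the probe count (`NUM_HASHES`) and the FNV-1a-based double hashing are all illustrative choices. A commit's filter records every path its diff touched; a walk looking for a path can skip the tree diff for any commit whose filter answers "definitely not", and must fall back to the real diff on "maybe".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative parameters only: a fixed-size filter and a fixed number of
 * probes per path. These are assumptions of the sketch. */
#define FILTER_BITS 512
#define NUM_HASHES 7

/* One filter per commit, recording the paths its diff touched. */
struct path_filter {
	uint8_t bits[FILTER_BITS / 8];
};

/* 32-bit FNV-1a over a NUL-terminated path, perturbed by `seed`. */
static uint32_t fnv1a(const char *s, uint32_t seed)
{
	uint32_t h = 2166136261u ^ seed;
	for (; *s; s++) {
		h ^= (uint8_t)*s;
		h *= 16777619u;
	}
	return h;
}

/* Double hashing: probe i lands on bit (h1 + i * h2) mod FILTER_BITS. */
static uint32_t probe(const char *path, int i)
{
	uint32_t h1 = fnv1a(path, 0);
	uint32_t h2 = fnv1a(path, 0x9e3779b9u) | 1u; /* odd step size */
	return (h1 + (uint32_t)i * h2) % FILTER_BITS;
}

void filter_init(struct path_filter *f)
{
	memset(f->bits, 0, sizeof(f->bits));
}

/* Record that the commit changed `path`. */
void filter_add(struct path_filter *f, const char *path)
{
	assert(f && path);
	for (int i = 0; i < NUM_HASHES; i++) {
		uint32_t b = probe(path, i);
		f->bits[b / 8] |= (uint8_t)(1u << (b % 8));
	}
}

/* 0: the path was definitely not changed, so the tree diff can be skipped.
 * 1: the path may have changed, so the walk must still run the real diff. */
int filter_maybe_contains(const struct path_filter *f, const char *path)
{
	assert(f && path);
	for (int i = 0; i < NUM_HASHES; i++) {
		uint32_t b = probe(path, i);
		if (!(f->bits[b / 8] & (1u << (b % 8))))
			return 0;
	}
	return 1;
}
```

A "maybe" answer can be a false positive, which only costs an unnecessary diff; a path that was added never yields a "definitely not", which is what makes skipping safe.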

Performance gains: we tested the performance of git log -- <path> on the git
repo, the Linux repo and some large internal repos, with a variety of paths
at varying depths.

On the git and Linux repos, we observed a 2x to 5x speedup.

On a large internal repo with files 6-10 levels deep in the tree, we
observed 10x to 20x speedups, with some paths up to 28 times faster.

Future Work (not included in the scope of this series):

 1. Supporting revision walks over multiple paths.
 2. Adopting it in the git blame logic.
 3. Interactions with line-log (git log -L).

----------------------------------------------------------------------------

Updates since the last submission

 * Removed all the RFC callouts; this version is ready for full review
 * Added unit tests for the Bloom filter computation layer
 * Added more extensive functional tests for git log
 * Fixed many of the bugs found by the tests
 * Addressed other miscellaneous feedback on the RFC series.

Cheers! Garima Singh

[1] https://lore.kernel.org/git/20181009193445.21908-1-szeder.dev@gmail.com/
[2] https://lore.kernel.org/git/61559c5b-546e-d61b-d2e1-68de692f5972@gmail.com/

Derrick Stolee (2):
  diff: halt tree-diff early after max_changes
  commit-graph: examine commits by generation number

Garima Singh (8):
  commit-graph: use MAX_NUM_CHUNKS
  bloom: core Bloom filter implementation for changed paths
  commit-graph: compute Bloom filters for changed paths
  commit-graph: write Bloom filters to commit graph file
  commit-graph: reuse existing Bloom filters during write.
  commit-graph: add --changed-paths option to write subcommand
  revision.c: use Bloom filters to speed up path based revision walks
  commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag

Jeff King (1):
  commit-graph: examine changed-path objects in pack order

 Documentation/git-commit-graph.txt            |   5 +
 .../technical/commit-graph-format.txt         |  24 ++
 Makefile                                      |   2 +
 bloom.c                                       | 277 ++++++++++++++++++
 bloom.h                                       |  58 ++++
 builtin/commit-graph.c                        |  10 +-
 ci/run-build-and-tests.sh                     |   1 +
 commit-graph.c                                | 211 ++++++++++++-
 commit-graph.h                                |   9 +-
 diff.h                                        |   5 +
 revision.c                                    | 124 +++++++-
 revision.h                                    |  11 +
 t/README                                      |   5 +
 t/helper/test-bloom.c                         |  84 ++++++
 t/helper/test-read-graph.c                    |   4 +
 t/helper/test-tool.c                          |   1 +
 t/helper/test-tool.h                          |   1 +
 t/t0095-bloom.sh                              | 113 +++++++
 t/t4216-log-bloom.sh                          | 143 +++++++++
 t/t5318-commit-graph.sh                       |   2 +
 t/t5324-split-commit-graph.sh                 |   1 +
 tree-diff.c                                   |   6 +
 22 files changed, 1088 insertions(+), 9 deletions(-)
 create mode 100644 bloom.c
 create mode 100644 bloom.h
 create mode 100644 t/helper/test-bloom.c
 create mode 100755 t/t0095-bloom.sh
 create mode 100755 t/t4216-log-bloom.sh

base-commit: 5b0ca87

Submitted-As: https://lore.kernel.org/git/pull.497.v2.git.1580943390.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.497.git.1576879520.gitgitgadget@gmail.com

pr-git-695/Masmiseim36/dev/CloneWithSubmodule-v2

clone: use submodule.recurse option to automatically clone submodules

From: Markus Klein <masmiseim@gmx.de>

Simplify cloning repositories with submodules when the option
submodule.recurse is set by the user. This makes the use of submodules
transparent to the user: they don't have to know whether an extra parameter
is needed to get the full project, including its submodules.
This makes clone behave identically to other commands like fetch, pull,
checkout, ... which include the submodules automatically if this option is
set.

It is implemented analogously to the pull command, by using its own config
function instead of just the default config. In contrast to the pull
command, the submodule.recurse state is saved as an array of strings, as it
can take an optional pathspec argument which describes which submodules
should be recursively initialized and cloned. To recursively initialize and
clone all submodules, a pathspec of "." has to be used.
The regression test is simplified compared to the test for "git clone
--recursive", as the general functionality is already checked there.

Changes since v1:
* Fixed the commit author to match the Signed-off-by line

Signed-off-by: Markus Klein <masmiseim@gmx.de>

Submitted-As: https://lore.kernel.org/git/pull.695.v2.git.git.1580851963616.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.695.git.git.1580505092071.gitgitgadget@gmail.com

pr-539/hanwen/reftable-v3

Reftable support git-core

This adds the reftable library, and hooks it up as a ref backend.

At this point, I am mainly interested in feedback on the spots marked with
XXX in the Git source code, in particular, how to handle reflog expiry in
this backend.

v2

 * address Jun's nits.
 * address Dscho's portability comments
 * more background in commit messages.

Han-Wen Nienhuys (6):
  refs.h: clarify reflog iteration order
  setup.c: enable repo detection for reftable
  create .git/refs in files-backend.c
  refs: document how ref_iterator_advance_fn should handle symrefs
  Add reftable library
  Reftable support for git-core

 Makefile                |   24 +-
 builtin/init-db.c       |   42 +-
 cache.h                 |    2 +
 refs.c                  |   22 +-
 refs.h                  |    5 +-
 refs/files-backend.c    |    5 +
 refs/refs-internal.h    |    6 +
 refs/reftable-backend.c |  880 +++++++++++++++++++++++++++++++
 reftable/LICENSE        |   31 ++
 reftable/README.md      |   19 +
 reftable/VERSION        |    5 +
 reftable/basics.c       |  196 +++++++
 reftable/basics.h       |   37 ++
 reftable/block.c        |  401 ++++++++++++++
 reftable/block.h        |   71 +++
 reftable/blocksource.h  |   20 +
 reftable/bytes.c        |    0
 reftable/config.h       |    1 +
 reftable/constants.h    |   27 +
 reftable/dump.c         |   97 ++++
 reftable/file.c         |   97 ++++
 reftable/iter.c         |  229 ++++++++
 reftable/iter.h         |   56 ++
 reftable/merged.c       |  286 ++++++++++
 reftable/merged.h       |   34 ++
 reftable/pq.c           |  114 ++++
 reftable/pq.h           |   34 ++
 reftable/reader.c       |  708 +++++++++++++++++++++++++
 reftable/reader.h       |   52 ++
 reftable/record.c       | 1107 +++++++++++++++++++++++++++++++++++++++
 reftable/record.h       |   79 +++
 reftable/reftable.h     |  399 ++++++++++++++
 reftable/slice.c        |  199 +++++++
 reftable/slice.h        |   39 ++
 reftable/stack.c        |  983 ++++++++++++++++++++++++++++++++++
 reftable/stack.h        |   40 ++
 reftable/system.h       |   57 ++
 reftable/tree.c         |   66 +++
 reftable/tree.h         |   24 +
 reftable/writer.c       |  622 ++++++++++++++++++++++
 reftable/writer.h       |   46 ++
 reftable/zlib-compat.c  |   92 ++++
 repository.c            |    4 +
 repository.h            |    3 +
 setup.c                 |   27 +-
 45 files changed, 7255 insertions(+), 33 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/blocksource.h
 create mode 100644 reftable/bytes.c
 create mode 100644 reftable/config.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c

base-commit: 5b0ca87

Submitted-As: https://lore.kernel.org/git/pull.539.v3.git.1580848060.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.539.git.1579808479.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.539.v2.git.1580134944.gitgitgadget@gmail.com

pr-git-700/newren/fill-directory-exponential-v2

Avoid multiple recursive calls for same path in read_directory_recursive()

This patch series builds on en/fill-directory-fixes-more. This series should
be considered an RFC because of the untracked-cache changes (see the last
two commits), for which I'm hoping to get an untracked-cache expert to
comment. This series does provide some modest speedups (see second to last
commit message), and should allow 'git status --ignored' to complete in a
more reasonable timeframe for Martin Melka (see
https://lore.kernel.org/git/CANt4O2L_DZnMqVxZzTBMvr=BTWqB6L0uyORkoN_yMHLmUX7yHw@mail.gmail.com/
)
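
The series title names the underlying problem. As a toy model (the `visit` function is hypothetical and unrelated to dir.c's actual code), consider a chain of nested directories `depth` levels deep where the walker recurses into the same subdirectory `calls_per_child` times:

```c
#include <assert.h>

/* Toy model: a chain of nested directories `depth` levels deep, where each
 * directory has exactly one subdirectory. The walker recurses into that same
 * subdirectory `calls_per_child` times. Returns the total number of visits. */
unsigned long visit(int depth, int calls_per_child)
{
	assert(depth >= 0 && calls_per_child >= 1);
	unsigned long total = 1; /* count this directory */
	if (depth == 0)
		return total;
	for (int i = 0; i < calls_per_child; i++)
		total += visit(depth - 1, calls_per_child);
	return total;
}
```

With one call per path the count is depth + 1; with two it is 2^(depth+1) - 1, so at depth 20 it already exceeds two million visits. Doubling per level like this is the kind of blowup that visiting each path only once avoids.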

Changes since v1:

 * Replaced patch 4 with improved version from Stolee (with additional
   improvement of my own)
 * Clarifications, wording fixes, and more about linear perf in commit
   message to patch 5
 * More detail in patch 5 about why "whackamole" particularly makes me
   uneasy for dir.c

Stuff clearly still missing from v2:

 * I didn't make the DIR_KEEP_UNTRACKED_CONTENTS changes I mentioned in
   https://lore.kernel.org/git/CABPp-BEQ5s=+6Rnb-A+pdEaoPXxfo-hMSegSe1eai=RE74A3Og@mail.gmail.com/
   which I think would make the code cleaner & clearer.
 * I still have not addressed the untracked-cache issue mentioned in the
   last two commits. I looked at it very, very briefly, but I was really
   close to doing something similar to [1] and just dropping my patches in
this series before even submitting them on Wednesday [2] (dir.c is
really unpleasant to work in). Other than wording fixes, I just need a
   week or two off from this area before I dig further, unless someone else
   wants to dive in and needs me to provide pointers on what I've done so
   far.

[1] https://lore.kernel.org/git/pull.676.v3.git.git.1576571586.gitgitgadget@gmail.com/
[2] I was inches from doing that Wednesday morning. I had done several
rounds of "Okay, I fixed all the tests that broke with my changes last time,
let's re-run the testsuite -- wow, four totally different tests from
test files I hadn't looked at before now break", and decided that I would
only do one more before dropping it and maybe coming back in a month or two.
That time happened to work, minus the untracked-cache, so I decided to put
it in front of other eyeballs.

Derrick Stolee (1):
  dir: refactor treat_directory to clarify control flow

Elijah Newren (5):
  dir: consolidate treat_path() and treat_one_path()
  dir: fix broken comment
  dir: fix confusion based on variable tense
  dir: replace exponential algorithm with a linear one
  t7063: blindly accept diffs

 dir.c                             | 331 +++++++++++++++++-------------
 t/t7063-status-untracked-cache.sh |  50 ++---
 2 files changed, 208 insertions(+), 173 deletions(-)

base-commit: 0cbb605

Submitted-As: https://lore.kernel.org/git/pull.700.v2.git.git.1580495486.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.700.git.git.1580335424.gitgitgadget@gmail.com

pr-git-698/seraphire/seraphire/p4-hook-v2

git-p4: add hook p4-pre-edit-changelist

Our company's workflow requires that our P4 check-in messages have a
specific format. A helpful feature in git-p4 would be a hook that runs
after the P4 changelist is created but before it is displayed in the
editor, allowing an external program to edit the changelist text.

v1: My suggestion for the hook name is p4-pre-edit-changelist.

It would take a single parameter, the full path of the temporary file. If
the hook returns a non-zero exit code, it would cancel the current P4
submit.

The hook should be optional.
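
The v1 contract above (one argument, the temporary file's path; a non-zero exit cancels the submit; no hook means no check) can be sketched in POSIX C. git-p4 itself is a Python script, so this is only an illustration: `run_changelist_hook` is a hypothetical name, and treating a failure to start the hook, or its death by a signal, as "cancel" is an assumption of the sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 to continue the submit, non-zero to cancel. */
int run_changelist_hook(const char *hook_path, const char *changelist_file)
{
	assert(hook_path && changelist_file);

	/* The hook is optional: no executable hook means "continue". */
	if (access(hook_path, X_OK) != 0)
		return 0;

	pid_t pid = fork();
	if (pid < 0)
		return -1; /* could not start the hook: cancel to be safe */
	if (pid == 0) {
		/* Child: the changelist file is the hook's only argument. */
		execl(hook_path, hook_path, changelist_file, (char *)NULL);
		_exit(127); /* exec failed */
	}

	int status;
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (WIFEXITED(status))
		return WEXITSTATUS(status); /* non-zero exit cancels the submit */
	return -1; /* killed by a signal: treat as failure */
}
```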

v2: Instead of a single hook, p4-pre-edit-changelist, follow the Git
convention for hook names and add a trio of hooks that work together,
similar to git commit.

The hook names are:

 * p4-prepare-changelist
 * p4-changelist
 * p4-post-changelist

The hooks should follow the same convention as git commit, so a new
command-line option, --no-verify, should also be added to git-p4 submit.

Ben Keene (4):
  git-p4: rewrite prompt to be Windows compatible
  git-p4: create new method gitRunHook
  git-p4: add hook p4-pre-edit-changelist
  git-p4: add p4 submit hooks

 Documentation/git-p4.txt   |  44 ++++++++-
 Documentation/githooks.txt |  46 +++++++++
 git-p4.py                  | 191 ++++++++++++++++++++++++++-----------
 3 files changed, 225 insertions(+), 56 deletions(-)

base-commit: 5b0ca87

Submitted-As: https://lore.kernel.org/git/pull.698.v2.git.git.1580507895.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.698.git.git.1579555036314.gitgitgadget@gmail.com

pr-git-695/Masmiseim36/dev/CloneWithSubmodule-v1

clone: use submodule.recurse option to automatically clone submodules

From: Markus <masmiseim@gmx.de>

Simplify cloning repositories with submodules when the option
submodule.recurse is set by the user. This makes the use of submodules
transparent to the user: they don't have to know whether an extra parameter
is needed to get the full project, including its submodules.
This makes clone behave identically to other commands like fetch, pull,
checkout, ... which include the submodules automatically if this option is
set.

It is implemented analogously to the pull command, by using its own config
function instead of just the default config. In contrast to the pull
command, the submodule.recurse state is saved as an array of strings, as it
can take an optional pathspec argument which describes which submodules
should be recursively initialized and cloned. To recursively initialize and
clone all submodules, a pathspec of "." has to be used.
The regression test is simplified compared to the test for "git clone
--recursive", as the general functionality is already checked there.

Signed-off-by: Markus Klein <masmiseim@gmx.de>

Submitted-As: https://lore.kernel.org/git/pull.695.git.git.1580505092071.gitgitgadget@gmail.com

pr-513/derrickstolee/sparse-harden-v4

Harden the sparse-checkout builtin

This series is based on ds/sparse-list-in-cone-mode.

This series attempts to clean up some rough edges in the sparse-checkout
feature, especially around the cone mode.

Unfortunately, after the v2.25.0 release, we noticed an issue with the "git
clone --sparse" option when using a URL instead of a local path. This is
fixed and properly tested here.

Also, let's improve Git's response to these more complicated scenarios:

 1. Running "git sparse-checkout init" in a worktree would complain because
    the "info" dir doesn't exist.
 2. Tracked paths that include "*" and "\" in their filenames.
 3. If a user edits the sparse-checkout file to contain non-cone patterns,
    such as "**" anywhere or "*" in the wrong place, then we should respond
    appropriately. That is: warn that the patterns are not cone-mode
    patterns, then revert to the old (non-cone) matching logic.
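
The two rules in item 3 can be sketched as a small check. `has_non_cone_glob` is hypothetical and deliberately simplified (it does not, for example, verify that directory patterns end in a slash); it assumes cone-mode patterns of the forms "/*", "!/*/", "/A/" and "!/A/*/", and treats a backslash-escaped star as a literal character, in the spirit of the escaping work in this series.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if `pattern` uses a glob that cone mode cannot represent:
 * a double star anywhere, or an unescaped star that is not the final
 * component (a trailing slash-star, optionally followed by one slash).
 * A leading '!' is skipped; a backslash-escaped star is a literal. */
int has_non_cone_glob(const char *pattern)
{
	assert(pattern);
	const char *p = pattern;
	if (*p == '!')
		p++;
	if (strstr(p, "**"))
		return 1;

	size_t len = strlen(p);
	for (size_t i = 0; i < len; i++) {
		if (p[i] == '\\') {
			i++; /* skip the escaped character */
			continue;
		}
		if (p[i] != '*')
			continue;
		int after_slash = i > 0 && p[i - 1] == '/';
		int trailing = i + 1 == len || (i + 2 == len && p[i + 1] == '/');
		if (!after_slash || !trailing)
			return 1;
	}
	return 0;
}
```

A caller would warn and switch to non-cone matching as soon as any pattern in the file makes this return 1.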

Updates in V2:

 * Added C-style quoting to the output of "git sparse-checkout list" in cone
   mode.
 * Improved documentation.
 * Responded to most style feedback. Hopefully I didn't miss anything.
 * I was lingering on this a little to see if I could also fix the issue
   raised in [1], but I have not figured that one out yet.

Update in V3:

 * Input now uses Peff's recommended pattern: unquote C-style strings over
   stdin and otherwise do not un-escape input.
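
The V2 and V3 quoting rules can be illustrated with a simplified pair of helpers. `c_quote` and `c_unquote` are hypothetical and handle fewer cases than Git's own C-style quoting; they only show the shape of the rule: quote on output when needed, and on input unescape only strings that arrive wrapped in double quotes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified C-style quoting: if `in` contains '"', '\\' or a control
 * character, wrap it in double quotes and backslash-escape those bytes;
 * otherwise copy it unchanged. `out` must hold 4 * strlen(in) + 3 bytes. */
void c_quote(const char *in, char *out)
{
	assert(in && out);
	int needs_quotes = 0;
	for (const char *p = in; *p; p++)
		if (*p == '"' || *p == '\\' || (unsigned char)*p < 0x20)
			needs_quotes = 1;
	if (!needs_quotes) {
		strcpy(out, in);
		return;
	}
	*out++ = '"';
	for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
		if (*p == '"' || *p == '\\') {
			*out++ = '\\';
			*out++ = (char)*p;
		} else if (*p == '\n') {
			*out++ = '\\';
			*out++ = 'n';
		} else if (*p == '\t') {
			*out++ = '\\';
			*out++ = 't';
		} else if (*p < 0x20) {
			out += sprintf(out, "\\%03o", (unsigned)*p); /* e.g. \001 */
		} else {
			*out++ = (char)*p;
		}
	}
	*out++ = '"';
	*out = '\0';
}

/* Reverse of c_quote. Input not starting with '"' is copied unchanged,
 * matching the rule of not un-escaping unquoted input. Returns 0 on
 * success, -1 on malformed input. `out` needs strlen(in) + 1 bytes. */
int c_unquote(const char *in, char *out)
{
	assert(in && out);
	if (*in != '"') {
		strcpy(out, in);
		return 0;
	}
	for (in++; *in && *in != '"'; in++) {
		if (*in != '\\') {
			*out++ = *in;
			continue;
		}
		in++;
		switch (*in) {
		case '"': case '\\': *out++ = *in; break;
		case 'n': *out++ = '\n'; break;
		case 't': *out++ = '\t'; break;
		default:
			if (in[0] >= '0' && in[0] <= '3' &&
			    in[1] >= '0' && in[1] <= '7' &&
			    in[2] >= '0' && in[2] <= '7') {
				*out++ = (char)((in[0] - '0') * 64 +
						(in[1] - '0') * 8 + (in[2] - '0'));
				in += 2;
				break;
			}
			return -1; /* unknown escape or truncated input */
		}
	}
	if (*in != '"' || in[1] != '\0')
		return -1; /* missing closing quote or trailing garbage */
	*out = '\0';
	return 0;
}
```

Note that a glob character such as "*" passes through both helpers untouched; escaping globs in the written patterns is a separate step.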

[1] https://lore.kernel.org/git/062301d5d0bc$c3e17760$4ba46620$@Frontier.com/

Thanks, -Stolee

Derrick Stolee (14):
  t1091: use check_files to reduce boilerplate
  t1091: improve here-docs
  sparse-checkout: create leading directories
  clone: fix --sparse option with URLs
  sparse-checkout: cone mode does not recognize "**"
  sparse-checkout: detect short patterns
  sparse-checkout: warn on globs in cone patterns
  sparse-checkout: properly match escaped characters
  sparse-checkout: write escaped patterns in cone mode
  sparse-checkout: unquote C-style strings over --stdin
  sparse-checkout: use C-style quotes in 'list' subcommand
  sparse-checkout: escape all glob characters on write
  sparse-checkout: improve docs around 'set' in cone mode
  sparse-checkout: fix cone mode behavior mismatch

Jeff King (1):
  sparse-checkout: fix documentation typo for core.sparseCheckoutCone

 Documentation/git-sparse-checkout.txt |  19 +-
 builtin/clone.c                       |   2 +-
 builtin/sparse-checkout.c             |  48 +++-
 dir.c                                 |  79 +++++-
 t/t1091-sparse-checkout-builtin.sh    | 352 +++++++++++++++-----------
 unpack-trees.c                        |   2 +-
 6 files changed, 346 insertions(+), 156 deletions(-)

base-commit: 4fd683b

Submitted-As: https://lore.kernel.org/git/pull.513.v4.git.1580501775.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.513.git.1579029962.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.513.v2.git.1579900782.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.513.v3.git.1580236003.gitgitgadget@gmail.com

pr-540/phil-blain/grep-no-index-ignore-recurse-submodule-v2

grep: ignore --recurse-submodules if --no-index is given

From: Philippe Blain <levraiphilippeblain@gmail.com>

Since grep learned to recurse into submodules in 0281e48
(grep: optionally recurse into submodules, 2016-12-16),
using --recurse-submodules along with --no-index makes Git
die().

This is unfortunate because if submodule.recurse is set in a user's
~/.gitconfig, invoking `git grep --no-index` either inside or outside
a Git repository results in

    fatal: option not supported with --recurse-submodules

Let's allow using these options together, so that setting submodule.recurse
globally does not prevent using `git grep --no-index`.

Using `--recurse-submodules` should not have any effect if `--no-index`
is used inside a repository, as Git will recurse into the checked out
submodule directories just like into regular directories.
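
The behavior change amounts to a small option-reconciliation step. The struct and function below are hypothetical, not Git's grep code; they only show the decision: with --no-index, drop the submodule flag instead of dying.

```c
#include <assert.h>

/* Hypothetical option bundle, not Git's grep options struct. */
struct grep_options {
	int no_index;           /* --no-index */
	int recurse_submodules; /* --recurse-submodules or submodule.recurse */
};

/* Previously this combination was fatal. Here --no-index wins and the
 * submodule flag is dropped, since without an index a checked-out
 * submodule is searched like any other directory. Returns 1 if the flag
 * was dropped, 0 otherwise. */
int reconcile_grep_options(struct grep_options *opt)
{
	assert(opt);
	if (opt->no_index && opt->recurse_submodules) {
		opt->recurse_submodules = 0;
		return 1;
	}
	return 0;
}
```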

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>

Submitted-As: https://lore.kernel.org/git/pull.540.v2.git.1580391448318.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.540.git.1580000298097.gitgitgadget@gmail.com

pr-537/HebaWaly/git_dir_doc-v4

git: update documentation for --git-dir

From: Heba Waly <heba.waly@gmail.com>

git --git-dir <path> is a bit confusing and sometimes doesn't work as
the user would expect it to.

For example, if the user runs `git --git-dir=<path> status`, git
will skip the repository discovery algorithm and will assign the
work tree to the user's current working directory unless otherwise
specified. When this assignment is wrong, the output will not match
the user's expectations.
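
The work-tree choice described above can be sketched as a precedence rule. `pick_work_tree` is hypothetical; the rule it encodes follows Git's documented behavior that, when a Git directory is given but none of --work-tree, GIT_WORK_TREE or core.worktree is set, the current directory is treated as the top level of the work tree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the directory treated as the top of the
 * work tree, or NULL if none was found. */
const char *pick_work_tree(const char *git_dir_opt,
			   const char *explicit_work_tree,
			   const char *cwd,
			   const char *discovered_work_tree)
{
	assert(cwd);
	/* --work-tree, GIT_WORK_TREE or core.worktree always wins. */
	if (explicit_work_tree)
		return explicit_work_tree;
	/* --git-dir skips discovery; the cwd is assumed to be the top level,
	 * which surprises users who run the command from a subdirectory. */
	if (git_dir_opt)
		return cwd;
	/* Otherwise use the result of normal repository discovery. */
	return discovered_work_tree;
}
```

For example, `git --git-dir=/repo/.git status` run from `/home/me` treats `/home/me` as the work tree, so status compares the index against the wrong directory.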

This patch updates the documentation to make it clearer.

Signed-off-by: Heba Waly <heba.waly@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>

Submitted-As: https://lore.kernel.org/git/pull.537.v4.git.1580346841614.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.537.git.1579745811615.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.537.v2.git.1580091855792.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.537.v3.git.1580185440512.gitgitgadget@gmail.com

pr-508/HebaWaly/formatting_hints-v3

add: use advice API to display hints

From: Heba Waly <heba.waly@gmail.com>

In the "add" command, use the advice API to display hints to users,
as it provides a neat, standard format for hint messages and makes
the message visibility configurable.
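
For illustration, advice-style output can be sketched as below. `format_hint` and its `enabled` flag are hypothetical stand-ins for the advice API, which in Git prints hints with a "hint: " prefix and lets users silence them through `advice.*` configuration; here the prefix is applied to every line of the message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical advice-style helper: `enabled` stands in for a per-hint
 * config switch. Each line of `msg` is written to `out` with a "hint: "
 * prefix. Returns the number of bytes written (0 when disabled). */
size_t format_hint(char *out, size_t size, int enabled, const char *msg)
{
	assert(out && msg);
	size_t n = 0;
	if (!enabled || size == 0)
		return 0;
	out[0] = '\0';
	while (*msg) {
		size_t len = strcspn(msg, "\n");
		int w = snprintf(out + n, size - n, "hint: %.*s\n", (int)len, msg);
		if (w < 0 || (size_t)w >= size - n) {
			out[n] = '\0'; /* drop the truncated line */
			break;
		}
		n += (size_t)w;
		msg += len;
		if (*msg == '\n')
			msg++;
	}
	return n;
}
```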

Signed-off-by: Heba Waly <heba.waly@gmail.com>

Submitted-As: https://lore.kernel.org/git/pull.508.v3.git.1580346702203.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.508.git.1577934241.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.508.v2.git.1578438752.gitgitgadget@gmail.com