forked from git/git
-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PATH WALK IV: Add 'git survey' command #1821
Open
derrickstolee
wants to merge
1,560
commits into
gitgitgadget:api-upstream
Choose a base branch
from
derrickstolee:survey-upstream
base: api-upstream
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
PATH WALK IV: Add 'git survey' command #1821
derrickstolee
wants to merge
1,560
commits into
gitgitgadget:api-upstream
from
derrickstolee:survey-upstream
+50,526
−19,127
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
95e5c93
to
01f3080
Compare
97d669a
to
5252076
Compare
01f3080
to
61b1397
Compare
5252076
to
0bb607e
Compare
61b1397
to
38e1168
Compare
38e1168
to
b5c4a74
Compare
0bb607e
to
e716672
Compare
b5c4a74
to
ffec88a
Compare
8a7b7e6
to
781b2ea
Compare
358e4e6
to
4d496d5
Compare
781b2ea
to
ef54342
Compare
4d496d5
to
2d7e24f
Compare
"git init" to reinitialize a repository that already exists cannot change the hash function and ref backends; such a request is silently ignored now. * ps/setup-reinit-fixes: setup: fix reinit of repos with incompatible GIT_DEFAULT_HASH setup: fix reinit of repos with incompatible GIT_DEFAULT_REF_FORMAT t0001: remove duplicate test
"git apply" internally uses unsigned long for line numbers and uses strtoul() to parse numbers on the hunk headers. It however forgot to check parse errors. * pw/apply-ulong-overflow-check: apply: detect overflow when parsing hunk header
Two CI tasks, whitespace check and style check, work on the difference from the base version and the version being checked, but the base was computed incorrectly in GitLab CI in some cases, which has been corrected. * jt/gitlab-ci-base-fix: ci: fix base commit fallback for check-whitespace and check-style
Further code clean-up on the use of hash functions. Now the context object knows what hash function it is working with. * ps/hash-cleanup: global: adapt callers to use generic hash context helpers hash: provide generic wrappers to update hash contexts hash: stop typedeffing the hash context hash: convert hashing context to a structure
Convert a handful of unit tests to work with the clar framework. * sk/unit-tests-0130: t/unit-tests: convert strcmp-offset test to use clar test framework t/unit-tests: convert strbuf test to use clar test framework t/unit-tests: adapt example decorate test to use clar test framework t/unit-tests: convert hashmap test to use clar test framework
CI update to make Coverity job work again. * jk/ci-coverity-update: ci: set CI_JOB_IMAGE for coverity job
Signed-off-by: Junio C Hamano <[email protected]>
The use of "echo -e" is not portable and not specified by POSIX. dash does not support any options except "-n", and so this script will not work on operating systems which use that as /bin/sh. Fortunately, the solution is easy: switch to printf(1), which is specified by POSIX and allows the escape sequences we want to use. This will allow the script to work with any POSIX shell. Signed-off-by: brian m. carlson <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Remove the_repository global variable in favor of the repository argument that gets passed in "builtin/update-server-info.c". When `-h` is passed to the command outside a Git repository, the `run_builtin()` will call the `cmd_update_server_info()` function with `repo` set to NULL and then early in the function, "parse_options()" call will give the options help and exit, without having to consult much of the configuration file. So it is safe to omit reading the config when `repo` argument the caller gave us is NULL. Mentored-by: Christian Couder <[email protected]> Signed-off-by: Usman Akinyemi <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
When rebase rewords a commit it picks the commit and then runs "git commit --amend" to reword it. When the commit is picked the sequencer tries to reuse existing commits by fast-forwarding if the parents are unchanged. Rewording an empty commit that has been fast-forwarded fails because "git commit --amend" is called without "--allow-empty". This happens because when a commit is fast-forwarded the logic that checks whether we should pass "--allow-empty" is skipped. Fix this by always passing "--allow-empty" when rewording a commit. This is safe because we are amending a commit that has already been picked so if it had become empty when it was picked we'd have already returned an error. As "git commit" will happily create empty merge commits without "--allow-empty" we do not need to pass that flag when rewording merge commits. Signed-off-by: Phillip Wood <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
We do not seem to centrally document exhaustively ways to spell Boolean values. The description in the Environment Variables of git(1) section assumes that the reader is already familiar with how "Boolean valued configuration variables" are specified, without referring to anything, so there is no way for the readers to find out more. The description of `bool` in the section on "--type <type>" in "git config --help" might be the place to do so, but it is not telling us all that much. The description of Boolean valued placeholders in the pretty formats section of "git log --help" enumerates the possible values with "etc." implying there may be other synonyms; shrink the list of samples and instead refer to the canonical and authoritative source of truth, which now is git-config(1). Signed-off-by: Junio C Hamano <[email protected]>
The -X renormalize (or merge.renormalize config) option is intended to reduce conflicts due to normalization of newer versions of history. It does so by renormalizing files that it is about to do a three-way content merge on. Some folks thought it would renormalize all files throughout the tree, and the previous wording wasn't clear enough to dispell that misconception. Update the docs to make it clear that the merge machinery will only apply renormalization to files which need a three-way content merge. (Technically, the merge machinery also does renormalization on modify/delete conflicts, in order to see if the modification was merely a normalization; if so, it can accept the delete and not report a conflict. But it's not clear that this piece needs to be explained to users, and trying to distinguish it might feel like splitting hairs and overcomplicating the explanation, so we leave it out.) Signed-off-by: Elijah Newren <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Allow each file to fix the warnings guarded by the macro separately by moving the definition from the shared xinclude.h into each file that needs it. xmerge.c and xprepare.c do not contain any signed vs. unsigned comparisons so the definition was not included in these files. Signed-off-by: David Aguilar <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
The loop iteration variable is non-negative and only used in comparisons against other size_t values. Signed-off-by: David Aguilar <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
* 'vi-2.49' of github.com:Nekosha/git-po: l10n: Updated translation for vi-2.49
Signed-off-by: Emir SARI <[email protected]>
Co-authored-by: Kate Golovanova <[email protected]> Co-authored-by: Mikhail T. <[email protected]> Co-authored-by: Tamara Lazerka <[email protected]> Signed-off-by: Arkadii Yakovets <[email protected]> Signed-off-by: Kate Golovanova <[email protected]> Signed-off-by: Mikhail T. <[email protected]> Signed-off-by: Tamara Lazerka <[email protected]>
* '2.49-uk-update' of github.com:arkid15r/git-ukrainian-l10n: l10n: uk: add 2.49 translation
Helped-by: 依云 <[email protected]> Helped-by: Jiang Xin <[email protected]> Signed-off-by: Teng Long <[email protected]>
* 'tl/zh_CN_2.49.0_rnd' of github.com:dyrone/git: l10n: zh_CN: updated translation for 2.49
Doc markup fix. * ma/clone-doc-markup-fix: git-clone doc: fix indentation
"git version --build-options" stopped showing zlib version by mistake due to recent refactoring, which has been corrected. * tc/zlib-ng-fix: help: print zlib-ng version number help: include git-zlib.h to print zlib version
Doc updates. * pb/doc-follow-remote-head: config/remote.txt: improve wording for 'remote.<name>.followRemoteHEAD' config/remote.txt: reunite 'severOption' description paragraphs
Signed-off-by: Junio C Hamano <[email protected]>
Update following components: * builtin/clone.c * builtin/commit.c * builtin/fetch.c * builtin/index-pack.c * builtin/pack-objects.c * builtin/refs.c * builtin/repack.c * builtin/unpack-objects.c * command-list.h * diff.c * object-file.c * parse-options.c * promisor-remote.c * refspec.c * remote.c Translate following new components: * path-walk.c * builtin/backfill.c * t/helper/test-path-walk.c Signed-off-by: Bagas Sanjaya <[email protected]>
Signed-off-by: Ralf Thielow <[email protected]>
* 'l10n-de-2.49' of github.com:ralfth/git: l10n: update German translation
Co-authored-by: Lumynous <[email protected]> Signed-off-by: Yi-Jyun Pan <[email protected]>
* 'l10n/zh-TW/2025-03-09' of github.com:l10n-tw/git-po: l10n: zh_TW: Git 2.49.0 round 1
l10n-2.49.0-rnd1 * tag 'l10n-2.49.0-rnd1' of https://github.com/git-l10n/git-po: l10n: zh_TW: Git 2.49.0 round 1 l10n: update German translation l10n: po-id for 2.49 l10n: zh_CN: updated translation for 2.49 l10n: uk: add 2.49 translation l10n: tr: Update Turkish translations for 2.49.0 l10n: ko: fix minor typo in Korean translation l10n: it: fix spelling of "sorgente" (Italian for "source") l10n: sv.po: Fix Swedish typos l10n: sv.po: Update Swedish translation l10n: fr: 2.49 round 2 l10n: bg.po: Updated Bulgarian translation (5836t) l10n: Updated translation for vi-2.49
Signed-off-by: Junio C Hamano <[email protected]>
46f4ec4
to
299f3c3
Compare
Start work on a new 'git survey' command to scan the repository for monorepo performance and scaling problems. The goal is to measure the various known "dimensions of scale" and serve as a foundation for adding additional measurements as we learn more about Git monorepo scaling problems. The initial goal is to complement the scanning and analysis performed by the GO-based 'git-sizer' (https://github.com/github/git-sizer) tool. It is hoped that by creating a builtin command, we may be able to take advantage of internal Git data structures and code that is not accessible from GO to gain further insight into potential scaling problems. Co-authored-by: Derrick Stolee <[email protected]> Signed-off-by: Jeff Hostetler <[email protected]> Signed-off-by: Derrick Stolee <[email protected]>
By default we will scan all references in "refs/heads/", "refs/tags/" and "refs/remotes/". Add command line opts let the use ask for all refs or a subset of them and to include a detached HEAD. Signed-off-by: Jeff Hostetler <[email protected]> Signed-off-by: Derrick Stolee <[email protected]>
When 'git survey' provides information to the user, this will be presented in one of two formats: plaintext and JSON. The JSON implementation will be delayed until the functionality is complete for the plaintext format. The most important parts of the plaintext format are headers specifying the different sections of the report and tables providing concreted data. Create a custom table data structure that allows specifying a list of strings for the row values. When printing the table, check each column for the maximum width so we can create a table of the correct size from the start. The table structure is designed to be flexible to the different kinds of output that will be implemented in future changes. Signed-off-by: Derrick Stolee <[email protected]>
At the moment, nothing is obvious about the reason for the use of the path-walk API, but this will become more prevelant in future iterations. For now, use the path-walk API to sum up the counts of each kind of object. For example, this is the reachable object summary output for my local repo: REACHABLE OBJECT SUMMARY ======================== Object Type | Count ------------+------- Tags | 1343 Commits | 179344 Trees | 314350 Blobs | 184030 Signed-off-by: Derrick Stolee <[email protected]>
Now that we have explored objects by count, we can expand that a bit more to summarize the data for the on-disk and inflated size of those objects. This information is helpful for diagnosing both why disk space (and perhaps clone or fetch times) is growing but also why certain operations are slow because the inflated size of the abstract objects that must be processed is so large. Signed-off-by: Derrick Stolee <[email protected]>
Signed-off-by: Derrick Stolee <[email protected]>
In future changes, we will make use of these methods. The intention is to keep track of the top contributors according to some metric. We don't want to store all of the entries and do a sort at the end, so track a constant-size table and remove rows that get pushed out depending on the chosen sorting algorithm. Co-authored-by: Jeff Hostetler <[email protected]> Signed-off-by; Jeff Hostetler <[email protected]> Signed-off-by: Derrick Stolee <[email protected]>
Since we are already walking our reachable objects using the path-walk API, let's now collect lists of the paths that contribute most to different metrics. Specifically, we care about * Number of versions. * Total size on disk. * Total inflated size (no delta or zlib compression). This information can be critical to discovering which parts of the repository are causing the most growth, especially on-disk size. Different packing strategies might help compress data more efficiently, but the toal inflated size is a representation of the raw size of all snapshots of those paths. Even when stored efficiently on disk, that size represents how much information must be processed to complete a command such as 'git blame'. Since the on-disk size is likely to be fragile, stop testing the exact output of 'git survey' and check that the correct set of headers is output. Signed-off-by: Derrick Stolee <[email protected]>
The 'git survey' builtin provides several detail tables, such as "top files by on-disk size". The size of these tables defaults to 100, currently. Allow the user to specify this number via a new --top=<N> option or the new survey.top config key. Signed-off-by: Derrick Stolee <[email protected]>
While this command is definitely something we _want_, chances are that upstreaming this will require substantial changes. We still want to be able to experiment with this before that, to focus on what we need out of this command: To assist with diagnosing issues with large repositories, as well as to help monitoring the growth and the associated painpoints of such repositories. To that end, we are about to integrate this command into `microsoft/git`, to get the tool into the hands of users who need it most, with the idea to iterate in close collaboration between these users and the developers familar with Git's internals. However, we will definitely want to avoid letting anybody have the impression that this command, its exact inner workings, as well as its output format, are anywhere close to stable. To make that fact utterly clear (and thereby protect the freedom to iterate and innovate freely before upstreaming the command), let's mark its output as experimental in all-caps, as the first thing we do. Signed-off-by: Johannes Schindelin <[email protected]>
299f3c3
to
2663d6a
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
WIP
Thanks, -Stolee