commit-graph: use timestamp_t for max parent generation accumulator #2148

Open
newren wants to merge 1 commit into gitgitgadget:master from newren:commit-graph-fix-ccd-uint32-truncation

newren commented Jun 12, 2026

We found a few repositories in the wild with commits whose authors were apparently on a computer in the year 2120 when they recorded their commits. Apparently, in a century from now, some folks are going to have a really weird timezone as well (-13068837), though the timezone doesn't factor into this patch at all.

cc: Patrick Steinhardt <ps@pks.im>
cc: Derrick Stolee <stolee@gmail.com>

newren force-pushed the commit-graph-fix-ccd-uint32-truncation branch 3 times, most recently from bdd1ae5 to 83f51ca on June 12, 2026 20:55
compute_reachable_generation_numbers() computes each commit's
generation as

    max(c->date, max(parent.generation) + 1)

by walking its parents and accumulating their generations into a
local

    uint32_t max_gen = 0;

while info->get_generation() returns timestamp_t and
compute_generation_from_max() already takes its max_gen parameter
as timestamp_t.  For v1 (topological levels) the narrowing is
harmless because GENERATION_NUMBER_V1_MAX is less than 2^30, but
for v2 (corrected committer dates) it silently truncates any
parent generation that does not fit in 32 bits, i.e. any parent
whose committer timestamp is at or beyond 2106-02-07 UTC
(>= 2^32).
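
The narrowing described above can be sketched in isolation. This is a minimal illustration, not code from commit-graph.c: `sketch_timestamp_t` and both function names are hypothetical, and `timestamp_t` is assumed to be a 64-bit unsigned type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Git's timestamp_t; assumed 64-bit unsigned. */
typedef uint64_t sketch_timestamp_t;

/* Mimics assigning a parent generation to the old uint32_t accumulator. */
uint32_t sketch_narrow_to_u32(sketch_timestamp_t generation)
{
	uint32_t narrowed = (uint32_t)generation;

	/* Unsigned narrowing is well defined in C: the value is reduced modulo 2^32. */
	assert(narrowed == generation % 4294967296ULL);
	return narrowed;
}

/* Reports whether a generation survives the trip through uint32_t unchanged. */
int sketch_fits_in_u32(sketch_timestamp_t generation)
{
	return generation <= UINT32_MAX;
}
```

Under these assumptions, `sketch_narrow_to_u32(4294967300)` yields 4, while any value below 2^30 (the v1 range) passes through unchanged.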

The truncated max then causes child commits to end up with a
corrected committer date that matches the parent's instead of being
at least 1 higher.  The bad value gets written into the commit-graph
and causes problems later, and can be noticed by running `git
commit-graph verify`.

Widen the accumulator to timestamp_t.
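
To see why the accumulator width matters, here is a simplified model of the parent walk. It is a sketch under stated assumptions, not the code in commit-graph.c: all `sketch_*` names are hypothetical, parent generations are passed as a plain array, and the v2 rule is taken to be "the larger of the commit date and one more than the largest parent generation".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Git's timestamp_t; assumed 64-bit unsigned. */
typedef uint64_t sketch_timestamp_t;

/* Old behavior: the running maximum is kept in a 32-bit accumulator. */
sketch_timestamp_t sketch_max_gen_narrow(const sketch_timestamp_t *gens, size_t n)
{
	uint32_t max_gen = 0;

	assert(gens != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (gens[i] > max_gen)
			max_gen = (uint32_t)gens[i]; /* high bits are dropped here */
	return max_gen;
}

/* Fixed behavior: the running maximum keeps the full width. */
sketch_timestamp_t sketch_max_gen_wide(const sketch_timestamp_t *gens, size_t n)
{
	sketch_timestamp_t max_gen = 0;

	assert(gens != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (gens[i] > max_gen)
			max_gen = gens[i];
	return max_gen;
}

/* Assumed v2 rule: max(commit date, largest parent generation + 1). */
sketch_timestamp_t sketch_corrected_date(sketch_timestamp_t commit_date,
					 sketch_timestamp_t max_parent_gen)
{
	if (commit_date > max_parent_gen)
		return commit_date;
	return max_parent_gen + 1;
}

/* The invariant a child must satisfy relative to one parent. */
int sketch_generation_is_valid(sketch_timestamp_t child_gen,
			       sketch_timestamp_t parent_gen)
{
	return child_gen >= parent_gen + 1;
}
```

With a single parent at generation 4294967300 and a child dated 1700000000 (an arbitrary present-day value chosen for illustration), the narrow accumulator produces a child generation of 1700000000, which fails the invariant; the wide accumulator produces 4294967301, which satisfies it.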

This is solely an in-memory arithmetic fix with no on-disk format
change: the on-disk format already encodes timestamp_t values and
existing readers handle them unchanged.  This merely allows the code to
compute the correct value to write to disk.

The narrowing was introduced in 80c928d (commit-graph:
simplify compute_generation_numbers(), 2023-03-20), which rewired
v2 to use the shared compute_reachable_generation_numbers()
helper; the helper's local accumulator had been declared uint32_t
in the immediately preceding 368d19b (commit-graph: refactor
compute_topological_levels(), 2023-03-20) when only v1 was using
it, where it was harmless.

Add a new test with a future-dated parent and a present-day child;
without the above fix, `git commit-graph verify` reports the
descendant's stored generation as below parent + 1.

Signed-off-by: Elijah Newren <newren@gmail.com>
newren force-pushed the commit-graph-fix-ccd-uint32-truncation branch from 83f51ca to d063a77 on June 14, 2026 03:57

newren commented Jun 14, 2026

/submit

gitgitgadget Bot commented Jun 14, 2026

Submitted as pull.2148.git.1781420271100.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-2148/newren/commit-graph-fix-ccd-uint32-truncation-v1

To fetch this version to local tag pr-2148/newren/commit-graph-fix-ccd-uint32-truncation-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-2148/newren/commit-graph-fix-ccd-uint32-truncation-v1

gitgitgadget Bot commented Jun 15, 2026

Patrick Steinhardt wrote on the Git mailing list:

On Sun, Jun 14, 2026 at 06:57:50AM +0000, Elijah Newren via GitGitGadget wrote:
>     commit-graph: use timestamp_t for max parent generation accumulator
>     
>     We found a few repositories in the wild with commits whose authors were
>     apparently on a computer in the year 2120 when they recorded their
>     commits. Apparently, in a century from now, some folks are going to have
>     a really weird timezone as well (-13068837), though the timezone doesn't
>     factor into this patch at all.

I'd really be curious which other parts of Git will start to break once
we cross that threshold. Would it make sense if we maybe expanded our
linux-TEST-VARS job to create commits with a date beyond UINT32_MAX?
Something like the patch at the end of this mail. And yes, many tests
break with the patch applied. From all I've seen though many of those
failures are benign, even though I'd bet that there might even be some
"proper" failures in there.

Anyway, this is of course outside the scope of this patch series.

> diff --git a/commit-graph.c b/commit-graph.c
> index 9abe62bd5a..4b7156fd76 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -1669,7 +1669,7 @@ static void compute_reachable_generation_numbers(
>  			struct commit *current = list->item;
>  			struct commit_list *parent;
>  			int all_parents_computed = 1;
> -			uint32_t max_gen = 0;
> +			timestamp_t max_gen = 0;
>  
>  			for (parent = current->parents; parent; parent = parent->next) {
>  				repo_parse_commit(info->r, parent->item);

This looks obviously correct.

> diff --git a/t/t5328-commit-graph-64bit-time.sh b/t/t5328-commit-graph-64bit-time.sh
> index d8891e6a92..bc651b69de 100755
> --- a/t/t5328-commit-graph-64bit-time.sh
> +++ b/t/t5328-commit-graph-64bit-time.sh
> @@ -74,6 +74,15 @@ test_expect_success 'single commit with generation data exceeding UINT32_MAX' '
>  	git -C repo-uint32-max commit-graph verify
>  '
>  
> +test_expect_success 'descendant of commit with date exceeding UINT32_MAX' '
> +	git init repo-uint32-max-descendant &&
> +	test_commit -C repo-uint32-max-descendant \
> +		--date "@4294967300 +0000" future-parent &&
> +	test_commit -C repo-uint32-max-descendant present-day-child &&
> +	git -C repo-uint32-max-descendant commit-graph write --reachable &&
> +	git -C repo-uint32-max-descendant commit-graph verify
> +'

Makes sense. Thanks!

Patrick

diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 809c662124..e78902b671 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -136,12 +136,19 @@ sane_unset () {
 test_tick () {
 	if test -z "${test_tick+set}"
 	then
-		test_tick=1112911993
+		if test_bool_env GIT_TEST_FUTURE false
+		then
+			test_tick=4294967600
+			test_tick_prefix=@
+		else
+			test_tick=1112911993
+			test_tick_prefix=
+		fi
 	else
 		test_tick=$(($test_tick + 60))
 	fi
-	GIT_COMMITTER_DATE="$test_tick -0700"
-	GIT_AUTHOR_DATE="$test_tick -0700"
+	GIT_COMMITTER_DATE="$test_tick_prefix$test_tick -0700"
+	GIT_AUTHOR_DATE="$test_tick_prefix$test_tick -0700"
 	export GIT_COMMITTER_DATE GIT_AUTHOR_DATE
 }
 
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 4a7357b547..54798fb3f1 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -558,12 +558,26 @@ TEST_AUTHOR_LOCALNAME=author
 TEST_AUTHOR_DOMAIN=example.com
 GIT_AUTHOR_EMAIL=${TEST_AUTHOR_LOCALNAME}@${TEST_AUTHOR_DOMAIN}
 GIT_AUTHOR_NAME='A U Thor'
-GIT_AUTHOR_DATE='1112354055 +0200'
 TEST_COMMITTER_LOCALNAME=committer
 TEST_COMMITTER_DOMAIN=example.com
 GIT_COMMITTER_EMAIL=${TEST_COMMITTER_LOCALNAME}@${TEST_COMMITTER_DOMAIN}
 GIT_COMMITTER_NAME='C O Mitter'
-GIT_COMMITTER_DATE='1112354055 +0200'
+
+case "${GIT_TEST_FUTURE:-false}" in
+1|on|true|yes)
+	GIT_AUTHOR_DATE="${GIT_TEST_DATE:-@4294967300 +0200}"
+	GIT_COMMITTER_DATE="${GIT_TEST_DATE:-@4294967300 +0200}"
+	;;
+0|off|false|no)
+	GIT_AUTHOR_DATE="${GIT_TEST_DATE:-1112354055 +0200}"
+	GIT_COMMITTER_DATE="${GIT_TEST_DATE:-1112354055 +0200}"
+	;;
+*)
+	echo "GIT_TEST_FUTURE requires a boolean" >&2
+	exit 1
+	;;
+esac
+
 GIT_MERGE_VERBOSITY=5
 GIT_MERGE_AUTOEDIT=no
 export GIT_MERGE_VERBOSITY GIT_MERGE_AUTOEDIT

gitgitgadget Bot commented Jun 15, 2026

User Patrick Steinhardt <ps@pks.im> has been added to the cc: list.

gitgitgadget Bot commented Jun 15, 2026

Derrick Stolee wrote on the Git mailing list:

On 6/15/26 4:11 AM, Patrick Steinhardt wrote:
> On Sun, Jun 14, 2026 at 06:57:50AM +0000, Elijah Newren via GitGitGadget wrote:
>>      commit-graph: use timestamp_t for max parent generation accumulator
>>
>>      We found a few repositories in the wild with commits whose authors were
>>      apparently on a computer in the year 2120 when they recorded their
>>      commits. Apparently, in a century from now, some folks are going to have
>>      a really weird timezone as well (-13068837), though the timezone doesn't
>>      factor into this patch at all.

>> @@ -1669,7 +1669,7 @@ static void compute_reachable_generation_numbers(
>>   			struct commit *current = list->item;
>>   			struct commit_list *parent;
>>   			int all_parents_computed = 1;
>> -			uint32_t max_gen = 0;
>> +			timestamp_t max_gen = 0;
>>
>>   			for (parent = current->parents; parent; parent = parent->next) {
>>   				repo_parse_commit(info->r, parent->item);
>
> This looks obviously correct.

I agree. I was surprised this was the only necessary change, but
your message clearly describes how the timing of the patch that
delivered this change contributed to the mismatch.

Thanks,
-Stolee

gitgitgadget Bot commented Jun 15, 2026

User Derrick Stolee <stolee@gmail.com> has been added to the cc: list.

gitgitgadget Bot commented Jun 15, 2026

This patch series was integrated into seen via git@cc33e45.

gitgitgadget Bot added the seen label Jun 15, 2026