feat: add custom walker support #698

Open
tobim wants to merge 12 commits into numtide:main from tobim:push-spzzmzrpkzzq
Conversation

tobim commented May 15, 2026

This adds configurable custom walkers. A custom walker runs a configured command, reads newline- or NUL-delimited paths from stdout, normalizes them relative to treefmt's tree root, and feeds existing regular files into the formatter pipeline.
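The normalization step described above can be sketched roughly as follows. This is only an illustration, not the PR's code: `splitRecords` and `normalize` are hypothetical names, and the check that each path is an existing regular file is left out.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// splitRecords splits walker output on newline or NUL and drops empty records.
// (Hypothetical helper, not part of treefmt.)
func splitRecords(out string) []string {
	return strings.FieldsFunc(out, func(r rune) bool {
		return r == '\n' || r == 0
	})
}

// normalize returns path relative to root, or false if it lies outside root.
// Relative input is interpreted relative to root. (Hypothetical helper.)
func normalize(root, path string) (string, bool) {
	if !filepath.IsAbs(path) {
		path = filepath.Join(root, path)
	}
	rel, err := filepath.Rel(root, path)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", false
	}
	return rel, true
}

func main() {
	out := "a/b.go\n./c.txt\x00../outside\n/repo/d.md\n"
	for _, rec := range splitRecords(out) {
		if rel, ok := normalize("/repo", rec); ok {
			fmt.Println(rel)
		}
	}
}
```

Paths that resolve outside the root are dropped here; whether the real implementation drops them or reports an error is not stated in this description.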

This PR is stacked on #705, which adds the shared path stream reader used by custom walkers.

Example: Format a git repo + a specific submodule:

walk = "repoPlusSubmodule"

[walker.repoPlusSubmodule]
command = "bash"
options = [
  "-c",
  '''
    submodule="path/to/submodule"
    {
      git ls-files --cached --others --exclude-standard --full-name
      git -C "$submodule" ls-files --cached --others --exclude-standard --full-name \
        | sed "s#^#$submodule/#"
    }
  ''',
]

Example: Format a pijul repo:

walk = "pijul"

[walker.pijul]
command = "pijul"
options = ["ls"]

@jfly jfly left a comment

Collaborator

@tobim, thanks for the contribution! I am supportive of this change, but haven't had a chance to read the code yet.

I wanted to warn you that #694 touches some of the same code you've touched here. It may land before this PR, apologies if the conflicts are annoying to deal with.

@jfly jfly left a comment

Collaborator

Thanks! This looks like a solid approach.

Not a blocker, but I'm not in love with a 3rd copy of the walk code. I didn't read it particularly closely. I think 2 copies (jj and git) is OK. 3 is when it probably makes sense to think of an abstraction. Did you think about that at all?

> Accept documented camel-case walker names even when Viper normalizes table keys

What is camel-case? That's kebab-case. This is camelCase (or CamelCase).

> Fix custom walker pipe handling

I did not have time to grok this commit during my review. I'll need some more handholding before I can approve it. Does whatever bug you're fixing here apply to the git and jj walkers as well? Is it related to the bug @Mic92 found in #694 (commit labeled "walk: unblock producers when Close() is called before EOF")?

Comment thread docs/site/getting-started/configure.md Outdated
Comment on lines +466 to +468
When you pass directory paths to `treefmt`, the walker command still runs for the tree root.
`treefmt` filters the command output to the requested directories.
The walker command doesn't need to implement path argument handling.

Collaborator

While I appreciate the simplicity of this decision, I'm not sure it's the right one. I imagine people are most inclined to use the "format a directory" feature in very large repos, where having some VCS enumerate every file in the entire repo might take a lot longer than discovering just the files in a specific directory. #694 is an example of someone reporting just how slow it is for git to report all files in a repo.

How crazy would it be for the walk command to have to implement path argument handling? IIUC, it would be straightforward for git.

Author

I admit that I didn't pay too much attention to optimizing for subtrees. It's no concern for my own use cases.

For a regular git walker this might not be a problem, but as soon as you want a more custom file list - like in the first example from the PR description - implementing the filtering yourself becomes error-prone rather quickly.

Maybe we could add a toggle so the user can switch between the two methods?
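For reference, the post-filtering approach under discussion in this thread (the walker emits everything, and `treefmt` keeps only paths under the requested directories) could look roughly like the sketch below. `underAny` is a hypothetical name, and paths are assumed to be slash-separated and already relative to the tree root.

```go
package main

import (
	"fmt"
	"strings"
)

// underAny reports whether path equals one of the filters or lies beneath it.
// An empty filter list matches every path. (Hypothetical helper.)
func underAny(path string, filters []string) bool {
	if len(filters) == 0 {
		return true
	}
	for _, f := range filters {
		f = strings.TrimSuffix(f, "/")
		// Compare against f+"/" so that "src" does not match "src-old/x.go".
		if f == "." || path == f || strings.HasPrefix(path, f+"/") {
			return true
		}
	}
	return false
}

func main() {
	emitted := []string{"docs/a.md", "src/main.go", "src-old/x.go"}
	for _, p := range emitted {
		if underAny(p, []string{"src"}) {
			fmt.Println(p)
		}
	}
}
```

The filter itself is cheap; the cost raised in the review is that the walker command still has to enumerate the whole tree before anything is filtered.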

Comment thread cmd/init/init.toml Outdated
# You can also set this to the name of a configured custom walker
# Env $TREEFMT_WALK
# walk = "filesystem"
# walk = "myWalker"

Collaborator

I wonder if we should instead do walker.myWalker (or perhaps custom-walker.myWalker) here. Then we wouldn't have to worry about (or detect) conflicts with the builtin walkers.

Comment thread config/config.go Outdated
Comment thread config/config.go Outdated
Comment thread walk/custom.go Outdated
Comment thread walk/walk.go Outdated
reader, err = NewReader(Git, root, path, db, statz)
if err != nil {
reader, err = NewReader(Jujutsu, root, path, db, statz)
if selector.Custom != nil {

Collaborator

This diff would be a lot simpler, and the `if selector.Custom == nil && selector.Type == Stdin` change below wouldn't have had to happen, if we introduced a new `selector.Type == Custom`. Did you consider that?
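One way to picture this suggestion is the sketch below. All names here (`Type`, `CustomConfig`, `Selector`, `describe`) are assumptions made for illustration and may not match the actual types in `walk/walk.go`.

```go
package main

import "fmt"

// Type enumerates walker kinds. Custom is the hypothetical extra value;
// the other names and their order are assumed for this sketch.
type Type int

const (
	Auto Type = iota
	Stdin
	Filesystem
	Git
	Jujutsu
	Custom
)

// CustomConfig mirrors a [walker.<name>] table (field names assumed).
type CustomConfig struct {
	Command string
	Options []string
}

// Selector pairs the walker type with its configuration.
// Custom is only set when Type == Custom.
type Selector struct {
	Type   Type
	Custom *CustomConfig
}

// describe shows that a single switch on Type covers every case,
// with no separate nil check on the Custom field.
func describe(s Selector) string {
	switch s.Type {
	case Custom:
		return "custom:" + s.Custom.Command
	case Stdin:
		return "stdin"
	default:
		return "builtin"
	}
}

func main() {
	fmt.Println(describe(Selector{Type: Custom, Custom: &CustomConfig{Command: "pijul", Options: []string{"ls"}}}))
	fmt.Println(describe(Selector{Type: Stdin}))
	fmt.Println(describe(Selector{Type: Git}))
}
```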

Comment thread cmd/root_test.go Outdated
@tobim

tobim commented May 27, 2026

Author

Thank you for the review. I hope to be able to address the individual comments in the next few days.

> Not a blocker, but I'm not in love with a 3rd copy of the walk code. I didn't read it particularly closely. I think 2 copies (jj and git) is OK. 3 is when it probably makes sense to think of an abstraction. Did you think about that at all?

I didn't really consider that because I wanted to keep the new path somewhat isolated.
We could also replace git and jj with built-in custom walkers now that the machinery is there.

> Does whatever bug you're fixing here apply to the git and jj walkers as well? Is it related to the bug @Mic92 found in #694 (commit labeled "walk: unblock producers when Close() is called before EOF")?

I'll have to look into that.

@jfly

jfly commented May 27, 2026

Collaborator

> I didn't really consider that because I wanted to keep the new path somewhat isolated.
> We could also replace git and jj with built-in custom walkers now that the machinery is there.

This sounds reasonable to me

@tobim tobim force-pushed the push-spzzmzrpkzzq branch from 7e5f3af to 69c5a22 Compare June 1, 2026 07:53
@tobim

tobim commented Jun 1, 2026

Author

I have now done a partial refactor of the walker component. There is a generic PathStreamReader abstraction that consumes a stream of paths separated by either a newline or a NUL character. jj and git are now built on top of that PathStreamReader.
After that are a couple of clean-up commits that simplify the state representation and related control flow.
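As an illustration of what a reader over newline- or NUL-separated paths might look like, here is a minimal sketch built on `bufio.Scanner` with a custom split function. It is not the PathStreamReader from this PR; `splitPathRecords`, `readPaths`, and `parseString` are hypothetical names.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// splitPathRecords is a bufio.SplitFunc that ends a record at '\n' or NUL.
func splitPathRecords(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if i := bytes.IndexAny(data, "\n\x00"); i >= 0 {
		return i + 1, data[:i], nil
	}
	if atEOF && len(data) > 0 {
		return len(data), data, nil // final record without a trailing delimiter
	}
	return 0, nil, nil // ask the scanner for more data
}

// readPaths drains r and returns every non-empty record.
func readPaths(r io.Reader) ([]string, error) {
	sc := bufio.NewScanner(r)
	sc.Split(splitPathRecords)
	var paths []string
	for sc.Scan() {
		if p := sc.Text(); p != "" {
			paths = append(paths, p)
		}
	}
	return paths, sc.Err()
}

// parseString is a convenience wrapper around readPaths for in-memory input.
func parseString(s string) ([]string, error) {
	return readPaths(strings.NewReader(s))
}

func main() {
	paths, err := parseString("a.go\nb.go\x00c.go")
	fmt.Println(paths, err)
}
```

Note that `bufio.Scanner` rejects tokens larger than its buffer limit (64 KiB by default), which is far beyond typical path lengths; `Scanner.Buffer` can raise the limit if needed.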

@jfly

jfly commented Jun 1, 2026

Collaborator

> I have now done a partial refactor of the walker component. There is a generic PathStreamReader abstraction that consumes a stream of paths separated by either a newline or a NUL character. jj and git are now built on top of that PathStreamReader.

I am excited to read this. I'll be travelling for the next ~1 week, though, and don't expect to be able to take a look until I'm back. Could you rework this change as a separate PR we could land before we add the custom walker support?

@brianmcgee

Member

Been on holiday myself this past week or so. Just getting up to speed with this. I'm liking the direction, need to give it some proper brain time though, as it's a core change.

tobim added 5 commits June 7, 2026 15:17
Document that treefmt does not support paths containing newlines before
introducing command-backed path stream walkers.
Introduce a shared reader for walker commands that emit paths on stdout.

Parse newline- and NUL-delimited records, normalize emitted paths inside
the tree root, post-filter requested path filters, and convert existing
regular files into walk.File values.

Own the stdout and stderr pipes instead of using Cmd.StdoutPipe so the
command can be waited for while Read drains stdout. Close cancels the
command to unblock producers when callers stop before EOF, and treats
post-cancel signal exits as expected.
Build the uncached reader in a helper and apply the cache wrapper once
in NewReader.

Keep auto fallback inside uncached construction so trying built-in
walkers does not recurse through cache setup.
Store filesystem path filters as a slice and walk them sequentially in
the filesystem reader.

Use one filesystem reader for multiple requested paths instead of
building one reader per path.

Add coverage for multi-path filesystem walking.
Run git ls-files from the tree root and parse its NUL-delimited output
with the shared path stream reader.

Pass requested paths to git as path filters after -- so subtree
formatting can prune the git output inside git.
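The last commit message above, passing requested paths to git after `--`, can be pictured with the sketch below. `gitLsFilesArgs` is a hypothetical helper, and the flag set is an assumption borrowed from the example in the PR description plus `-z` for NUL-delimited output; the actual walker may use different flags.

```go
package main

import "fmt"

// gitLsFilesArgs builds an argument list for `git ls-files`.
// Requested paths go after "--" so git treats them as pathspecs, never as options.
// The flags chosen here are an assumption for illustration.
func gitLsFilesArgs(paths []string) []string {
	args := []string{"ls-files", "-z", "--cached", "--others", "--exclude-standard"}
	if len(paths) > 0 {
		args = append(args, "--")
		args = append(args, paths...)
	}
	return args
}

func main() {
	fmt.Println(gitLsFilesArgs(nil))
	fmt.Println(gitLsFilesArgs([]string{"src", "-weird-dir"}))
}
```

With this shape, a directory name that starts with a dash is still passed through as a path, and git itself prunes the listing to the requested subtrees.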
@tobim tobim force-pushed the push-spzzmzrpkzzq branch from 63f7764 to 63d350b Compare June 7, 2026 14:35
Run jj file list from the tree root and parse its output with the
shared path stream reader.

Pass requested paths to jj after -- so paths that start with dashes are
handled as path filters.

Remove the generic CompositeReader now that all built-in walkers consume
the same validated path-filter list.
@tobim tobim force-pushed the push-spzzmzrpkzzq branch from 63d350b to 871c19b Compare June 7, 2026 14:47
@tobim

tobim commented Jun 7, 2026

Author

I opened #705 to split off the refactoring. This PR should now be stacked on top of that, but GitHub is crappy and you can't do that.

tobim added 6 commits June 7, 2026 17:44
When no explicit path filters are passed, run `jj file list` with `.`
as the positional filter. In non-colocated jj workspaces, running from
a subdirectory can otherwise emit paths outside the treefmt tree root.

Add a regression test for the jj-only workspace shape from numtide#704.
Document selecting a custom walker with the global walk option and
defining its command and options under [walker.<name>].

Document that custom walker commands run from the tree root, receive
requested path filters as positional arguments, and emit newline- or
NUL-delimited path records.
Parse [walker.<name>] tables into the configuration model and validate
walker names, commands, and walk values.

Treat custom walker names as exact configuration keys and keep the
default walk value in NewViper.
Represent walker selection as either a built-in walk type or a custom
walker configuration instead of passing only the built-in enum through
the format command.

Keep selector fields private and expose IsBuiltin for callers that need
to test for stdin.
Implement custom walkers as a small adapter around the shared path
stream reader.

Run the configured command from the tree root and pass validated path
filters as positional arguments after configured options.
Cover selecting a custom walker from treefmt.toml, passing configured
walker options, and forwarding requested path filters to the walker
command.

Use sandbox-compatible bash commands, exact lowercase walker names, and
the long-form --clear-cache option in CLI coverage.
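The adapter behaviour described in the commit messages above (run the configured command from the tree root, with path filters appended after the configured options) might be sketched as follows. `WalkerConfig`, `walkerCommand`, and the `my-lister` command are hypothetical; the sketch only builds the command and does not run it.

```go
package main

import (
	"fmt"
	"os/exec"
)

// WalkerConfig mirrors a [walker.<name>] table (field names assumed).
type WalkerConfig struct {
	Command string
	Options []string
}

// walkerCommand prepares the walker process: configured options first,
// then the requested path filters as positional arguments, with the
// working directory set to the tree root.
func walkerCommand(root string, cfg WalkerConfig, filters []string) *exec.Cmd {
	// Build a fresh slice so cfg.Options is never modified by append.
	args := make([]string, 0, len(cfg.Options)+len(filters))
	args = append(args, cfg.Options...)
	args = append(args, filters...)
	cmd := exec.Command(cfg.Command, args...)
	cmd.Dir = root
	return cmd
}

func main() {
	cfg := WalkerConfig{Command: "my-lister", Options: []string{"--all"}}
	cmd := walkerCommand("/path/to/tree", cfg, []string{"src", "docs"})
	fmt.Println(cmd.Dir, cmd.Args)
}
```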
@tobim tobim force-pushed the push-spzzmzrpkzzq branch from 871c19b to 4e1f4b8 Compare June 7, 2026 15:46
3 participants