feat: add custom walker support #698
jfly
left a comment
Thanks! This looks like a solid approach.
Not a blocker, but I'm not in love with a 3rd copy of the walk code. I didn't read it particularly closely. I think 2 copies (jj and git) is OK. 3 is when it probably makes sense to think of an abstraction. Did you think about that at all?
Accept documented camel-case walker names even when Viper normalizes table keys
What is camel-case? That's kebab-case. This is camelCase (or CamelCase).
Fix custom walker pipe handling
I did not have time to grok this commit during my review. I'll need some more handholding before I can approve it. Does whatever bug you're fixing here apply to the git and jj walkers as well? Is it related to the bug @Mic92 found in #694 (commit labeled "walk: unblock producers when Close() is called before EOF")?
| When you pass directory paths to `treefmt`, the walker command still runs for the tree root. | ||
| `treefmt` filters the command output to the requested directories. | ||
| The walker command doesn't need to implement path argument handling. |
While I appreciate the simplicity of this decision, I'm not sure it's the right decision. I imagine people are most inclined to use the "format a directory" feature in very large repos, where having some VCS determine which files are in the entire repo might be a lot slower than discovering just the files in a specific directory. #694 is an example of someone reporting just how slow it is for git to report all files in a repo.
How crazy would it be for the walk command to have to implement path argument handling? IIUC, it would be straightforward for git.
I admit that I didn't pay too much attention to optimizing for subtrees. It's no concern for my own use cases.
For a regular git walker this might not be a problem, but as soon as you want a more custom file list - like in the first example from the PR description - implementing the filtering yourself becomes error-prone rather quickly.
Maybe we'll add a toggle so the user can switch between both methods?
| # You can also set this to the name of a configured custom walker | ||
| # Env $TREEFMT_WALK | ||
| # walk = "filesystem" | ||
| # walk = "myWalker" |
I wonder if we should instead do walker.myWalker (or perhaps custom-walker.myWalker) here. Then we wouldn't have to worry about (or detect) conflicts with the builtin walkers.
| reader, err = NewReader(Git, root, path, db, statz) | ||
| if err != nil { | ||
| reader, err = NewReader(Jujutsu, root, path, db, statz) | ||
| if selector.Custom != nil { |
This diff would be a lot simpler, and the if selector.Custom == nil && selector.Type == Stdin change below wouldn't have had to happen if we introduced a new selector.Type == Custom. Did you consider that?
Thank you for the review. I hope to be able to address the individual comments in the next few days.
I didn't really consider that because I wanted to keep the new path somewhat isolated.
I'll have to look into that.
This sounds reasonable to me.
So I did somewhat of a refactor of the walker component now. Now there is a generic |
I am excited to read this. I'll be travelling for the next ~1 week, though, and don't expect to be able to take a look until I'm back. Could you rework this change as a separate PR we could land before we add the custom walker support?
Been on holiday myself this past week or so. Just getting up to speed with this. I'm liking the direction, need to give it some proper brain time though, as it's a core change.
Document that treefmt does not support paths containing newlines before introducing command-backed path stream walkers.
Introduce a shared reader for walker commands that emit paths on stdout. Parse newline- and NUL-delimited records, normalize emitted paths inside the tree root, post-filter requested path filters, and convert existing regular files into walk.File values. Own the stdout and stderr pipes instead of using Cmd.StdoutPipe so the command can be waited for while Read drains stdout. Close cancels the command to unblock producers when callers stop before EOF, and treats post-cancel signal exits as expected.
Build the uncached reader in a helper and apply the cache wrapper once in NewReader. Keep auto fallback inside uncached construction so trying built-in walkers does not recurse through cache setup.
Store filesystem path filters as a slice and walk them sequentially in the filesystem reader. Use one filesystem reader for multiple requested paths instead of building one reader per path. Add coverage for multi-path filesystem walking.
Run git ls-files from the tree root and parse its NUL-delimited output with the shared path stream reader. Pass requested paths to git as path filters after -- so subtree formatting can prune the git output inside git.
Run jj file list from the tree root and parse its output with the shared path stream reader. Pass requested paths to jj after -- so paths that start with dashes are handled as path filters. Remove the generic CompositeReader now that all built-in walkers consume the same validated path-filter list.
I opened #705 to split off the refactoring. This PR should now be stacked on top of that, but GitHub is crappy and you can't do that.
When no explicit path filters are passed, run `jj file list` with `.` as the positional filter. In non-colocated jj workspaces, running from a subdirectory can otherwise emit paths outside the treefmt tree root. Add a regression test for the jj-only workspace shape from numtide#704.
Document selecting a custom walker with the global walk option and defining its command and options under [walker.<name>]. Document that custom walker commands run from the tree root, receive requested path filters as positional arguments, and emit newline- or NUL-delimited path records.
Parse [walker.<name>] tables into the configuration model and validate walker names, commands, and walk values. Treat custom walker names as exact configuration keys and keep the default walk value in NewViper.
Represent walker selection as either a built-in walk type or a custom walker configuration instead of passing only the built-in enum through the format command. Keep selector fields private and expose IsBuiltin for callers that need to test for stdin.
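One possible shape for such a selector is sketched below. Only `IsBuiltin` and the built-in names `Stdin`, `Git` and `Jujutsu` appear in this conversation; the `Selector` and `CustomWalker` types, their fields, and the two constructors are assumptions made for illustration.

```go
package main

import "fmt"

// Type enumerates built-in walkers. The exact set and order are assumed.
type Type int

const (
	Auto Type = iota
	Stdin
	Filesystem
	Git
	Jujutsu
)

// CustomWalker is a hypothetical model of a [walker.<name>] table.
type CustomWalker struct {
	Name    string
	Command string
	Options []string
}

// Selector holds either a built-in type or a custom walker.
// Fields are private so callers cannot build an inconsistent value.
type Selector struct {
	builtin Type
	custom  *CustomWalker
}

func BuiltinSelector(t Type) Selector        { return Selector{builtin: t} }
func CustomSelector(w CustomWalker) Selector { return Selector{custom: &w} }

// IsBuiltin reports whether the selector picks exactly the built-in walker t.
func (s Selector) IsBuiltin(t Type) bool {
	return s.custom == nil && s.builtin == t
}

func main() {
	fmt.Println(BuiltinSelector(Stdin).IsBuiltin(Stdin))
	fmt.Println(CustomSelector(CustomWalker{Name: "mywalker", Command: "bash"}).IsBuiltin(Stdin))
}
```

With this shape, a caller that needs to special-case stdin asks `IsBuiltin(Stdin)` instead of comparing fields directly, so the custom case never has to be checked separately at the call site.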
Implement custom walkers as a small adapter around the shared path stream reader. Run the configured command from the tree root and pass validated path filters as positional arguments after configured options.
Cover selecting a custom walker from treefmt.toml, passing configured walker options, and forwarding requested path filters to the walker command. Use sandbox-compatible bash commands, exact lowercase walker names, and the long-form --clear-cache option in CLI coverage.
This adds configurable custom walkers. A custom walker runs a configured command, reads newline- or NUL-delimited paths from stdout, normalizes them relative to treefmt's tree root, and feeds existing regular files into the formatter pipeline.
This PR is stacked on #705, which adds the shared path stream reader used by custom walkers.
Example: Format a git repo + a specific submodule:
Example: Format a pijul repo: