Inspired by #24510, this proposal aims to be the minimal CLI implementation mentioned there. It is forked from that issue for more isolated discussion; this stdlib CLI parser should have a generally useful API regardless of its possible inclusion in "juicy main".
This CLI parser proposal is:
- minimal
- opinionated
- appealing to the general intuition of the Python `argparse` family (rather than Go `flag` or Bash `getopt`)
The configuration is a struct definition:
```zig
const Args = struct {
    named: struct {
        // named parameter config goes here.
    },
    positional: []const []const u8,
};
```
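As a concrete illustration, here is a minimal sketch of a configuration in that shape. The option names (`output`, `jobs`, `dry-run`) are hypothetical and chosen only for this example; they are not part of the proposal.

```zig
const std = @import("std");

// Hypothetical configuration; the option names are illustrative only.
const Args = struct {
    named: struct {
        // No default value, so --output would have to be supplied.
        output: []const u8,
        // Has a default value, so --jobs would be optional.
        jobs: u32 = 1,
        // A hyphenated name needs @"..." syntax; it would be matched as --dry-run.
        @"dry-run": bool = false,
    },
    positional: []const []const u8,
};

test "omitted fields take their defaults, like ordinary struct initialization" {
    const args: Args = .{
        .named = .{ .output = "out.txt" },
        .positional = &.{},
    };
    try std.testing.expectEqualStrings("out.txt", args.named.output);
    try std.testing.expectEqual(@as(u32, 1), args.named.jobs);
    try std.testing.expect(!args.named.@"dry-run");
}
```

Leaving out `.output` in the initializer above is a compile error in plain Zig, which is the same "no default means required" rule the parser would apply at runtime.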
The front-door API looks like this:
```zig
// std.cli
pub fn parse(comptime Args: type, allocator: Allocator, iter: std.process.ArgIterator, options: Options) Error!Args {
    // ...
}

/// argv supports [][:0]u8, []const []const u8, etc.
pub fn parseSlice(comptime Args: type, allocator: Allocator, argv: anytype, options: Options) Error!Args {
    // ...
}

pub const Options = struct {
    /// Recognize --help, which will return error.Help.
    help: bool = true,
    /// error.InvalidArgument and error.Help will also print usage information to stderr.
    print_errors: bool = true,
};

pub const Error = error{
    /// Includes unrecognized option names and values that cannot be parsed into the desired value.
    InvalidArgument,
    /// The --help argument was given.
    Help,
} || Allocator.Error;
```
For each field of `Args.named`, the field name determines the `--name` recognized during parsing, and the field type and possible default value determine parsing behavior.
- Field names are prefixed by `--`, never single `-` and never `/`. This is true even if you name a field with a single letter, e.g. `--n=100`. Motivation: sometimes single `-` means that multiple single-letter options can be grouped together, like `ls -lA`, but double `--` never has this ambiguity; although `/`-prefixed names are common on Windows, `--`-prefixed names are also common, and CLI users will just need to deal with it.
- Field names are used verbatim; no translation between `-` and `_` (and no Unicode normalization). You can use `@"field-name"` if you really want those hyphens in there. Motivation: lower complexity with respect to name mangling; there is no name mangling.
- Any field that does not have a default value is required to be supplied on the command line. Alternate proposal: all struct fields must have a default value. Motivation for supporting required options: it's useful for e.g. `--output` options, and it semantically matches struct initialization in Zig.
- Providing the same (scalar) argument multiple times overrides previous values with later values. Alternate proposal: the same (scalar) argument multiple times is an error. Motivation for override behavior: it's useful and matches the behavior that people are used to, e.g. your `git` alias might include `--color=auto`, and then you can additionally give `--color=never` to override it.
- A lone double hyphen `--` stops recognizing hyphen-prefixed arguments for the rest of the args array. Before that, any hyphen-prefixed argument must be a recognized option name; this applies even to single-hyphen-prefixed arguments, which never match anything and are therefore always an error.
- A parameterless option `--help` is generated (unless `options.help` is `false`) that prints the help (unless `options.print_errors` is `false`) and returns an error. (Note that it is not possible to access the `///` doc comments at compile time, but that would be a cool way of giving documentation on options, wouldn't it.) Any usage error will result in an error printed that concisely describes the error and a prompt to use `--help` for more info; usage errors will not print the full usage. Alternate proposal: also include `-h` as an alias, like in Python's `argparse`. Motivation for excluding `-h`: this system doesn't do any single-hyphen short aliases. The result of giving `-h` would be something like `-h not recognized. try --help`, which still indirectly gives the user what they were looking for. Alternate proposal: remove the options and have `--help` print unconditionally.
- Either a space or an equal sign can separate the name and the value, e.g. `--name value` or `--name=value`. (Any literal `=` in a field name, e.g. `@"conf=usion"`, would cause a compile error.) Motivation for space separation: shell tab-completion works best on space-separated tokens, e.g. for file paths. Motivation for equals separation: when constructing an `args` array to launch a child process, a single append call possibly including string concatenation is simpler than two append calls or an extend call with two items. Remember that this is a consideration for all programming languages that might call a Zig executable, not just relevant from within Zig code. Additional motivation for equals: the relationship between names and values is self-documenting, e.g. `--a=b c d` is more self-documenting than `--a b c d`.
- Parsing integers would be done with `std.fmt.parseInt()` with base `0` to support `0x` prefixes and such. Parsing floats would be done with `std.fmt.parseFloat()`, which means hex floats, `NaN`, `-Inf`, etc. would be supported. (And parsing strings would also be trivially supported.)
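To make the separator and number-parsing rules above concrete, here is a minimal sketch. `Token` and `splitOption` are hypothetical names invented for this example, not part of the proposed API; the `std.fmt.parseInt` and `std.fmt.parseFloat` calls are the existing standard library functions named in the list.

```zig
const std = @import("std");

/// Hypothetical helper type: one option token split into its parts.
const Token = struct {
    name: []const u8,
    /// null means the value, if the option takes one, is the next argument.
    value: ?[]const u8,
};

/// Hypothetical helper. Returns null when `arg` is not a `--name` token:
/// either it lacks the `--` prefix (which includes single-hyphen arguments
/// like `-h`, which the caller would reject) or it is the lone `--` terminator.
fn splitOption(arg: []const u8) ?Token {
    if (!std.mem.startsWith(u8, arg, "--") or arg.len == 2) return null;
    const body = arg[2..];
    if (std.mem.indexOfScalar(u8, body, '=')) |eq| {
        return .{ .name = body[0..eq], .value = body[eq + 1 ..] };
    }
    return .{ .name = body, .value = null };
}

test "both separators yield the same option name" {
    const a = splitOption("--name=value").?;
    try std.testing.expectEqualStrings("name", a.name);
    try std.testing.expectEqualStrings("value", a.value.?);

    const b = splitOption("--name").?;
    try std.testing.expectEqualStrings("name", b.name);
    try std.testing.expect(b.value == null);

    try std.testing.expect(splitOption("--") == null);
    try std.testing.expect(splitOption("-h") == null);
    try std.testing.expect(splitOption("file.txt") == null);
}

test "base 0 integers and special float spellings" {
    try std.testing.expectEqual(@as(i32, 100), try std.fmt.parseInt(i32, "100", 0));
    try std.testing.expectEqual(@as(i32, 31), try std.fmt.parseInt(i32, "0x1F", 0));
    try std.testing.expectEqual(@as(i32, 5), try std.fmt.parseInt(i32, "0b101", 0));
    try std.testing.expect(std.math.isNan(try std.fmt.parseFloat(f64, "nan")));
    try std.testing.expect(std.math.isNegativeInf(try std.fmt.parseFloat(f64, "-inf")));
    try std.testing.expectEqual(@as(f64, 16.0), try std.fmt.parseFloat(f64, "0x1p4"));
}
```

Splitting on the first `=` only means a value may itself contain `=`, which is one reason a literal `=` in a field name has to be rejected at compile time.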
Departing from the "minimal" zone, but features I think are still important:
- If a field type is `bool`, then parameterless `--name` and `--no-name` are generated to turn the option on and off respectively. All field names are forbidden to start with `@"no-"` to avoid possible collisions with generated names. Alternate proposal: `--name=true` and `--name=false`. Motivation for `--name` and `--no-name`: despite higher complexity, parameterless boolean options are more familiar, e.g. `--verbose` and `--force` for `git push` or `mv`. Additionally, it's common to want a boolean parameter in a CLI to include the word `no`, e.g. `--no-clobber`, and a negatively-named boolean is unnecessarily confusing; the code would read "if not no clobber".
- Parsing enums would match the name of the enum, never the integer value. Motivation: simple.
- Parsing slices `[]const T` (other than `[]const u8`) would mean that multiple of the same option appends to the slice, e.g. `--exclude=".git" --exclude=".DS_Store"`. A field type `[]const bool` would be a compile error, because it's not clear how that should be supported. Motivation for array options: it's a common feature, e.g. `grep -e`, `gcc -I`, `kubectl get -l`. Motivation against array options: it's significantly more complex to implement than scalar values.
- Parsing optional types `?T` would not be supported, always a compile error. It could be useful to initialize optional fields to `null` to track whether they ever get provided on the command line, but I believe this is a misfeature. I believe it should always be possible to override previous values to restore an option to its default value, even if that means designating special values like `-1` or `""` for this purpose. Consider that `git --color=auto` can explicitly override `--color=always`, which is more useful than `auto` being the behavior only when `--color` is never specified.
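The bool, enum, and slice rules above could be sketched roughly as follows. `matchBool` and the `Color` enum are hypothetical names made up for this example, and the fixed-size buffer stands in for the allocator-backed list a real implementation would presumably use; `std.meta.stringToEnum` is an existing standard library function.

```zig
const std = @import("std");

// Hypothetical enum-typed option, e.g. --color=never.
const Color = enum { auto, always, never };

/// Hypothetical helper: matches a bool field against `--name` / `--no-name`.
/// `token` is the argument with the leading "--" already removed.
/// Returns the value to store, or null if the token is for another option.
fn matchBool(field_name: []const u8, token: []const u8) ?bool {
    if (std.mem.eql(u8, token, field_name)) return true;
    if (std.mem.startsWith(u8, token, "no-") and
        std.mem.eql(u8, token["no-".len..], field_name)) return false;
    return null;
}

test "bool options are switched on and off without a value" {
    try std.testing.expect(matchBool("clobber", "clobber").? == true);
    try std.testing.expect(matchBool("clobber", "no-clobber").? == false);
    try std.testing.expect(matchBool("clobber", "verbose") == null);
}

test "enums are matched by tag name, never by integer value" {
    try std.testing.expectEqual(Color.never, std.meta.stringToEnum(Color, "never").?);
    try std.testing.expect(std.meta.stringToEnum(Color, "2") == null);
}

test "repeated slice options append in order" {
    const argv = [_][]const u8{ "--exclude=.git", "--verbose", "--exclude=.DS_Store" };
    const prefix = "--exclude=";

    // Fixed-size buffer to keep the sketch allocation-free.
    var buf: [4][]const u8 = undefined;
    var len: usize = 0;
    for (argv) |arg| {
        if (std.mem.startsWith(u8, arg, prefix)) {
            buf[len] = arg[prefix.len..];
            len += 1;
        }
    }

    const excludes: []const []const u8 = buf[0..len];
    try std.testing.expectEqual(@as(usize, 2), excludes.len);
    try std.testing.expectEqualStrings(".git", excludes[0]);
    try std.testing.expectEqualStrings(".DS_Store", excludes[1]);
}
```

With `matchBool`, a field literally named `no-clobber` would collide with the generated negation of a field named `clobber`, which is the ambiguity the `no-` prefix ban is meant to rule out.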
UPDATE
thanks for the discussions everyone! here are some responses to your contributions:
- Sub commands: there are many ways to do sub commands / sub parsers, and it's definitely out of scope for a minimal API. Compared to all the other features in the above proposal, sub commands are profoundly more complex (nesting the API within itself is a fairly obvious approach, which is definitely not minimal, and then it leads to all sorts of questions like whether parent arguments can be provided from within a sub command, etc.). We want people to have sub command CLI parsing, but it doesn't belong in the minimal API. (foreshadowing)
- `--help` docs: every proposal for giving textual help has been very reasonable, and thanks y'all for the suggestions. So far, I don't think any of the ideas qualify as minimal, and it seems like the idea of having struct-field-driven configuration simply can't have help docs for options in the most obvious, minimal way, which is using the `///` doc comments for the fields; it's not allowed by the Zig type system (intentionally, to discourage DSLs in doc comments for meta programming). How to give a `--help` string still seems unresolved. Let's keep discussing it.
- `Build.option`: this minimal CLI parser is incompatible with `Build.option` because of the `description_raw` parameter not being supported here. Although having multiple ways to do the same-ish thing is frustrating, I don't see a path for unifying the two systems. This concern is still unresolved.
- user-defined `parseCLI` extensibility: definitely not minimal, but a cool idea. (second foreshadowing)
There's an important idea I didn't include above that's worth articulating: This minimal API gives users a path to migrate to a more advanced API. This API has intentional limitations that are compile errors. For example, declaring a named argument with a struct type is not allowed; that could suggest that you want a sub command or a custom parser or something else, but this minimal API declares that out of bounds. This allows third-party competitor libraries to jump in and support these advanced use cases while being backward compatible with the minimal API; you can drop in a third-party replacement, everything still works, and then you can start using the third-party extended functionality right away. This is an API designed to help users abandon it.
So then what is this API even for? What counts as a "minimal" use case? I think that your contributions to this discussion so far have been very thoughtful and productive insights about CLI parsing behavior, but it's also been largely the non-minimal non-obvious behavior that is, well, worth discussing. The fact that no one has really objected to the minimal functionality originally proposed here is a clue. I think that supporting parsing integers into `i32` struct fields with a matching name is a fairly obvious and unobjectionable feature, which is probably why there's been no discussion of it, and those kinds of obvious features are the minimal design.
Parsing into `[]const T` is for sure a useful feature; I don't think there's really any disagreement on that; the question for this discussion thread is whether it's a minimal feature. Sub commands are also useful, but are not minimal. Parsing enums is minimal; parsing unions is not; etc.