Improved C# support#1258
Conversation
|
@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this in D94396824. (Because this pull request was imported automatically, there will not be any future comments.) |
|
Hello @KapJI. You looked at some of my PRs in the past, can you take a look at this one? |
38eb284 to
28ddbf5
Compare
|
Hello @jtbraun, sorry about the random ping. Can you look at this PR? |
jtbraun
left a comment
There was a problem hiding this comment.
I wouldn't say this review's comprehensive. My biggest concerns are the auto-finding of tools, and how it exposes you to different configs across different machines. It wasn't clear to me what providing the roslyn tools vs a csc directly on the toolchain gives you, who wins, why?
| # Run the C# compiler to produce the output artifact. | ||
| ctx.actions.run(cmd, category = "csharp_compile") | ||
|
|
||
| return DotNetLibraryInfo( |
There was a problem hiding this comment.
It's not really a DotNetLibraryInfo if we're producing an "exe" anymore. Unless you can use the "exe" output ANYWHERE you could use the "library" output, this should be a DotNetExecutableInfo or something.
There was a problem hiding this comment.
This should be fixed now. See the comment below.
|
|
||
| return [ | ||
| DefaultInfo(default_output = dotNetLibraryInfo.object), | ||
| RunInfo(args = cmd_args(dotNetLibraryInfo.object)), |
There was a problem hiding this comment.
I suppose if you NEVER export the dotNetLibraryInfo from here, it's sortof okay (see dotNetExecutableInfo` above). Still, I think they represent two different things, and should be separate. Is there precedence elsewhere in the prelude for this sort of thing?
I'd expect if you can use exectuables where you can use libraries, but not the other way around, that executables would actually expose both providers, in addition to the DefaultInfo and RunInfo here.
There was a problem hiding this comment.
I had used DotNetLibraryInfo just to make it easier to implement but I've now changed the code so the common code only returns an Artifact and a list of DllDepTSet and the DotNetLibraryInfo is only created in the csharp_library.
I don't think you can use executables where you can use libraries.
| def _resources_arg(): | ||
| return { | ||
| "resources": attrs.dict(key = attrs.string(), value = attrs.source(), sorted = False, default = {}, doc = """ | ||
| Resources that should be embedded within the built DLL. The format |
There was a problem hiding this comment.
Does this indenting pattern match other attribute values (like cxx_common) in the prelude already?
There was a problem hiding this comment.
I think the indentation is different everywhere :). I've just updated the indentation to match the one in _RUST_EXECUTABLE_ATTRIBUTES. Let me know if you'd like a different one.
| the value being the resource that should be merged. This allows | ||
| non-unique keys to be identified quickly. |
There was a problem hiding this comment.
What do you mean by "identified quickly"? Who is doing said identification, and what is the response? As far as the rule is concerned, there won't be duplicated keys, by definition of this being a dict.
Are these "string resources" or "non-string resources": https://learn.microsoft.com/en-us/dotnet/core/extensions/create-resource-files#resources-in-text-files
As a user, I always appreciate details about how this attribute maps to .net tool concepts.
There was a problem hiding this comment.
I see now that this was just moved from below. Fine.
There was a problem hiding this comment.
Yes, I don't know what resources are, it was just moved code.
| The version of the .Net framework that this library targets. This is | ||
| one of 'net35', 'net40', 'net45' and 'net46'. |
There was a problem hiding this comment.
I'm not 100% familiar with .net. Are these canonical strings that the .Net runtime goes by, or just "common shorthand"?
AH, I see, much of this was moved...
There was a problem hiding this comment.
They are defined in dotnet_common so no, I don't think they are canonical strings used by the .net runtime
| if ctx.attrs.use_path_compilers: | ||
| csc_exe_script = "csc.exe" |
There was a problem hiding this comment.
This sort of thing is a bad idea. You have no control now over who has what csc.exe installed, what version, etc. I wouldn't do this. Is there precedent?
|
|
||
|
|
||
| def find_with_vswhere_exe(): | ||
| def run_vswhere(required_component): |
There was a problem hiding this comment.
Typing :string on the argument. Typing everwhere really...
There was a problem hiding this comment.
I've added typing. I'm not sure that's a great idea because it adds minimum Python version requirements. But the script already had the type annotations list[Path] which I think require Python 3.9 and, with from __future__ import annotations hopefully it still works with Python 3.9. But I'm very far from an expert in Python type annotations.
| ) | ||
| sys.exit(1) | ||
|
|
||
| vswhere_exe = ( |
There was a problem hiding this comment.
Though this does set precedent for using the "one from path". I'd still avoid that, and require (if you can) that we still discover csc through there vswhere/whatever tools, and (optionally) pull THOSE from PATH.
Also keep in mind that this runs in the context of the daemon, NOT the client. So PATH may be from some preexisting client command in another terminal with a different starting environment. I don't believe we do a lot of sanitizing of the environment, so depending on it is risky.
There was a problem hiding this comment.
Ideally, since you added vswhere on the toolchain, we'd pass that to this and call the vswhere on the toolchain, not just "whatever random one we can find".
| if not host_info().os.is_windows: | ||
| fail("csharp toolchain only supported on windows for now") | ||
|
|
||
| csc = ctx.attrs._csharp_tools_info[CSharpToolchainInfo].csc if ctx.attrs.csc == None else ctx.attrs.csc |
There was a problem hiding this comment.
csc = ctx.attrs.csc or thing.from.toolchain works too
| "csc": attrs.option(attrs.string(), default = None, doc = "Executable name or a path to the C# compiler frequently referred to as csc.exe"), | ||
| "framework_dirs": attrs.dict(key = attrs.string(), value = attrs.one_of(attrs.source(), attrs.string()), doc = "Dictionary of .NET framework assembly directories, where each key is a supported value in `framework_ver` and the value is a path to a directory containing .net assemblies such as System.dll matching the given framework version"), | ||
| "_csharp_tools_info": attrs.exec_dep(providers = [CSharpToolchainInfo], default = "prelude//toolchains/msvc:roslyn_tools"), |
There was a problem hiding this comment.
I'm not sure how to interpret roslyn_tools being provided by default here AND the csc being separate. Which should be used, when, why?
There was a problem hiding this comment.
I've fixed all the other issues and the only one remaining is this one. Which is the same as the comment #1258 (comment) and #1258 (comment).
The behavior should be the same as the one in C++. As for usage, in my own usage (I only use the C++ rules currently because the C# ones are not implemented yet but the behavior should be the same) I use the below toolchains:
def is_remote_enabled() -> bool:
re_enabled = read_config("buck2_re_client", "enabled", "false")
return re_enabled == "true"
find_msvc_tools(
name = "msvc_tools",
use_path_compilers = is_remote_enabled(),
use_path_linkers = is_remote_enabled() and not host_info().os.is_windows,
target_compatible_with = ["config//os:windows"],
visibility = ["PUBLIC"],
)
path_clang_tools(
name = "clang_tools",
target_compatible_with = ["config//os:linux"],
visibility = ["PUBLIC"],
)
cxx_tools_info_toolchain(
cxx_tools_info = select({
"config//os:windows": ":msvc_tools",
"config//os:linux": ":clang_tools",
}),
...
Basically the PATH tool lookup is used on RE and local Linux (where g++ and the other tools are in the path by default) and the find_msvc_tools for the local execution to work.
How would you setup the toolchains? My main goal was that buck2 build :main worked every machine, with or without RE, without the developer having to launch any special command line.
There was a problem hiding this comment.
Ah. No, the missing thing here is that the doc comment on the csc attribute just needs to say "If not provided, the csc compiler is inferred from the CSharpToolchainInfo provided in the _csharp_tools_info attribute.
I'll also note that I'm not at all familiar with our csharp support, csharp in general, or who is using it inside or outside the company. Github's diff didn't make it very clear what lines were moved vs created, so I may have commented on things that have been around for a long, long time. I'm also a little concerned about backcompat, assuming that there are users today of these rules, we don't want to break them. |
|
Thanks a lot for the review.
Here https://buck2.build/docs/about/language_support/ it says C# is "Unavailable". So I don't think there are many users of it.
Either way, I think it is reasonably backwards compatible. The only change that is not is that if the user has a csc.exe in the PATH that is not the same as the one reported by vswhere.py, it will start using the vswhere.py one. But I think this change is desirable because: C# is not really working at the moment so I doubt it is used much, this makes it so it works without any special PATH configuration and bring it in line with what the C++ rules do. Either way, if the user really wants to use the csc.exe in the path they can add the "csc" = "csc.exe" attribute so it uses the csc.exe in the PATH. I've responded to every comment (hopefully) @jtbraun. Let me know what you think and if I've missed any comment. |
|
Hello @jtbraun, can you take a look at my changes again? |
bafa538 to
9bdac41
Compare
|
@jtbraun Sorry to ping you again. Can you take a look at the changes I made to address your comments? Thanks |
…lations. This works on Windows and is the same behavior that the C++ rules already have
…h older Python versions
prelude\toolchains\msvc\vswhere.py:56: error: Argument 1 to "Tool" has incompatible type "str"; expected "Path" [arg-type]
prelude\toolchains\msvc\vswhere.py:277: error: Argument 1 to "check_output" has incompatible type "list[object]"; expected "str | bytes | PathLike[str] | PathLike[bytes] | Sequence[str | bytes | PathLike[str] | PathLike[bytes]]" [arg-type]
9bdac41 to
9864ad3
Compare
|
I had dropped all the commits fixing the issues highlighted by jtbraun in my latest rebase by mistake. I've just fixed that. |
|
Hello @ndmitchell, you reviewed some of my PRs in the past, can you take a look at this one? |
|
Sorry, all the automated mails go into a folder I don't look at often. I can take another look. |
|
|
||
| new_runtime_files = [] | ||
| for child in child_deps: | ||
| new_runtime_files.extend([ctx.actions.symlink_file(dll_dep.reference.basename, dll_dep.reference) for dll_dep in child.traverse()]) |
There was a problem hiding this comment.
There's a hazard here that can't be directly fixed. In a very large project, you may have MANY entries here, and the size of this can blow up. You do have to enumerate these at some point and create the symlinks, though.
Another approach is to farm out the symlink creation to a script/tool/whatever that would create all the symlinks for you, and instead of calling child.traverse() here, you'd give that script the result of a "projection" on the transitive set. In your implementation, new_runtime_files will be the size of the graph represented by the TSet. In the other approach, the projection is just a refernce to the TSet root that says "run the X projection", and the TSet is only walked when the action that runs the script is run. The python_binary rules do this sort of thing to avoid building these (huge) lists in prelude/python/python.bzl, in the args_projections. These are later used to build manifest files or symlink trees (like you're doing here) with the help of external commands that take the list of arguments as inputs in some form. The python rules used .traverse() at one time, consuming single-digit gigabytes of memory for each python_binary target.
This isn't functionally wrong, but anywhere you can avoid copying the contents of a TSet is an opportunity to not waste memory. Because artifacts need to be declared, doing this work with .symlink_file has to do this, I think. I don't believe there's a way currently to say "build up a bunch of symlinks here by walking this TSet", hence the external tools that take the list of symlinks as arguments in some form and create them.
There was a problem hiding this comment.
I figured the optimal way would be to avoid traverse but I didn't see a way of doing that. I guess using a separate tool would work but it seems harder to maintain rather than using Starlark's native symlink_file.
Could we merge this like this and if people start raising issues with this taking a lot of memory we can implement this external tool improvement? With any luck by that time there will be a better way of doing this.
| _CSHARP_LIBRARY_OR_EXE_ATTRIBUTES = { | ||
| "srcs": attrs.list(attrs.source(), default = [], doc = """ | ||
| The set of C# source files to be compiled, and assembled by this rule. | ||
| Each element must a string specifying a source file. |
There was a problem hiding this comment.
"must a string specifying a source file" is wrong. You're missing "be", and it isn't a string specifying a source file in all instances...
attrs.source entries can be target names, too. Imagine you have a script that generates a source file from some typing information (like an *.idl file). This would be a reference to an output of that target. https://buck2.build/docs/api/build/attrs/#source
There was a problem hiding this comment.
Fixed, basically copied the sentences from that documentation link
| "resources": attrs.dict(key = attrs.string(), value = attrs.source(), sorted = False, default = {}, doc = """ | ||
| Resources that should be embedded within the built DLL. The format | ||
| is the name of the resource once mapped into the DLL as the key, and | ||
| the value being the resource that should be merged. This allows | ||
| non-unique keys to be identified quickly. |
There was a problem hiding this comment.
What does "once mapped" mean?
What does it mean for a resource to be "merged"?
"This allows non-unique keys to be identified quickly"... this is pertinent to the operation of the end binary rule, but it doesn't mean much to the user. You might instead say:
"The X rule merges the resource attributes of all of it's transitive dependencies. Duplicated key/value pairs are ignored, while duplicate keys with inequal values are errors"... or whatever the rules do so the user can reason about it.
There was a problem hiding this comment.
I've removed the resources attribute since it has not been implemented yet.
| The version of the .Net framework that this library targets. This is | ||
| one of 'net35', 'net40', 'net45' and 'net46'. |
There was a problem hiding this comment.
You don't want to have to keep this list up to date. Either generate the string here from FrameworkVersion, or just say "one of the values from the FrameworkVersion global". The latter is easier, the former is nicer for the user.
There was a problem hiding this comment.
I've generated the list from FrameworkVersion
| The set of targets or system-provided assemblies to rely on. Any | ||
| values that are targets must be either csharp\\_library or `prebuilt_dotnet_library` | ||
| instances. | ||
| """), |
There was a problem hiding this comment.
"to rely on" => "this target depends on."
What does it mean for this to be an attrs.string()? That seems wrong.
If you depend on a particular provider being in here, your attrs.dep() should have the providers=[...] listed, presumably DotNetLibraryInfo?
There was a problem hiding this comment.
The attrs.string is for "system-provided assemblies" like System.IO.dll
| load(":dotnet_common.bzl", "FrameworkVersion") | ||
|
|
||
| FrameworkVersion = ["net35", "net40", "net45", "net46"] | ||
| _CSHARP_LIBRARY_OR_EXE_ATTRIBUTES = { |
There was a problem hiding this comment.
All of the comments below are almost all just improvements to what was already pre-existing. While we can fix it, we should. If you're not certain about this, when we go to merge it I can make edits, too.
There was a problem hiding this comment.
I think I've fixed all the issues except for the traverse. I've also left the resources implementation for a possible future PR.
| and dependencies by invoking csc. | ||
| """, | ||
| examples = """ | ||
| ``` |
There was a problem hiding this comment.
I think you can do "```python" here to get python syntax highlighting in the generated docs
| the dll exactly. When this is not set, the dll will be named after | ||
| the short name of the target. |
There was a problem hiding this comment.
"the dll filename will be the target's name attribute, suffixed with...X". I think X depends on whether it's an exe or dll? Or will this always be dll?
There was a problem hiding this comment.
Thanks for noticing, I had just copied that from csharp_library, I've fixed it now by replacing dll with executable.
| attrs = { | ||
| "run_msvc_tool": attrs.default_only(attrs.dep(providers = [RunInfo], default = "prelude//toolchains/msvc:run_msvc_tool")), | ||
| "use_path_compilers": attrs.bool(default = False), | ||
| "vswhere": attrs.default_only(attrs.dep(providers = [RunInfo], default = "prelude//toolchains/msvc:vswhere")), |
There was a problem hiding this comment.
No other rule() declaration has that "sharing" of attrs.
That's not true? You yourself are sharing attribute definitions between csharp_library and csharp_binary with | _CSHARP_LIBRARY_OR_EXE_ATTRIBUTES in their prelude_rule() calls, rule() works the same, attrs is just a list of attribute declarations.
| "csc": attrs.option(attrs.string(), default = None, doc = "Executable name or a path to the C# compiler frequently referred to as csc.exe"), | ||
| "framework_dirs": attrs.dict(key = attrs.string(), value = attrs.one_of(attrs.source(), attrs.string()), doc = "Dictionary of .NET framework assembly directories, where each key is a supported value in `framework_ver` and the value is a path to a directory containing .net assemblies such as System.dll matching the given framework version"), | ||
| "_csharp_tools_info": attrs.exec_dep(providers = [CSharpToolchainInfo], default = "prelude//toolchains/msvc:roslyn_tools"), |
There was a problem hiding this comment.
Ah. No, the missing thing here is that the doc comment on the csc attribute just needs to say "If not provided, the csc compiler is inferred from the CSharpToolchainInfo provided in the _csharp_tools_info attribute.
I've just made all the new changes you requested (hopefully). The only issue remaining is tset |
|
Hello @ndmitchell, can you do your magic and summon @jtbraun to take another look at this PR please? The notification e-mail might have gotten lost in @jtbraun 's mailbox again. Thanks. |
|
I saw the comments earlier about backwards compatibility and that may not be that important. I can add that csharp internally is used but the open source csharp rules here are not really used to a large extent. But maybe you'd want to get this landed, then do another pr on top with whatever cleanup you want to get rid of the backwards compatibility and we can see what ci says about that one internally separately. |
Are you saying I should change this PR to remove the (very small in my opinion) backwards incompatibility? Or that we can make another PR after this one to clean up some hacky code that was used to maintain backwards compatibility? If I remember correctly, and looking at the previous comments, the only difference in behavior will be if you have an installation with two csc.exe's and vswhere.exe now finds a different one than the first one used in the PATH. It might also break for users with broken Visual Studio installations that don't have a working vswhere.exe for some reason. So in my opinion the backwards incompatibility is very small and the final state is not "hacky" due to backwards compatibility so there would be no need for a further cleanup. Do you think the approach should be another one? |
|
@jtbraun Pinging you again in the hopes you see the notification :) |
|
I meant that the backwards incompatibility is most likely fine. We'll see when we run internal ci. I can try to review this soon |
Most importantly:
I used the following example project to test these changes:
https://github.com/Overhatted/Buck2_CSharp_Example
There are two examples/tests there:
buck2 run :mainis a csharp_binary with a 3rd party dependency fetched from nuget.org and the csc.exe compiler found "automatically" using vswhere. It should print '{"Name":"Apple","Expiry":"2008-12-28T00:00:00","Price":3.99,"Sizes":["Small","Medium","Large"]}'. It tests that the runtime dependencies also work because newtonsoft is in a separate DLL.buck2 run :main_generator_usertests that code generation using csharp_binary also works