Add an LLM policy for rust-lang/rust#1040
|
r? @jieyouxu
rustbot has assigned @jieyouxu.
|
|
@rustbot label T-libs T-compiler T-rustdoc T-bootstrap |
## Summary
[summary]: #summary

This document establishes a policy for how LLMs can be used when contributing to `rust-lang/rust`. Subtrees, submodules, and dependencies from crates.io are not in scope. Other repositories in the `rust-lang` organization are not in scope.

This policy is intended to live in [Forge](https://forge.rust-lang.org/) as a living document, not as a dead RFC. It will be linked from `CONTRIBUTING.md` in rust-lang/rust as well as from the rustc- and std-dev-guides.

## Moderation guidelines

This PR is preceded by [an enormous amount of discussion on Zulip](https://rust-lang.zulipchat.com/#narrow/channel/588130-project-llm-policy). Almost every conceivable angle has been discussed to death; there have been upwards of 3000 messages, not even counting discussion on GitHub. We initially doubted whether we could reach consensus at all.

Therefore, we ask to bound the scope of this PR specifically to the policy itself. In particular, we mark several topics as out of scope below. We still consider these topics to be important; we simply do not believe this is the right place to discuss them.

No comment on this PR may mention the following topics:

- Long-term social or economic impact of LLMs
- The environmental impact of LLMs
- Anything to do with the copyright status of LLM output
- Moral judgements about people who use LLMs

We have asked the moderation team to help us enforce these rules.

## Feedback guidelines

We are aware that parts of this policy will make some people very unhappy. As you are reading, we ask you to consider the following.

- Can you think of a *concrete* improvement to the policy that addresses your concern? Consider:
  - Whether your change will make the policy harder to moderate
  - Whether your change will make it harder to come to a consensus
- Does your concern need to be addressed before merging or can it be addressed in a follow-up?
  - Keep in mind the cost of *not* creating a policy.

### If your concern is for yourself or for your team

- What are the *specific* parts of your workflow that will be disrupted?
  - In particular we are *only* interested in workflows involving `rust-lang/rust`. Other repositories are not affected by this policy and are therefore not in scope.
- Can you live with the disruption? Is it worth blocking the policy over?

---

Previous versions of this document were discussed on Zulip, and we have made edits in response to suggestions there.

## Motivation
[motivation]: #motivation

- Many people find LLM-generated code and writing deeply unpleasant to read or review.
- Many people find LLMs to be a significant aid to learning and discovery.
- `rust-lang/rust` is currently dealing with a deluge of low-effort "slop" PRs primarily authored by LLMs.
  - Having *a* policy makes these easier to moderate, without having to take every single instance on a case-by-case basis.

This policy is *not* intended as a debate over whether LLMs are a good or bad idea, nor over the long-term impact of LLMs. It is only intended to set out the future policy of `rust-lang/rust` itself.

## Drawbacks
[drawbacks]: #drawbacks

- This bans some valid usages of LLMs. We intentionally err on the side of banning too much rather than too little in order to make the policy easy to understand and moderate.
- This intentionally does not address the moral, social, and environmental impacts of LLMs. These topics have been extensively discussed on Zulip without reaching consensus, but this policy is relevant regardless of the outcome of these discussions.
- This intentionally does not attempt to set a project-wide policy. We have attempted to come to a consensus for upwards of a month without significant progress. We are cutting our losses so we can have *something* rather than ad hoc moderation decisions.
- This intentionally does not apply to subtrees of rust-lang/rust. We don't have the same moderation issues there, so we don't have time pressure to set a policy in the same way.

## Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

- We could create a project-wide policy, rather than scoping it to `rust-lang/rust`. This has the advantage that everyone knows what the policy is everywhere, and that it's easy to make things part of the mono-repo at a later date. It has the disadvantage that we think it is nigh-impossible to get everyone to agree. There are also reasons for teams to have different policies; for example, the standard for correctness is much higher within the compiler than within Clippy.
- We could have a more strict policy that removes the [threshold of originality](https://fsfe.org/news/2025/news-20250515-01.en.html) condition. This has the advantage that our policy becomes easier to moderate and understand. It has the disadvantage that it becomes easy for people to intend to follow the policy, but be put in a position where their only choices are to either discard the PR altogether, rewrite it from scratch, or tell "white lies" about whether an LLM was involved.
- We could have a more strict policy that bans LLMs altogether. It seems unlikely we will be able to agree on this, and we believe attempting it will cause many people to leave the project.

## Prior art
[prior-art]: #prior-art

This prior art section is taken almost entirely from [Jane Lusby's summary of her research](rust-lang/leadership-council#273 (comment)), although we have taken the liberty of moving the Rust project's prior art to the top. We thank her for her help.

### Rust

- [Moderation team's spam policy](https://github.com/rust-lang/moderation-team/blob/main/policies/spam.md/#fully-or-partially-automated-contribs)
- [Compiler team's "burdensome PRs" policy](rust-lang/compiler-team#893)

### Other organizations

These are organized along a spectrum of AI friendliness, where top is least friendly, and bottom is most friendly.

- full ban
  - [postmarketOS](https://docs.postmarketos.org/policies-and-processes/development/ai-policy.html)
    - also explicitly bans encouraging others to use AI for solving problems related to postmarketOS
    - multi-point, ethics-based rationale with citations included
  - [zig](https://ziglang.org/code-of-conduct/)
    - philosophical, cites [Profession (novella)](https://en.wikipedia.org/wiki/Profession_(novella))
    - rooted in concerns around the construction and origins of original thought
  - [servo](https://book.servo.org/contributing/getting-started.html#ai-contributions)
    - more pragmatic, directly lists concerns around AI, fairly concise
  - [qemu](https://www.qemu.org/docs/master/devel/code-provenance.html#use-of-ai-content-generators)
    - pragmatic, focuses on copyright and licensing concerns
    - explicitly allows AI for exploring APIs, debugging, and other non-generative assistance; other policies do not explicitly ban this or mention it in any way
- allowed with supervision, human is ultimately responsible
  - [scipy](https://github.com/scipy/scipy/pull/24583/changes)
    - strict attribution policy including name of model
  - [llvm](https://llvm.org/docs/AIToolPolicy.html)
  - [blender](https://devtalk.blender.org/t/ai-contributions-policy/44202)
  - [linux kernel](https://kernel.org/doc/html/next/process/coding-assistants.html)
    - quite concise but otherwise seems the same as many in this category
  - [mesa](https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/docs/submittingpatches.rst)
    - framed as a contribution policy, not an AI policy; AI is listed as a tool that can be used, but it emphasizes the same requirement that the author must understand the code they contribute, and seems to leave room for partial understanding from new contributors.
      > Understand the code you write at least well enough to be able to explain why your changes are beneficial to the project.
  - [forgejo](https://codeberg.org/forgejo/governance/src/branch/main/AIAgreement.md)
    - bans AI for review, does not explicitly require contributors to understand code generated by AI. One could interpret the "accountability for contribution lies with contributor even if AI is used" line as implying this requirement, though their version seems poorly worded imo.
  - [firefox](https://firefox-source-docs.mozilla.org/contributing/ai-coding.html)
  - [ghostty](https://github.com/ghostty-org/ghostty/blob/main/AI_POLICY.md)
    - pro-AI but views "bad users" as the source of issues with it and the only reason for what ghostty considers a "strict AI policy"
  - [fedora](https://communityblog.fedoraproject.org/council-policy-proposal-policy-on-ai-assisted-contributions/)
    - clearly inspired and is cited by many of the above, but is definitely framed more pro-AI than the derived policies tend to be
  - [curl](https://curl.se/dev/contribute.html#on-ai-use-in-curl)
    - does not explicitly require humans understand contributions, otherwise policy is similar to above policies
  - [linux foundation](https://www.linuxfoundation.org/legal/generative-ai)
    - encourages usage, focuses on legal liability, mentions that tooling exists to help automate managing legal liability, does not mention specific tools
- In progress
  - NixOS
    - NixOS/nixpkgs#410741

## Unresolved questions
[unresolved-questions]: #unresolved-questions

See the "Moderation guidelines" and "Drawbacks" sections for a list of topics that are out of scope.
I really like this version, and thanks a ton for working on it. Specifically:
- It doesn't try to dump entire walls of text, which is unfortunately a good way to be sure nobody reads it. Instead, it gives you concrete examples, and a guiding rule-of-thumb for uncovered scenarios, and acknowledges upfront that it surely cannot be exhaustive.
- I also like where it points out the nuance and recognizes the uncertainties.
- I like that it covers both "producers" and "consumers" (with nuance that reviewers can also technically use LLMs in ways that are frustrating to the PR authors!)
I left a few suggestions / nits, but even without them this is still a very good start IMO.
(Will not leave an explicit approval until we establish wider consensus, which likely will take the form of 4-team joint FCP.)
|
The links to Zulip are project-private, FWIW. |
I'm aware. This PR is targeted towards Rust project members moreso than the broad community. |
|
Extensive thanks to @jyn514 both for a thoughtful policy and for carefully taking feedback into account. Thank you as well to @jackh726 for detailed and nuanced conversations on where to draw boundaries, and for multiple inspired ideas on how to find well-balanced solutions.
@rfcbot resolved require-disclosure-even-for-draft-prs
@rfcbot reviewed 🚢 |
|
I would also like to say thank you already to the incredible number of people who have reviewed this policy and found it acceptable! 25 checked boxes already 🎁 (-> Convenience link to jump to the checkboxes without pressing "Load more" many many times <-) Despite this progress, rfcbot will actually treat this multi-team FCP proposal naively as a plain union of all members of all included teams, so the final comment period will not start automatically unless N-2 boxes are checked in total. This stands in contrast to the observation that each team taken separately - i.e. FCPing independently from each other - would have each already met the quorum (as we had manually tracked in this comment), and currently there are not any remaining concerns registered either.
Edit: Apparently this plea wasn't a desired approach to solve this, so never mind the suggestion.
|
|
@steffahn i would like to not ask people to check a box if they don't wish to. i believe Cargo's workaround for this is to have separate issues for each team with a separate FCP for each; i think that would work here. |
|
Since this is now effectively in FCP: I consider this policy "frozen". I do not plan to make any additional changes until the PR is merged, unless someone on the FCP raises a concern that would prevent it from merging. |
I've had people ask me what changes were made, and tell me that this has the appearance of back-channeling. First, all changes to this PR are public. You can see the changes I made in response to Josh's comments here: https://github.com/rust-lang/rust-forge/pull/1040/changes/f716356205795cbca88af27f6bf4cc5908b6006e..f692c4786ae580c13751144c2b05e2f4a41e9007 The summary of those changes is:
|
|
Yeah, to be clear, per my summary: #1040 (comment) We do have the N-2 from each team, but would want to split the FCP into parts. |
|
Given the recent (and non-trivial) changes to the policy after all the boxes had been checked, I think it's worth giving this a bit of additional time over the normal FCP period just to make sure everyone is aware and has had the time to review those changes. I realise this is somewhat non-standard (at least with the FCPs I'm usually on); usually things do get tweaked a bit after FCPs start, and it's kind of the point of an FCP to find things that need changing and then change them after the FCP has started. However, in this case:
@rfcbot concern wait-time-for-josh-policy-changes
I don't know how long is good to wait here, maybe a week or two before going into the final comment period again? |
|
Thanks @BoxyUwU for hitting the pause button here. I agree that the recent changes are large and I have substantial concerns with them.
That's a material difference in policy to be thrown in at the last minute. Are we actually going to ban merging a category of PRs for ten days because we hit an arbitrary limit? The goal of this policy, as I understood it, was to mitigate the harmful effects of LLMs on the project. This addition shifts the goal entirely to enforcing a kind of aesthetic limit. Now it says: "If adoption of LLMs organically grows beyond a limit that I am uncomfortable with, I will force it all to stop." That imposes a level of control beyond what I consider reasonable, and sends a strong and clear signal to a class of contributors that they are second-class and not welcome here. I don't see how we can support that.
That would make this policy incoherent: This is a policy about accepting code to rust-lang/rust. Under what logic does it extend to meetings? To be clear, I would find it extremely distasteful for an LLM to "attend" a meeting on behalf of someone. I'm not defending that. But that falls entirely outside the scope of this policy. To be effective, policies need to be legible to the people they apply to. We can't sneak provisions about what's allowed in meetings into a policy about issues and PRs to rust-lang/rust. |
I agree this is out of scope for the policy, I'll revert this. |
The goal of this policy is also to have an experiment for supporting contribution to Rust using LLMs. The problem is that that experiment (previously) had no experimental parameters, no success or failure criteria, no end, and no motivation to establish further non-experimental policies for such contributions (in any direction, whether to support or restrict any given kind of such contribution). The net result of that would be a permanent "experiment" that isn't actually conducting an experiment. And adding criteria/bounds/etc to the experiment immediately creates the problem of "when we hit that, if we can't agree on what to do, which policy is in effect?". No option for that default answer is going to satisfy everyone, because no option for that is going to bring everyone to the table to discuss and establish a non-experimental policy. So, any kind of "after X time" or "after X PRs" would not work there. But at the very least, if more than half of PRs to Rust are using LLMs, it's not an experiment anymore, and a policy setting out to do an experiment is no longer appropriate. One of the important harmful effects to avoid is driving people who don't use LLMs out of the project, or making them second-class contributors. |
> If more than half of PRs merged in a 6-week window are LLM-created, we disallow merging new LLM-created PRs until we go back below 50%, with a minimum cooldown of 10 days.
> This window is chosen to align with our existing release cycle, and the cooldown is to avoid flip-flopping between allowed and disallowed.
A minimum cooldown makes the flip-flopping problem worse. You would get a large backlog of PRs that flood in at the end of the cooldown period, immediately triggering the circuit breaker again. It's the thundering herd problem.
As I understand it, the intent here is for it to be an experiment, not to have LLM contributions go through the same procedure as normal ones.
It is about accepting a very limited amount of LLM contributions, as an experiment to see what the project would like to do in the future.
At the point where we would get to half of the merged PRs being LLM-generated, this would definitely not be an experiment anymore. This thundering herd problem would be by design, as the whole point is for there not to be many of them.
Hoping I'm not putting words in anyone's mouth, feel free to call me out if my interpretation is not correct.
> A minimum cooldown makes the flip-flopping problem worse. You would get a large backlog of PRs that flood in at the end of the cooldown period, immediately triggering the circuit breaker again. It's the thundering herd problem.
Without hysteresis, we can flip-flop on and off with every couple of PRs, or every bors merge; that would be substantially more chaotic. A longer period is more predictable, and gives us time to have interesting conversations and design parameters for how we want to handle LLM contributions going forward.
A longer period will also give time for other contributions to make their way in, and give time to experience whether things feel qualitatively different during the pause.
This isn't meant to be a "rate limit" that just gratuitously slows down LLM contributions, and a circuit breaker without hysteresis would act more like a rate limit. It's meant to be a guardrail on the experiment: an indicator of whether it is still, in fact, an experiment that we are evaluating, rather than business-as-usual. And it's meant to preserve the other important property: it is important that we continue to support non-LLM contributions.
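For readers who want the mechanics of the rule quoted earlier in this thread spelled out, here is a minimal sketch in Python. It is an illustration only, not something the policy, rustbot, or bors is known to implement. `MergedPr`, `CircuitBreaker`, `llm_share`, and `llm_merges_allowed` are hypothetical names, and how a share of exactly 50% is treated is an assumption that the quoted text does not settle.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Sequence

WINDOW = timedelta(weeks=6)    # "a 6-week window"
COOLDOWN = timedelta(days=10)  # "a minimum cooldown of 10 days"
THRESHOLD = 0.5                # "more than half"


@dataclass(frozen=True)
class MergedPr:
    """Hypothetical record of one merged PR."""
    merged_on: date
    llm_created: bool


def llm_share(merged: Sequence[MergedPr], today: date) -> float:
    """Fraction of PRs merged in the trailing window that are LLM-created."""
    recent = [pr for pr in merged if today - WINDOW < pr.merged_on <= today]
    if not recent:
        return 0.0
    return sum(pr.llm_created for pr in recent) / len(recent)


@dataclass
class CircuitBreaker:
    # None means the breaker is closed and LLM-created PRs may merge.
    tripped_on: Optional[date] = None

    def llm_merges_allowed(self, merged: Sequence[MergedPr], today: date) -> bool:
        share = llm_share(merged, today)
        if self.tripped_on is None:
            if share > THRESHOLD:
                # Trip the breaker and remember when the pause started.
                self.tripped_on = today
                return False
            return True
        # While tripped, resume only if BOTH conditions hold: the cooldown
        # has elapsed and the share is back below 50%.
        # Assumption: exactly 50% keeps an already-tripped breaker paused.
        cooled_down = today - self.tripped_on >= COOLDOWN
        if cooled_down and share < THRESHOLD:
            self.tripped_on = None
            return True
        return False
```

In this sketch the cooldown is what provides the hysteresis: setting `COOLDOWN` to zero would let the answer change after every merge that nudges the share across 50%, which is the flip-flopping described above.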
In declining to call this policy interim, the author explained:
That seems applicable here too: like interim, experimental is forward looking. We may not intend for some parts to be permanent, but we have to be prepared for the fact that they might be permanent, or at least semipermanent. While we might wish otherwise, changing this policy is likely to be hard. Right now, if the provisions on LLM use were a bit more limiting, some would raise blocking concerns. And if they were a bit less limiting, others would raise blocking concerns. Changing the policy later is going to require that nobody raises a blocking concern. Whether such a future compromise is possible on different terms is not something we can know today. Consequently, when someone raises a concern about harmful effects, as @tmandry is doing, I don't think pointing to intentions is enough. Despite our intentions, these provisions may be with us for a long time. Similarly:
This rejects certain thresholds, as no default will bring everyone to the table when the threshold is reached. But the recent change to add a 50% threshold and set a default of halting work at that point creates the same problem described: the default won't necessarily bring everyone to the table later, at the threshold, and so won't satisfy everyone today. And @tmandry is suggesting it's a default we shouldn't be willing to live with, which we might have to if we reach it. That this was added so late makes this more of a problem, as @tmandry also points out.
|
|
I want to dig into what I think may be a core difference in how people see and interpret this policy and its presentation of an experiment in allowing LLM contributions. I think this difference might be key, and I want to be more explicit about a dichotomy here, because I think it's important:
For people who see the experiment as an experiment, I think the circuit-breaker makes perfect sense as a way to set a boundary on that experiment that isn't an ideal and desirable end state for any stakeholders, thus bringing people to the table to talk about how the experiment is going and what additional steps we should take next. For people who see the experiment as the enduring policy for LLM contributions, then it makes sense to feel like the circuit-breaker gets in the way of having it be an ideal enduring policy. But one thing that doesn't work is to have the experiment be a quantum superposition of both of those things, and pass a policy with different people having different interpretations of one of its core pillars, without surfacing those differing interpretations as a problem waiting to happen. If some people see the experiment as an experiment, and some people see it as a thing likely to continue indefinitely, that expectation mismatch is going to blow up in our faces later. |
Rendered
View all comments
FCP link
Summary
This document establishes a policy for how LLMs can be used when contributing to rust-lang/rust. Subtrees, submodules, and dependencies from crates.io are not in scope. Other repositories in the rust-lang organization are not in scope.
This policy is intended to live in Forge as a living document, not as a dead RFC. It will be linked from CONTRIBUTING.md in rust-lang/rust as well as from the rustc- and std-dev-guides.
Ethical issues
See this thread.
Moderation guidelines
This PR is preceded by an enormous amount of discussion on Zulip. Almost every conceivable angle has been discussed to death; there have been upwards of 3000 messages, not even counting discussion on GitHub. We initially doubted whether we could reach consensus at all.
Therefore, we ask to bound the scope of this PR specifically to the policy itself. In particular, we mark several topics as out of scope below. We still consider these topics to be important; we simply do not believe this is the right place to discuss them.
So, the following are considered off topic for this PR specifically:
We have asked the moderation team to help us enforce these rules. For an extended rationale, please see this comment.
Feedback guidelines
We are aware that parts of this policy will make some people very unhappy. As you are reading, we ask you to consider the following.
If your concern is for yourself or for your team
rust-lang/rust. Other repositories are not affected by this policy and are therefore not in scope.
Previous versions of this document were discussed on Zulip, and we have made edits in response to suggestions there.
Motivation
rust-lang/rust is currently dealing with a deluge of low-effort "slop" PRs primarily authored by LLMs.
This policy is not intended as a debate over whether LLMs are a good or bad idea, nor over the long-term impact of LLMs. It is only intended to set out the future policy of rust-lang/rust itself.
Drawbacks
Rationale and alternatives
rust-lang/rust. This has the advantage that everyone knows what the policy is everywhere, and that it's easy to make things part of the mono-repo at a later date. It has the disadvantage that we think it is nigh-impossible to get everyone to agree. There are also reasons for teams to have different policies; for example, the standard for correctness is much higher within the compiler than within Clippy.
Prior art
This prior art section is taken almost entirely from Jane Lusby's summary of her research, although we have taken the liberty of moving the Rust project's prior art to the top. We thank her for her help.
Rust
Other organizations
These are organized along a spectrum of AI friendliness, where top is least friendly, and bottom is most friendly.
Unresolved questions
See the "Moderation guidelines" and "Drawbacks" sections for a list of topics that are out of scope.