
Daemon-managed proxy can ignore current LEAN_CTX_OPENAI_UPSTREAM and keep using a stale upstream #449

@cowwoc


Summary

LEAN_CTX_OPENAI_UPSTREAM works when lean-ctx proxy start is run directly in a clean repro. A daemon-managed proxy, however, can still come up with an older OpenAI upstream: the daemon starts the proxy from its own environment, which is not necessarily the caller environment that sets the override.

In practice this means the active shell/container can have:

LEAN_CTX_OPENAI_UPSTREAM=http://host.docker.internal:2455

but requests sent to the live proxy still route to an older upstream (in my case https://host.docker.internal:24455).

Why this looks like a bug

The effective contract implied by LEAN_CTX_OPENAI_UPSTREAM is that setting it in the active environment should make the proxy use that upstream.

What actually happens is:

  • the current container/session environment contains the correct LEAN_CTX_OPENAI_UPSTREAM
  • the running lean-ctx daemon does not have that variable in its own environment
  • the daemon auto-spawns lean-ctx proxy start
  • the proxy inherits the daemon environment and comes up with a stale upstream

So the runtime behavior depends on which process previously started the daemon, not on the environment of the current session that is relying on the proxy.
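
The inheritance chain described above can be modeled in a few lines. This is a toy sketch, not lean-ctx code: `Env`, `spawn_child_env`, and `upstream_seen_by` are hypothetical names, and the `fallback` argument stands in for whatever the proxy uses when the variable is absent (the source of the stale value is not established in this report).

```rust
use std::collections::HashMap;

/// A process environment, modeled as a plain map (illustrative only).
type Env = HashMap<String, String>;

/// Models the default behavior of `std::process::Command::spawn`:
/// the child receives a copy of the *spawning* process's environment.
fn spawn_child_env(parent: &Env) -> Env {
    parent.clone()
}

/// Upstream a proxy would pick up from its own environment.
/// `fallback` is an assumption standing in for the proxy's behavior
/// when LEAN_CTX_OPENAI_UPSTREAM is not set.
fn upstream_seen_by<'a>(proxy_env: &'a Env, fallback: &'a str) -> &'a str {
    proxy_env
        .get("LEAN_CTX_OPENAI_UPSTREAM")
        .map(String::as_str)
        .unwrap_or(fallback)
}
```

In this model, a proxy spawned from a daemon whose map lacks the variable gets the fallback, no matter what the current session's map contains. That matches the behavior reported here.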

Environment

  • lean-ctx 3.8.8
  • Linux container
  • proxy used by Codex via OPENAI_BASE_URL=http://127.0.0.1:4444
  • intended OpenAI upstream: http://host.docker.internal:2455

Evidence

1. The container/session env is correct

The live Codex container had:

LEAN_CTX_OPENAI_UPSTREAM=http://host.docker.internal:2455
OPENAI_BASE_URL=http://127.0.0.1:4444

2. The daemon env is not the same

The live daemon process environment did not include LEAN_CTX_OPENAI_UPSTREAM.

The live proxy process was a child of the daemon, not of the startup shell that had the override.
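
For anyone who wants to repeat this check on Linux, a process's environment can be read from /proc/<pid>/environ, which holds NUL-separated KEY=VALUE records captured when the process was executed. The following is a minimal sketch; `parse_environ` and `process_env_var` are hypothetical helper names, not part of lean-ctx.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;

/// Parse the NUL-separated `KEY=VALUE` records found in /proc/<pid>/environ.
/// Records without an `=` are skipped.
fn parse_environ(raw: &[u8]) -> HashMap<String, String> {
    raw.split(|b| *b == 0)
        .filter(|rec| !rec.is_empty())
        .filter_map(|rec| {
            let text = String::from_utf8_lossy(rec);
            let (key, value) = text.split_once('=')?;
            Some((key.to_string(), value.to_string()))
        })
        .collect()
}

/// Read one variable from a running process's environment (Linux only).
/// Returns Ok(None) when the process exists but the variable is not set.
fn process_env_var(pid: u32, name: &str) -> io::Result<Option<String>> {
    let raw = fs::read(format!("/proc/{pid}/environ"))?;
    Ok(parse_environ(&raw).remove(name))
}
```

Calling `process_env_var` with the daemon's PID and `"LEAN_CTX_OPENAI_UPSTREAM"` would return `Ok(None)` in the situation described above. Reading another process's environ requires permission to inspect that process.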

3. The live proxy routed to the wrong upstream

~/.lean-ctx/proxy.log showed:

OpenAI:    POST /v1/chat/completions → https://host.docker.internal:24455
OpenAI:    POST /v1/responses → https://host.docker.internal:24455

4. Clean repro shows the binary itself is fine

In an isolated temporary LEAN_CTX_DATA_DIR, these both worked correctly:

  • lean-ctx config set proxy.openai_upstream http://host.docker.internal:2455
  • LEAN_CTX_OPENAI_UPSTREAM=http://host.docker.internal:2455 lean-ctx proxy start --port=4555

and the proxy announced:

OpenAI:    POST /v1/chat/completions → http://host.docker.internal:2455
OpenAI:    POST /v1/responses → http://host.docker.internal:2455

So this does not look like a general parsing bug in proxy.openai_upstream or LEAN_CTX_OPENAI_UPSTREAM. It looks specific to the daemon/proxy lifecycle.

Relevant code path

The daemon start path appears to just spawn a new process from the daemon's current environment:

rust/src/daemon.rs

let child = Command::new(&exe)
    .args(&cmd_args)
    .stdin(std::process::Stdio::null())
    .stdout(std::process::Stdio::null())
    .stderr(stderr_cfg)
    .spawn()

and the proxy auto-start path does the same from the current lean-ctx process:

rust/src/cli/dispatch.rs

match std::process::Command::new(&binary)
    .args(["proxy", "start", &format!("--port={port}")])
    ...
    .spawn()

If the daemon itself was started without LEAN_CTX_OPENAI_UPSTREAM, a later session with the correct env can still end up talking to a proxy started from the daemon's stale env.
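
One possible way to remove the dependency on inherited environment is to pass the resolved upstream to the child explicitly at spawn time. The sketch below is an assumption about how that could look, not the project's code: `proxy_command` is a hypothetical helper, and how the daemon would learn the caller's value (for example via the client request or persisted config) is left open.

```rust
use std::process::{Command, Stdio};

/// Hypothetical helper: build the `proxy start` command with the upstream
/// pinned explicitly instead of relying on whatever the spawning process
/// happens to have in its environment.
fn proxy_command(binary: &str, port: u16, upstream: Option<&str>) -> Command {
    let mut cmd = Command::new(binary);
    cmd.args(["proxy", "start", &format!("--port={port}")])
        .stdin(Stdio::null())
        .stdout(Stdio::null());
    // When the caller supplied an upstream, set it on the child so it
    // overrides any value (or absence of one) inherited from the daemon.
    if let Some(url) = upstream {
        cmd.env("LEAN_CTX_OPENAI_UPSTREAM", url);
    }
    cmd
}
```

The returned `Command` can then be spawned as before with `.spawn()`. With `upstream` set to `None` the child still inherits the spawning process's environment, so this only helps if the caller's value actually reaches the spawn site.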

Expected behavior

One of these should be true:

  1. A daemon-managed proxy should reliably reflect the current configured upstream override.
  2. The daemon should persist and reuse the configured proxy upstream instead of depending on inherited env.
  3. The daemon/proxy startup path should document that env-only upstream overrides are not reliable once a daemon already exists.
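
Options 1 and 2 could be sketched roughly as follows: resolve the desired upstream once, then compare it against what the running proxy was started with. The precedence used here (caller environment, then persisted proxy.openai_upstream, then a built-in default) is an assumption for illustration, as lean-ctx's actual precedence is not established in this report. All names are hypothetical.

```rust
/// Where the effective upstream came from (illustrative).
#[derive(Debug, PartialEq)]
enum Source {
    CallerEnv,
    PersistedConfig,
    BuiltIn,
}

/// Resolve the desired upstream. Blank values are treated as unset.
/// Assumed precedence: caller env > persisted config > built-in default.
fn resolve_upstream(
    caller_env: Option<&str>,
    persisted: Option<&str>,
    built_in: &str,
) -> (String, Source) {
    let non_blank = |v: &&str| !v.trim().is_empty();
    if let Some(url) = caller_env.filter(non_blank) {
        return (url.trim().to_string(), Source::CallerEnv);
    }
    if let Some(url) = persisted.filter(non_blank) {
        return (url.trim().to_string(), Source::PersistedConfig);
    }
    (built_in.to_string(), Source::BuiltIn)
}

/// True when the running proxy's upstream differs from the desired one,
/// ignoring a trailing slash. A daemon could use this to decide whether
/// to restart the proxy.
fn proxy_is_stale(running_upstream: &str, desired_upstream: &str) -> bool {
    running_upstream.trim_end_matches('/') != desired_upstream.trim_end_matches('/')
}
```

With the values from this report, a proxy running against https://host.docker.internal:24455 would be flagged as stale when the desired upstream resolves to http://host.docker.internal:2455.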

Actual behavior

A live daemon/proxy can keep serving requests against an old upstream even though the active environment contains the correct LEAN_CTX_OPENAI_UPSTREAM.

Impact

This causes confusing failures in setups that route through a local gateway/proxy such as codex-lb.

In my case it produced misleading authentication errors because traffic that should have gone to the local upstream on 2455 was still going to an older TLS endpoint on 24455.

Suggested direction

I think the fix probably needs to be in daemon/proxy startup ownership, not in the core upstream resolver. The clean isolated repro suggests the resolver is behaving correctly when invoked directly.
