
Matchmaking returns "Session not found" (401) for a valid JWT after a transient MQTT disconnect, with no recovery path #32

Description

@ChronoFinale

Matchmaking returns "Session not found" (401) for a player with a valid JWT, with no recovery path

Summary

A player who is correctly authenticated (valid JWT) can be permanently locked out of matchmaking with Session not found (401) after a single transient MQTT disconnect. Retrying does not help, because nothing recreates the in-memory session and the still-valid JWT means the client never re-authenticates. The only workaround is a full client restart.

Root cause

The in-memory session and the stateless JWT have decoupled lifetimes, and the matchmaking route is the only flow that requires the session strictly instead of self-healing.

  1. Sessions are in-memory only. src/state/index.ts: export const sessions = new Map<string, PlayerSession>(). No persistence, no rehydration on startup.

  2. JWT auth never checks for a session. src/middleware/authenticate.ts only runs verifyJwt(token) and sets req.player. A valid token passes even when the session is gone.

  3. A non-lobby disconnect reaps the session immediately. src/features/emqx/emqx.route.ts:

    async function releasePlayerLobbyOrSession(clientid) {
      const session = getSession(clientid)
      if (!session) return
      leaveAllQueues(clientid)
      if (session.lobbyCode) {
        await startGracePeriod(clientid) // lobby members get a grace period
      } else {
        removeSession(clientid)          // everyone else is removed on the spot
      }
    }

    The MQTT clientid is the playerId (the client sends player_id as the MQTT client id), so getSession(clientid) / removeSession(clientid) resolve the real session. A player who is matchmaking — and therefore not yet in a lobby — has their session deleted the instant EMQX reports client.disconnected.

  4. Matchmaking requires the session strictly. src/features/matchmaking/matchmaking.route.ts:

    const session = getSession(req.player.playerId)
    if (!session) throw new AppError('Session not found', 401)

    By contrast, ~5 other endpoints call ensureSession() (src/features/auth/auth.service.ts), which rebuilds the session from the DB when it's missing. Matchmaking is the asymmetric one that does not self-heal.
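To make the asymmetry concrete, here is a minimal sketch contrasting the two lookup styles. Everything in it is an illustrative stand-in: `requireSessionStrict`, `ensureSessionSketch`, and the `playersInDb` set are hypothetical names, and the real `ensureSession()` is presumably async and reads from the actual database rather than from an in-memory set.

```typescript
// Illustrative stand-ins only; none of these are the project's real implementations.
interface PlayerSession {
  playerId: string
  lobbyCode?: string
}

class AppError extends Error {
  status: number
  constructor(message: string, status: number) {
    super(message)
    this.status = status
  }
}

const sessions = new Map<string, PlayerSession>()

// Hypothetical stand-in for the persistent player table.
const playersInDb = new Set<string>(['steam:123'])

// Strict lookup, as the matchmaking route is described above:
// a missing session is a hard 401 even though the JWT was accepted.
function requireSessionStrict(playerId: string): PlayerSession {
  const session = sessions.get(playerId)
  if (!session) throw new AppError('Session not found', 401)
  return session
}

// Self-healing lookup (assumed shape of ensureSession): rebuild the
// session from persistent data when the in-memory entry is missing.
function ensureSessionSketch(playerId: string): PlayerSession {
  const existing = sessions.get(playerId)
  if (existing) return existing
  if (!playersInDb.has(playerId)) throw new AppError('Session not found', 401)
  const rebuilt: PlayerSession = { playerId }
  sessions.set(playerId, rebuilt)
  return rebuilt
}
```

With an empty `sessions` map, `requireSessionStrict('steam:123')` throws the 401, while `ensureSessionSketch('steam:123')` recreates the entry, after which the strict lookup succeeds too.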

Reproduction

  1. Authenticate (Steam) — session created, MQTT connected.
  2. Drop the MQTT connection briefly (ordinary network blip). The client default is reconnect = false, so it does not auto-reconnect.
  3. EMQX fires client.disconnected; since the player isn't in a lobby, the server calls removeSession immediately.
  4. Click matchmake → POST /api/matchmaking/queue with the still-valid JWT → getSession returns nothing → Session not found (401).
  5. Retrying never recovers: the JWT is still valid, so the client doesn't re-auth, and nothing else recreates the session.
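The sequence above can be modelled in a few lines. This is a simplified simulation, not the server code: `authenticate`, `onClientDisconnected`, `postMatchmakingQueue`, and the `validTokens` map (a stand-in for JWT verification) are hypothetical, and the 'Invalid token' message is invented for the sketch.

```typescript
// Simplified model of the failure; all names are hypothetical stand-ins.
interface PlayerSession {
  playerId: string
  lobbyCode?: string
}

interface Result {
  status: number
  message: string
}

const sessions = new Map<string, PlayerSession>()

// Stand-in for stateless JWT verification: token -> playerId.
// Entries are never removed, mirroring a token that stays valid.
const validTokens = new Map<string, string>()

// Step 1: authenticating issues a token and creates the in-memory session.
function authenticate(playerId: string): string {
  const token = `token-for-${playerId}`
  validTokens.set(token, playerId)
  sessions.set(playerId, { playerId })
  return token
}

// Step 3: a disconnect outside a lobby removes the session immediately.
function onClientDisconnected(clientid: string): void {
  const session = sessions.get(clientid)
  if (!session) return
  if (!session.lobbyCode) sessions.delete(clientid)
}

// Step 4: the token check passes, but the strict session lookup fails.
function postMatchmakingQueue(token: string): Result {
  const playerId = validTokens.get(token)
  if (playerId === undefined) return { status: 401, message: 'Invalid token' }
  if (!sessions.has(playerId)) return { status: 401, message: 'Session not found' }
  return { status: 200, message: 'queued' }
}

// Runs the whole sequence and returns the status of each queue attempt.
function reproduce(): number[] {
  const token = authenticate('player-1')
  const before = postMatchmakingQueue(token).status
  onClientDisconnected('player-1')
  const after = postMatchmakingQueue(token).status
  const retry = postMatchmakingQueue(token).status // step 5: retry changes nothing
  return [before, after, retry]
}
```

In this model `reproduce()` returns `[200, 401, 401]`: the token is accepted on every call, but nothing between the disconnect and the retry puts the session back.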

Impact

A single transient disconnect strands an authenticated player. From the user's side it looks like matchmaking is simply broken; the only fix they have is to fully quit and relaunch the game (which forces a fresh /auth and recreates the session).

Suggested fixes (in order of smallest blast radius)

  1. Make matchmaking self-heal like the other routes — use ensureSession(req.player.playerId) instead of strict getSession. A DB-backed (Steam-authed) player would transparently get their session rebuilt instead of a 401. This is the minimal, consistent fix.
  2. Don't immediately reap non-lobby sessions on disconnect — give them the same grace period lobby members get, so a brief blip doesn't destroy the session.
  3. (Client-side, optional) Treat a 401 Session not found as "re-authenticate, then retry" rather than surfacing it as a terminal error.
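For fix 2, one possible shape is a delayed reap that a reconnect can cancel. The sketch below is an assumption about how that could look, not a description of the existing `startGracePeriod`: `SessionReaper`, its `onDisconnected` / `onConnected` hooks, and the injectable `Scheduler` are hypothetical names, and the grace duration is left to the caller.

```typescript
// Hypothetical sketch of a cancellable grace period for every session.
interface PlayerSession {
  playerId: string
  lobbyCode?: string
}

type Cancel = () => void
type Scheduler = (fn: () => void, ms: number) => Cancel

// Default scheduler backed by setTimeout; tests can inject a fake one.
const realScheduler: Scheduler = (fn, ms) => {
  const handle = setTimeout(fn, ms)
  return () => clearTimeout(handle)
}

class SessionReaper {
  private pending = new Map<string, Cancel>()
  private sessions: Map<string, PlayerSession>
  private graceMs: number
  private schedule: Scheduler

  constructor(
    sessions: Map<string, PlayerSession>,
    graceMs: number,
    schedule: Scheduler = realScheduler,
  ) {
    this.sessions = sessions
    this.graceMs = graceMs
    this.schedule = schedule
  }

  // On disconnect, schedule removal instead of deleting on the spot.
  onDisconnected(clientid: string): void {
    if (!this.sessions.has(clientid)) return
    this.pending.get(clientid)?.() // replace any older timer for this client
    const cancel = this.schedule(() => {
      this.pending.delete(clientid)
      this.sessions.delete(clientid)
    }, this.graceMs)
    this.pending.set(clientid, cancel)
  }

  // On reconnect within the grace window, keep the session.
  onConnected(clientid: string): void {
    this.pending.get(clientid)?.()
    this.pending.delete(clientid)
  }
}
```

This only narrows the window: a session lost to a server restart or an expired grace period still ends in the strict 401, which is why fix 1 remains the primary change.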

Notes / unverified

The mechanism above is confirmed against the code end-to-end (in-memory map, stateless JWT, immediate non-lobby reap, strict matchmaking lookup, clientid == playerId). What is not independently confirmed is that a given user report was caused by this exact path versus another session-loss trigger (e.g. a server restart/redeploy wiping the in-memory map, or a grace-period expiry). All of those funnel into the same end state — valid JWT, no session, strict 401 — so the fixes apply regardless.
