Smollm3-3b chat completion fails: chat_template.jinja uses unsupported .replace() / .rstrip() methods

 ## Description

  `smollm3-3b-generic-gpu:1` fails during chat completion before inference starts. The failure appears to happen
  while Foundry Local / ONNX Runtime GenAI renders the model's `chat_template.jinja`.

  The bundled chat template uses Python/Jinja-style string methods such as `.replace()`, `.rstrip()`, and
  `.lstrip()`, but the runtime template parser reports these methods as unsupported.

  This means the model cannot currently be evaluated or used through normal `complete_chat(...)` calls with
  system/user messages.

  ## Environment

  - OS: macOS 14.6.1, Apple Silicon / arm64
  - Python: 3.13.14
  - Package: `foundry-local-sdk==1.2.1`
  - Model: `smollm3-3b-generic-gpu:1`
  - Model cache path observed locally:
    - `.foundry_local/models/Microsoft/smollm3-3b-generic-gpu-1/v1/`

  ## Reproduction Steps

  ```python
  from pathlib import Path

  from foundry_local_sdk import Configuration, FoundryLocalManager

  workspace = Path.cwd()
  app_data = workspace / ".foundry_local" / "app_data"
  model_cache = workspace / ".foundry_local" / "models"
  logs = workspace / ".foundry_local" / "logs"

  app_data.mkdir(parents=True, exist_ok=True)
  model_cache.mkdir(parents=True, exist_ok=True)
  logs.mkdir(parents=True, exist_ok=True)

  config = Configuration(
      app_name="smollm3_template_repro",
      app_data_dir=str(app_data),
      model_cache_dir=str(model_cache),
      logs_dir=str(logs),
  )

  FoundryLocalManager.initialize(config)
  manager = FoundryLocalManager.instance

  model = manager.catalog.get_model("smollm3-3b")
  model.download()
  model.load()

  client = model.get_chat_client()

  messages = [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the golden ratio?"},
  ]

  response = client.complete_chat(messages)
  print(response.choices[0].message.content)

  model.unload()
  ```

  ## Actual Result

  The request fails before a model response is generated.

  Error excerpt:

  ```text
  FoundryLocalException: Error during chat completion: Error:
  Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIException:
  Unknown method: replace at row 23, column 48:

  {%- set custom_instructions = system_message.replace("/no_think", "").replace("/think", "").rstrip() -%}
                                                 ^
  ```

  The same failure occurs consistently for every chat completion request using this model.

  ## Expected Result

  `smollm3-3b-generic-gpu:1` should successfully render its chat template and produce a chat completion response.

  Alternatively, if the current runtime intentionally supports only a restricted subset of Jinja/template syntax,
  the catalog version of this model should use a compatible chat template.

  ## Suspected Cause

  The model package includes:

  `.foundry_local/models/Microsoft/smollm3-3b-generic-gpu-1/v1/chat_template.jinja`

  The template contains multiple string method calls:

  ```jinja
  {%- set custom_instructions = system_message.replace("/no_think", "").replace("/think", "").rstrip() -%}
  {{- custom_instructions.replace("/system_override", "").rstrip() -}}
  {{ "<|im_start|>assistant\n" + content.lstrip("\n") + "<|im_end|>\n" }}
  {{ "<|im_start|>assistant\n" + "<think>\n\n</think>\n" + content.lstrip("\n") + "<|im_end|>\n" }}
  ```

  These appear to be unsupported by the runtime template parser used by Foundry Local / ONNX Runtime GenAI.
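
To check which lines of a template are affected, the calls can be located with a short script. This is a minimal sketch, not part of the Foundry Local SDK: the helper name `find_method_calls` is hypothetical, the method list only covers the three methods quoted above, and the reported columns are plain 1-based string offsets that may not match the runtime's own row/column numbering.

```python
import re

# Methods taken from the template lines quoted above (assumption: the
# runtime may reject other string methods too; extend this tuple as needed).
SUSPECT_METHODS = ("replace", "rstrip", "lstrip")

# Matches ".replace(", ".rstrip(" or ".lstrip(" and captures the method name.
_METHOD_CALL = re.compile(r"\.(" + "|".join(SUSPECT_METHODS) + r")\s*\(")


def find_method_calls(template_text):
    """Return a list of (row, column, method) tuples, both 1-based."""
    hits = []
    for row, line in enumerate(template_text.splitlines(), start=1):
        for match in _METHOD_CALL.finditer(line):
            # match.start(1) is the 0-based offset of the method name.
            hits.append((row, match.start(1) + 1, match.group(1)))
    return hits


# Example input: the line quoted in the error excerpt.
sample = (
    '{%- set custom_instructions = system_message.replace("/no_think", "")'
    '.replace("/think", "").rstrip() -%}'
)
for row, column, method in find_method_calls(sample):
    print(f"row {row}, column {column}: .{method}()")
```

To scan the real file, pass the contents of the cached `chat_template.jinja` (for example via `pathlib.Path(...).read_text()`) instead of `sample`.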

  ## Impact

  This blocks use of smollm3-3b through the normal Foundry Local Python SDK chat completion flow.

  In my evaluation run, the model had:

  `success_rate = 0 / 25`

  All failures had the same template parsing error, so this should be treated as a runtime/model-template
  compatibility failure rather than a model quality issue.
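
To separate template failures from genuine model or runtime errors in an evaluation log, the error text can be matched against the message format shown in the error excerpt. This is a sketch under the assumption that the message keeps the `Unknown method: <name> at row <n>, column <n>` shape; the helper names `parse_template_error` and `tally_failures` are hypothetical.

```python
import re

# Pattern derived from the error excerpt in this report (assumption: other
# runtime versions may word the message differently).
_UNKNOWN_METHOD = re.compile(r"Unknown method: (\w+) at row (\d+), column (\d+)")


def parse_template_error(message):
    """Return method/row/column for a template error, or None otherwise."""
    match = _UNKNOWN_METHOD.search(message)
    if match is None:
        return None
    method, row, column = match.groups()
    return {"method": method, "row": int(row), "column": int(column)}


def tally_failures(messages):
    """Count failures caused by the template versus everything else."""
    counts = {"template": 0, "other": 0}
    for message in messages:
        key = "template" if parse_template_error(message) else "other"
        counts[key] += 1
    return counts


# Example input: the message from the error excerpt, on one line.
example = (
    "FoundryLocalException: Error during chat completion: Error: "
    "Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIException: "
    "Unknown method: replace at row 23, column 48:"
)
print(parse_template_error(example))
```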

  ## Possible Fix

  One possible fix is to update the catalog model's `chat_template.jinja` so it avoids unsupported string methods
  such as `.replace()`, `.rstrip()`, and `.lstrip()`.

  For example, a minimal compatibility-oriented template could avoid dynamically stripping the `/think`,
  `/no_think`, and `/system_override` markers, or use only syntax supported by the runtime template parser.

  Another possible fix is to extend the runtime template parser to support these string methods if this syntax is
  expected to be valid for catalog model templates.
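
If a simplified template dropped the marker stripping, callers could apply the same string operations to the system message before sending it. The following is only a caller-side sketch that mirrors the two template expressions quoted under Suspected Cause; the function name `strip_control_markers` is hypothetical, and it is an assumption that removing the markers client-side preserves whatever behavior the original template derives from them.

```python
# Markers removed by the original template, in the order the template
# applies them.
MODE_MARKERS = ("/no_think", "/think")
OVERRIDE_MARKER = "/system_override"


def strip_control_markers(system_message):
    """Mirror the template's .replace(...).rstrip() chain in plain Python."""
    cleaned = system_message
    for marker in MODE_MARKERS:
        cleaned = cleaned.replace(marker, "")
    cleaned = cleaned.rstrip()
    # Second template expression: drop the override marker, then trim again.
    return cleaned.replace(OVERRIDE_MARKER, "").rstrip()


messages = [
    {"role": "system", "content": "You are a helpful assistant. /no_think"},
    {"role": "user", "content": "What is the golden ratio?"},
]
# Clean only system messages; other roles pass through unchanged.
cleaned_messages = [
    {**m, "content": strip_control_markers(m["content"])}
    if m["role"] == "system"
    else m
    for m in messages
]
print(cleaned_messages[0]["content"])
```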

  ## Additional Notes

  The model's `inference_model.json` reports:

  ```json
  {
    "Name": "smollm3-3b-generic-gpu:1",
    "PromptTemplate": {
      "system": "<|im_start|>system\n{Content}<|im_end|>",
      "user": "<|im_start|>user\n{Content}<|im_end|>",
      "assistant": "<|im_start|>assistant\n{Content}<|im_end|>",
      "prompt": "<|im_start|>user\n{Content}<|im_end|>\n<|im_start|>assistant"
    }
  }
  ```

  However, the runtime error references `chat_template.jinja`, so it appears that the bundled Jinja template is being
  used during chat completion.
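
For comparison, the prompt implied by the `PromptTemplate` entries can be built with plain string substitution, with no string methods inside a template. This is an illustrative sketch only: `render_prompt` is a hypothetical helper, not how Foundry Local consumes `PromptTemplate`, and joining turns with a newline is an assumption extrapolated from the `prompt` entry.

```python
# Per-role templates copied from the inference_model.json excerpt.
PROMPT_TEMPLATE = {
    "system": "<|im_start|>system\n{Content}<|im_end|>",
    "user": "<|im_start|>user\n{Content}<|im_end|>",
    "assistant": "<|im_start|>assistant\n{Content}<|im_end|>",
}

# The "prompt" entry ends with this marker to cue the assistant's reply.
GENERATION_PROMPT = "<|im_start|>assistant"


def render_prompt(messages, template=PROMPT_TEMPLATE):
    """Substitute each message into its role template and join the turns."""
    parts = [
        template[m["role"]].replace("{Content}", m["content"]) for m in messages
    ]
    parts.append(GENERATION_PROMPT)
    # Assumption: turns are separated by a single newline, as in "prompt".
    return "\n".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the golden ratio?"},
]
print(render_prompt(messages))
```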

