Smollm3-3b chat completion fails: chat_template.jinja uses unsupported .replace() / .rstrip() methods #800

@ambs19

## Description

smollm3-3b-generic-gpu:1 fails during chat completion before inference starts. The failure appears to happen
while Foundry Local / ONNX Runtime GenAI renders the model's chat_template.jinja.

The bundled chat template uses Python/Jinja-style string methods such as .replace(), .rstrip(), and
.lstrip(), but the runtime template parser reports these methods as unsupported.

This means the model cannot currently be evaluated or used through normal complete_chat(...) calls with
system and user messages.

## Environment

  • OS: macOS 14.6.1, Apple Silicon / arm64
  • Python: 3.13.14
  • Package: foundry-local-sdk==1.2.1
  • Model: smollm3-3b-generic-gpu:1
  • Model cache path observed locally:
    • .foundry_local/models/Microsoft/smollm3-3b-generic-gpu-1/v1/

## Reproduction Steps

from pathlib import Path

from foundry_local_sdk import Configuration, FoundryLocalManager

workspace = Path.cwd()
app_data = workspace / ".foundry_local" / "app_data"
model_cache = workspace / ".foundry_local" / "models"
logs = workspace / ".foundry_local" / "logs"

app_data.mkdir(parents=True, exist_ok=True)
model_cache.mkdir(parents=True, exist_ok=True)
logs.mkdir(parents=True, exist_ok=True)

config = Configuration(
    app_name="smollm3_template_repro",
    app_data_dir=str(app_data),
    model_cache_dir=str(model_cache),
    logs_dir=str(logs),
)

FoundryLocalManager.initialize(config)
manager = FoundryLocalManager.instance

model = manager.catalog.get_model("smollm3-3b")
model.download()
model.load()

client = model.get_chat_client()

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the golden ratio?"},
]

response = client.complete_chat(messages)
print(response.choices[0].message.content)

model.unload()

## Actual Result

The request fails before a model response is generated.

Error excerpt:

FoundryLocalException: Error during chat completion: Error:
Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIException:
Unknown method: replace at row 23, column 48:

{%- set custom_instructions = system_message.replace("/no_think", "").replace("/think", "").rstrip() -%}
                                               ^

The same failure occurs consistently for every chat completion request using this model.
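To map the reported position back to the template file, a small helper along these lines may be useful. It is only a sketch: the regular expression matches the message shape seen in the excerpt above, the helper names are hypothetical, and it assumes the reported row and column are 1-based, which the caret position suggests but the runtime does not document here.

```python
import re

# Matches the "Unknown method: <name> at row R, column C" shape from the excerpt.
# Other runtime errors may be worded differently and will simply not match.
ERROR_RE = re.compile(r"Unknown method: (\w+) at row (\d+), column (\d+)")


def parse_template_error(message):
    """Return (method, row, column) from the error text, or None if absent."""
    match = ERROR_RE.search(message)
    if match is None:
        return None
    method, row, column = match.groups()
    return method, int(row), int(column)


def show_location(template_text, row, column):
    """Return the template line at `row` with a caret under `column`.

    Assumption: row and column are 1-based.
    """
    line = template_text.splitlines()[row - 1]
    return line + "\n" + " " * (column - 1) + "^"
```

Passing the exception text to parse_template_error(...) and the contents of chat_template.jinja to show_location(...) should reproduce the caret view shown in the excerpt.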

## Expected Result

smollm3-3b-generic-gpu:1 should successfully render its chat template and produce a chat completion response.

Alternatively, if the current runtime intentionally supports only a restricted subset of Jinja/template syntax,
the catalog version of this model should use a compatible chat template.

## Suspected Cause

The model package includes:

.foundry_local/models/Microsoft/smollm3-3b-generic-gpu-1/v1/chat_template.jinja

The template contains multiple string method calls:

{%- set custom_instructions = system_message.replace("/no_think", "").replace("/think", "").rstrip() -%}
{{- custom_instructions.replace("/system_override", "").rstrip() -}}
{{ "<|im_start|>assistant\n" + content.lstrip("\n") + "<|im_end|>\n" }}
{{ "<|im_start|>assistant\n" + "<think>\n\n</think>\n" + content.lstrip("\n") + "<|im_end|>\n" }}

These appear to be unsupported by the runtime template parser used by Foundry Local / ONNX Runtime GenAI.
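A quick way to list every such call in a template is a regex scan like the sketch below. It only looks for the three methods named in this issue; whether the runtime rejects other Python string methods is not something I have verified, and the function names are hypothetical.

```python
import re
from collections import Counter

# Methods named in this issue. Other methods may or may not be supported.
SUSPECT_METHODS = ("replace", "rstrip", "lstrip")
METHOD_CALL_RE = re.compile(r"\.(" + "|".join(SUSPECT_METHODS) + r")\s*\(")


def find_method_calls(template_text):
    """Return (row, column, method) for each suspect call, both 1-based."""
    hits = []
    for row, line in enumerate(template_text.splitlines(), start=1):
        for match in METHOD_CALL_RE.finditer(line):
            # Column points at the first character of the method name.
            hits.append((row, match.start(1) + 1, match.group(1)))
    return hits


def count_method_calls(template_text):
    """Return a Counter of suspect method names found in the template."""
    return Counter(method for _, _, method in find_method_calls(template_text))


# Example usage (path taken from the model cache layout described above):
# from pathlib import Path
# text = Path(".foundry_local/models/Microsoft/smollm3-3b-generic-gpu-1/v1/"
#             "chat_template.jinja").read_text(encoding="utf-8")
# for row, column, method in find_method_calls(text):
#     print(f"row {row}, column {column}: .{method}()")
```

This is a plain text scan, not a Jinja parser, so it can also flag matches inside string literals or comments.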

## Impact

This blocks use of smollm3-3b through the normal Foundry Local Python SDK chat completion flow.

In my evaluation run, the model had:

success_rate = 0 / 25

All failures had the same template parsing error, so this should be treated as a runtime/model-template
compatibility failure rather than a model quality issue.

## Possible Fix

One possible fix is to update the catalog model's chat_template.jinja so it avoids unsupported string methods
such as .replace(), .rstrip(), and .lstrip().

For example, a minimal compatibility-oriented template could avoid dynamically stripping /think, /no_think, and
/system_override markers, or use only syntax supported by the runtime template parser.

Another possible fix is to extend the runtime template parser to support these string methods if this syntax is
expected to be valid for catalog model templates.

## Additional Notes

The model's inference_model.json reports:

{
  "Name": "smollm3-3b-generic-gpu:1",
  "PromptTemplate": {
    "system": "<|im_start|>system\n{Content}<|im_end|>",
    "user": "<|im_start|>user\n{Content}<|im_end|>",
    "assistant": "<|im_start|>assistant\n{Content}<|im_end|>",
    "prompt": "<|im_start|>user\n{Content}<|im_end|>\n<|im_start|>assistant"
  }
}

However, the runtime error references chat_template.jinja, so it appears that the bundled Jinja template is being
used during chat completion.
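For comparison, the sketch below shows the prompt that the static PromptTemplate entries would produce for the same messages, with no method calls involved. The per-role strings are copied from the JSON above; the render_chatml name is hypothetical, and I am assuming turns are joined with "\n" and followed by an open assistant header, as the "prompt" entry suggests. It does not handle /think, /no_think, or /system_override, and I do not know whether Foundry Local exposes a way to use this path instead of the Jinja template.

```python
# Per-role templates copied from the inference_model.json excerpt above.
PROMPT_TEMPLATE = {
    "system": "<|im_start|>system\n{Content}<|im_end|>",
    "user": "<|im_start|>user\n{Content}<|im_end|>",
    "assistant": "<|im_start|>assistant\n{Content}<|im_end|>",
}


def render_chatml(messages):
    """Render chat messages as a ChatML prompt ending in an open assistant turn."""
    parts = [
        # Literal placeholder substitution; str.format is avoided so that
        # braces inside message content are left untouched.
        PROMPT_TEMPLATE[message["role"]].replace("{Content}", message["content"])
        for message in messages
    ]
    # Assumption: the "\n" separator and the trailing assistant header
    # mirror the "prompt" entry of PromptTemplate.
    parts.append("<|im_start|>assistant")
    return "\n".join(parts)


if __name__ == "__main__":
    print(render_chatml([
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the golden ratio?"},
    ]))
```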
