## Description
smollm3-3b-generic-gpu:1 fails during chat completion before inference starts. The failure appears to happen
while Foundry Local / ONNX Runtime GenAI renders the model's chat_template.jinja.
The bundled chat template uses Python/Jinja-style string methods such as .replace(), .rstrip(), and
.lstrip(), but the runtime template parser reports these methods as unsupported.
This means the model cannot currently be evaluated or used through normal complete_chat(...) calls with
system/user messages.
## Environment
- OS: macOS 14.6.1, Apple Silicon / arm64
- Python: 3.13.14
- Package: foundry-local-sdk==1.2.1
- Model: smollm3-3b-generic-gpu:1
- Model cache path observed locally: .foundry_local/models/Microsoft/smollm3-3b-generic-gpu-1/v1/
## Reproduction Steps
from pathlib import Path

from foundry_local_sdk import Configuration, FoundryLocalManager

# Keep all Foundry Local state inside the current working directory.
workspace = Path.cwd()
app_data = workspace / ".foundry_local" / "app_data"
model_cache = workspace / ".foundry_local" / "models"
logs = workspace / ".foundry_local" / "logs"
app_data.mkdir(parents=True, exist_ok=True)
model_cache.mkdir(parents=True, exist_ok=True)
logs.mkdir(parents=True, exist_ok=True)

config = Configuration(
    app_name="smollm3_template_repro",
    app_data_dir=str(app_data),
    model_cache_dir=str(model_cache),
    logs_dir=str(logs),
)
FoundryLocalManager.initialize(config)
manager = FoundryLocalManager.instance

# The alias resolved to smollm3-3b-generic-gpu:1 in this environment.
model = manager.catalog.get_model("smollm3-3b")
model.download()
model.load()

client = model.get_chat_client()
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the golden ratio?"},
]

# Fails here with the template error shown under Actual Result.
response = client.complete_chat(messages)
print(response.choices[0].message.content)

model.unload()
## Actual Result
The request fails before a model response is generated.
Error excerpt:
FoundryLocalException: Error during chat completion: Error:
Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIException:
Unknown method: replace at row 23, column 48:
{%- set custom_instructions = system_message.replace("/no_think", "").replace("/think", "").rstrip() -%}
^
The same failure occurs consistently for every chat completion request using this model.
## Expected Result
smollm3-3b-generic-gpu:1 should successfully render its chat template and produce a chat completion response.
Alternatively, if the current runtime intentionally supports only a restricted subset of Jinja/template syntax,
the catalog version of this model should use a compatible chat template.
## Suspected Cause
The model package includes:
.foundry_local/models/Microsoft/smollm3-3b-generic-gpu-1/v1/chat_template.jinja
The template contains multiple string method calls:
{%- set custom_instructions = system_message.replace("/no_think", "").replace("/think", "").rstrip() -%}
{{- custom_instructions.replace("/system_override", "").rstrip() -}}
{{ "<|im_start|>assistant\n" + content.lstrip("\n") + "<|im_end|>\n" }}
{{ "<|im_start|>assistant\n" + "<think>\n\n</think>\n" + content.lstrip("\n") + "<|im_end|>\n" }}
These appear to be unsupported by the runtime template parser used by Foundry Local / ONNX Runtime GenAI.
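To confirm which lines of a cached template are affected, a small scan can list every call to the methods named above. This is a minimal sketch, not part of the SDK: the helper name `find_method_calls` and the `SUSPECT_METHODS` tuple are hypothetical, and the list of methods is an assumption based only on the calls quoted in this report. The runtime parser may reject other constructs as well.

```python
import re
from pathlib import Path

# Methods quoted in this report; this list is an assumption and may be incomplete.
SUSPECT_METHODS = ("replace", "rstrip", "lstrip")
METHOD_CALL = re.compile(r"\.(%s)\(" % "|".join(SUSPECT_METHODS))


def find_method_calls(template_text):
    """Return (row, column, method) for each suspect call, both 1-based.

    The column points at the dot that precedes the method name.
    """
    hits = []
    for row, line in enumerate(template_text.splitlines(), start=1):
        for match in METHOD_CALL.finditer(line):
            hits.append((row, match.start() + 1, match.group(1)))
    return hits


if __name__ == "__main__":
    # Cache path as observed locally in this report.
    path = Path(".foundry_local/models/Microsoft/smollm3-3b-generic-gpu-1/v1/chat_template.jinja")
    if path.exists():
        for row, col, method in find_method_calls(path.read_text(encoding="utf-8")):
            print(f"{path.name}:{row}:{col}: .{method}()")
```

Running it from the workspace directory prints one line per call site, which gives a quick checklist of what a compatible template would need to change.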
## Impact
This blocks use of smollm3-3b through the normal Foundry Local Python SDK chat completion flow.
In my evaluation run, the model had:
success_rate = 0 / 25
All failures had the same template parsing error, so this should be treated as a runtime/model-template
compatibility failure rather than a model quality issue.
## Possible Fix
One possible fix is to update the catalog model's chat_template.jinja so it avoids unsupported string methods
such as .replace(), .rstrip(), and .lstrip().
For example, a minimal compatibility-oriented template could avoid dynamically stripping /think, /no_think, and
/system_override markers, or use only syntax supported by the runtime template parser.
Another possible fix is to extend the runtime template parser to support these string methods if this syntax is
expected to be valid for catalog model templates.
## Additional Notes
The model's inference_model.json reports:
{
"Name": "smollm3-3b-generic-gpu:1",
"PromptTemplate": {
"system": "<|im_start|>system\n{Content}<|im_end|>",
"user": "<|im_start|>user\n{Content}<|im_end|>",
"assistant": "<|im_start|>assistant\n{Content}<|im_end|>",
"prompt": "<|im_start|>user\n{Content}<|im_end|>\n<|im_start|>assistant"
}
}
However, the runtime error references chat_template.jinja, so it appears that the bundled Jinja template is being
used during chat completion.
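For comparison, the PromptTemplate entries above need nothing beyond placeholder substitution. The following is a rough sketch of how they could be rendered for a message list. The function `render_prompt` is hypothetical, and joining turns with a newline and ending with an open assistant turn are assumptions inferred from the "prompt" entry, not documented runtime behavior.

```python
# Role templates copied from the inference_model.json excerpt above.
PROMPT_TEMPLATE = {
    "system": "<|im_start|>system\n{Content}<|im_end|>",
    "user": "<|im_start|>user\n{Content}<|im_end|>",
    "assistant": "<|im_start|>assistant\n{Content}<|im_end|>",
}


def render_prompt(messages, template=PROMPT_TEMPLATE):
    """Render messages with plain placeholder substitution (hypothetical helper)."""
    parts = [
        template[message["role"]].replace("{Content}", message["content"])
        for message in messages
    ]
    # Assumption: the prompt ends with an open assistant turn, as in the "prompt" entry.
    parts.append("<|im_start|>assistant")
    return "\n".join(parts)


if __name__ == "__main__":
    print(render_prompt([
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the golden ratio?"},
    ]))
```

This does not reproduce the /think and /no_think handling of the bundled Jinja template; it only shows the simpler format the metadata already describes.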