Open-source model library for interacting with a variety of LLM providers. It was originally developed for internal use in vals.ai benchmarks and is designed as a general-purpose solution for projects that need a unified interface to multiple model providers.
Requires Python 3.11+.

    pip install model-library

Note: This library is undergoing rapid development. Expect breaking changes.

| Task | Start here |
|---|---|
| Use the installed Python library | Usage and Environment setup |
| Browse models from a repo checkout | Browse models |
| Configure provider API keys | API keys guide |
| Run the gateway | Gateway guide |
| Run examples from a repo checkout | Examples guide |
| Run tests | Tests guide |
| Contribute to model registry config | Model config README |
- AI21 Labs
- Alibaba
- Amazon Bedrock
- Anthropic
- Azure OpenAI
- Cohere
- DeepSeek
- Fireworks
- Google Gemini
- Mistral
- Perplexity
- Together AI
- OpenAI
- xAI
- ZhipuAI (zai)
From a repo checkout, run this to browse the model registry interactively:

    python -m scripts.browse_models

Installed-package users can inspect providers through the Python API:

    from model_library.registry_utils import get_model_names_by_provider, get_provider_names

    print(get_provider_names())
    print(get_model_names_by_provider("chosen-provider"))

- Images
- Files
- Tools with full history
- Batch
- Reasoning
- Custom parameters
Warning: This query makes a real provider call. Configure the provider key first, expect provider billing/rate limits, and do not send sensitive prompts unless intentional. Query logging can include request and response content; use `set_logging(enable=False)` or a redacting logger for sensitive workloads.

    import asyncio

    from model_library import model


    async def main():
        llm = model("anthropic/claude-opus-4-1-20250805-thinking")
        result = await llm.query(
            "What is QSBS? Explain your thinking in detail and make it concise."
        )
        print(result.output_text)
        print(result.metadata)  # cost, token, and performance telemetry


    if __name__ == "__main__":
        asyncio.run(main())

The model registry holds model attributes such as reasoning, file support, tool support, and max tokens. You may also use models not included in the registry:

    from model_library import raw_model
    from model_library.base import LLMConfig

    llm = raw_model("grok/grok-code-fast", LLMConfig(max_tokens=10000))

You can extend the registry with custom configs from a local YAML file or URL using the same format as the bundled provider configs:

    from model_library import load_custom_model_configs, load_latest_vals_model_configs

    load_custom_model_configs("/path/to/my_models.yaml")
    load_custom_model_configs("https://raw.githubusercontent.com/org/repo/main/models.yaml")

    # Pull latest bundled configs from GitHub without upgrading the package.
    load_latest_vals_model_configs()

The root logger is named `llm`. To disable logging:

    from model_library import set_logging

    set_logging(enable=False)

The model library reads provider API keys from environment variables, including:

- `OPENAI_API_KEY`
- `ANTHROPIC_API_KEY`
- `GOOGLE_API_KEY`

You can also set values through `model_library_settings`:

    from model_library import model_library_settings

    model_library_settings.set(MY_KEY="my-key")

See docs/api-keys.md for supported provider key names, docs/config.md for YAML config structure, and docs/result.md for result metadata, cost, tokens, and performance telemetry.

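
A missing key often surfaces only when the first query fails, so it can help to check the environment up front. The sketch below is not part of model_library: `EXPECTED_KEYS` and `missing_api_keys` are hypothetical names used for illustration. It inspects environment variables only, so keys set through `model_library_settings` would not be detected.

```python
import os

# Key names taken from the environment variables listed above.
# Hypothetical constant: adjust it to the providers you actually use.
EXPECTED_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY"]


def missing_api_keys(names, env=None):
    """Return the names that are unset or empty in the given mapping.

    Hypothetical helper, not a model_library function. `env` defaults to
    os.environ and can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_api_keys(EXPECTED_KEYS)
    if missing:
        print("Missing provider keys:", ", ".join(missing))
    else:
        print("All expected provider keys are set.")
```

Running a check like this before any query avoids a failed first call when a key is absent.
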
The snippets below are excerpts. For runnable files and setup prerequisites, see examples/README.md.

`uv run python examples/quickstart.py`

    await llm.query(
        [
            SystemInput(
                text="You are a pirate. Answer in a pirate style under 10 words."
            ),
            TextInput(text="Hello, how are you?"),
        ],
    )

`uv run python examples/inputs.py`

    red_image_content = b"..."
    await llm.query(
        [
            TextInput(text="What color is the image?"),
            FileWithBase64(
                type="image",
                name="red_image.png",
                mime="png",
                base64=base64.b64encode(red_image_content).decode("utf-8"),
            ),
        ]
    )

`uv run python examples/tools.py <model> [--mode agent|direct|both]`

    tools = [
        ToolDefinition(
            name="get_weather",
            body=ToolBody(
                name="get_weather",
                description="Get current temperature in a given location",
                properties={
                    "location": {
                        "type": "string",
                        "description": "City and country e.g. Bogotá, Colombia",
                    },
                },
                required=["location"],
            ),
        )
    ]
    output1 = await llm.query(
        [TextInput(text="What is the weather in SF right now?")],
        tools=tools,
    )
    output2 = await llm.query(
        [
            ToolResult(tool_call=output1.tool_calls[0], result="25C"),
            TextInput(text="Also include at least 8 emojis in your answer."),
        ],
        history=output1.history,
        tools=tools,
    )

Run these examples from a repo checkout. See examples/README.md for validator coverage, model-release checks, agent loops, and one-off demos:

| Example | Command |
|---|---|
| Model validator | `uv run python examples/validate_model.py <model> [--json]` |
| Quickstart | `uv run python examples/quickstart.py` |
| Inputs | `uv run python examples/inputs.py` |
| Tools | `uv run python examples/tools.py <model> [--mode agent\|direct\|both]` |

Use the validator first for model-release checks. It exercises core text, declared image/file transports, bounded agent tool use, reasoning evidence, prompt caching, configured/live rate limits, and configured pricing. List example commands with `uv run examples` or `uv run python -m examples`. If you already activated `.venv`, bare `python examples/...` commands work too.

- Provider API Keys — provider key names and gateway key rules
- Model Configuration — YAML config structure, inheritance, deprecation, settings
- Gateway — centralized FastAPI model proxy
- Agent — tool-augmented conversation loop
- ATIF — agent trajectory interchange format
- Conductor — multi-agent conversation orchestration
- Result Metadata — result shape, cost, tokens, and performance telemetry
- Token Retry & Benchmark Queue — rate-limit-aware scheduling via Redis
The library is designed to abstract over different LLM providers:
- LLM base class: common interface for all models.
- Model registry: central registry that loads model configurations from YAML files.
- Provider-specific implementations: concrete classes for providers such as OpenAI, Google, and Anthropic.
- Data models: Pydantic models for input and output types such as `TextInput`, `FileWithBase64`, `ToolDefinition`, and `ToolResult`.
- Retry logic: retry strategies for provider errors and rate limiting.
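
As an illustration of the retry idea in the last item, one common strategy for transient provider errors and rate limits is exponential backoff with jitter. The sketch below is generic and is not the library's actual implementation: `TransientProviderError` and `retry_with_backoff` are hypothetical names, and the real retry strategies may differ.

```python
import asyncio
import random


class TransientProviderError(Exception):
    """Hypothetical stand-in for a rate-limit or transient provider error."""


async def retry_with_backoff(
    call,
    max_attempts=4,
    base_delay=0.5,
    max_delay=8.0,
    sleep=asyncio.sleep,
):
    """Await `call()` and retry on TransientProviderError.

    The base wait doubles after each failed attempt and is capped at
    `max_delay`. Random jitter of up to half the base wait is added so
    that concurrent clients do not all retry at the same moment.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except TransientProviderError:
            if attempt == max_attempts:
                raise  # attempts exhausted: surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use the default `asyncio.sleep` applies.
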
We use uv for dependency management. A Makefile is provided to help with development.

    make install

| Command | Purpose |
|---|---|
| `make install` | Install dependencies |
| `make test` | Run unit tests |
| `make test-integration` | Run integration tests; requires API keys and makes live provider calls |
| `make style` | Format and lint with fixes |
| `make style-check` | Check formatting and lint without fixes |
| `make typecheck` | Run basedpyright |
| `make config` | Generate all_models.json |
| `make run-models` | Run all configured model smoke tests |
| `make browse_models` | Browse models interactively |

The current Makefile help mentions `make test-all`, but that target has no recipe and does not run unit plus integration tests. Run `make test` and `make test-integration` separately.

Unit tests do not require API keys:

    make test

Integration tests require provider API keys and make live calls:

    make test-integration

See tests/README.md for model selection, raw pytest usage, and environment setup.