feat(catalog)!: add staged provider catalog indexing by Zzackllack · Pull Request #119 · Zzackllack/AniBridge

Zzackllack · 2026-04-29T23:04:47Z

Adds a persistent, SQLite-backed provider catalog index for AniBridge and moves catalog-dependent Torznab/search behavior away from request-time provider probing.

This introduces scheduled provider indexing with staged title, detail, and canonical enrichment so AniBridge can build local provider metadata progressively, expose readiness/progress through health endpoints, and serve searches from indexed database state once the catalog is ready.

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update

Testing

I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Screenshots (if applicable)

Additional Notes

Breaking changes

Catalog-dependent Torznab/search requests may now return an initialization/503 response until the local provider catalog title index is ready.
Search and tvsearch behavior now depends on the persisted catalog index and canonical mapping tables instead of request-time provider crawling/probing.
Existing deployments must apply the new Alembic migrations before using the indexed catalog paths.
First startup may require a local provider catalog bootstrap before Sonarr/Radarr queries return normal results.
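
A minimal sketch of the readiness gating described above, assuming a single boolean flag stands in for the staged per-provider readiness model. `CatalogState`, `gate_catalog_request`, and the `Retry-After` header are illustrative assumptions, not names taken from this PR.

```python
from dataclasses import dataclass


@dataclass
class CatalogState:
    # Hypothetical flag; the real readiness model is staged per provider.
    title_index_ready: bool


def gate_catalog_request(state: CatalogState, retry_after_seconds: int = 30):
    """Return (status, headers, body) for a catalog-dependent request."""
    if not state.title_index_ready:
        # Tell the client to retry later instead of crawling the provider inline.
        headers = {"Retry-After": str(retry_after_seconds)}
        return 503, headers, "catalog index is initializing"
    return 200, {}, "ok"
```

Clients such as Sonarr or Radarr would then see a temporary failure during bootstrap rather than an empty result set.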

What changed

Added a new app.catalog indexing layer for provider catalog discovery, enrichment, readiness checks, and progress reporting.
Added database schema and migrations for provider titles, aliases, episodes, episode languages, canonical series/episodes, provider mappings, generation tracking, and staged index status.
Updated Torznab search, movie search, and tvsearch paths to use indexed provider and canonical mapping data instead of doing uncontrolled live provider crawls during requests.
Added catalog readiness gating so catalog-dependent requests return a clear initializing response while the local index is not ready.
Added /health catalog status data and a dedicated /health/catalog endpoint.
Reworked provider indexing into memory-bounded staged workers with bounded queues, batch persistence, SQLite-safe serialized writes, retry/backoff handling, and staged readiness states.
Added safer defaults and environment configuration for provider refresh cadence, crawler concurrency, queue sizes, writer batching, canonical enrichment, direct-link timeout, and scheduler progress flushing.
Improved operational stability around SQLite by coalescing download progress writes and deferring qBittorrent worker startup until task/job metadata is persisted.
Hardened downloader host resolution with per-host direct-link timeouts and better logging for provider/host resolution failures.
Stabilized Megakino startup/indexing by resolving the active domain before catalog startup and bypassing the shared pooled HTTP client for resolver/sitemap requests.
Updated Docker/runtime behavior for dynamic API ports, mounted DATA_DIR logs, and writable runtime home defaults.
Added internal specs documenting the scheduled provider index, canonical normalization/mapping strategy, streaming persistence, memory bounds, and staged indexing design.
Expanded unit, integration, migration, and regression coverage for indexed Torznab flows, catalog readiness, indexer behavior, provider parsing, staged readiness backfills, SQLite write-race fixes, progress write coalescing, and downloader timeout handling.

Why

The previous request-driven/provider-live behavior made cold searches expensive and unpredictable. Large provider crawls could also retain too much state in memory, especially during first bootstrap or broad provider refreshes.

This PR makes provider catalog data local, durable, progressively refreshed, and observable. It also keeps the default SQLite deployment viable by bounding indexing memory usage, reducing write contention, and avoiding request-path full provider crawls.

Operational notes

Adds Alembic migrations for the provider catalog index and staged index state.
Fresh installs will need to build the local provider catalog before catalog-dependent Torznab requests are fully usable.
Health responses now expose catalog bootstrap and per-provider/stage progress.
Existing successful catalog generations can continue to be served while replacement generations are built.
Provider-derived catalog data is computed locally; this does not add a bundled provider catalog snapshot.

Testing

This branch adds or updates coverage for:

indexed Torznab search and tvsearch behavior
catalog bootstrap/readiness responses
catalog indexer scheduling, recovery, and progress reporting
provider parsing for AniWorld and s.to episode/language data
canonical episode deduplication
staged readiness migration/backfill behavior
qBittorrent download startup SQLite race prevention
scheduler progress write coalescing
provider direct-link timeout handling
Docker runtime home and terminal log path defaults

Summary by CodeRabbit

New Features
- Background provider catalog indexing with configurable refresh/concurrency, staged indexing/enrichment, canonical matching, and cache controls.
- Health endpoint now includes catalog progress and a dedicated catalog health route.
- Scheduling: separate create/start flows, optional autostart, and periodic job-progress persistence/flush.
Bug Fixes
- Enforced direct-link resolution timeouts and clearer, redacted download/fallback logs.
- More consistent torrent state reporting across APIs.
Tests
- Expanded unit and integration coverage for cataloging, Torznab, scheduler, provider resolution, migrations, and downloads.

Implement the scheduled provider catalog index and canonical mapping layer for AniBridge. Move provider catalog discovery out of the request path and into a persistent SQLite-backed index with bootstrap gating, provider refresh status tracking, and canonical TV episode mappings. Update Torznab search and tvsearch to serve indexed provider and mapping data instead of triggering request-time title refresh or episode probing. Expose bootstrap visibility through health endpoints and terminal progress output so first-time catalog builds are easier to monitor.

Update integration and migration tests for the catalog-indexed Torznab flow. Cover bootstrap blocking, indexed generic search, canonical tvsearch mapping, special mapping behavior, health progress reporting, and dynamic Alembic head verification.

Use DATA_DIR as the default base for the generated terminal log path instead of the container working directory. This keeps development Compose logs on the mounted /data volume rather than writing them into /app/data inside the container.
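
One way the default could be derived, as a sketch only. The `./data` fallback and the `terminal.log` file name are assumptions for illustration; only the `DATA_DIR` variable comes from this PR.

```python
import os
from pathlib import Path


def default_terminal_log_path(env=os.environ):
    """Build the log path under DATA_DIR rather than the working directory."""
    # "./data" and "terminal.log" are hypothetical placeholders.
    data_dir = Path(env.get("DATA_DIR", "./data"))
    return data_dir / "terminal.log"
```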

Replace whole-provider buffering with a bounded crawl-to-writer pipeline. Add queue backpressure, batch SQLite persistence, interrupted staging cleanup, richer health progress, and generation-aware provider mappings so partial refresh state never becomes live.
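
The crawl-to-writer shape can be sketched with a bounded `queue.Queue` and batched SQLite inserts. This is an illustration under stated assumptions, not the code in this branch: the `titles` table, function names, queue size, and batch size are all hypothetical.

```python
import queue
import sqlite3
import threading

SENTINEL = None  # marks the end of the crawl


def crawl(titles, out_queue):
    """Producer: put() blocks when the queue is full, which is the backpressure."""
    for title in titles:
        out_queue.put(title)
    out_queue.put(SENTINEL)


def write_batches(conn, in_queue, batch_size):
    """Consumer: the only writer, committing one transaction per batch."""
    batch = []
    while True:
        item = in_queue.get()
        if item is not SENTINEL:
            batch.append((item,))
        if batch and (item is SENTINEL or len(batch) >= batch_size):
            with conn:  # commits on success, rolls back on error
                conn.executemany("INSERT INTO titles(name) VALUES (?)", batch)
            batch = []
        if item is SENTINEL:
            return


def run_pipeline(titles, queue_size=4, batch_size=3):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE titles(name TEXT)")
    q = queue.Queue(maxsize=queue_size)  # bounds memory held between stages
    producer = threading.Thread(target=crawl, args=(titles, q))
    producer.start()
    write_batches(conn, q, batch_size)  # writer runs in the calling thread
    producer.join()
    return conn.execute("SELECT COUNT(*) FROM titles").fetchone()[0]
```

Because the queue is bounded, memory use is capped by `queue_size` plus one batch, regardless of how many titles the provider exposes.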

Redesign provider catalog indexing into durable title, detail, and canonical stages with DB-backed request reads. Add SQLite-safe serialized catalog writes, WAL/busy-timeout configuration, bounded canonical caches, and retry backoff that respects future retry timestamps. Update defaults for safe self-hosted concurrency and add targeted coverage for staged readiness, incremental persistence, cache bounds, and scheduler backoff.
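
As a rough illustration of the SQLite settings mentioned above, the sketch below enables WAL, sets a busy timeout, and funnels writes through one lock. It uses the standard-library `sqlite3` module; the 5000 ms timeout, the lock, and the helper names are assumptions, and the project's actual database layer may configure this differently.

```python
import sqlite3
import threading

_write_lock = threading.Lock()  # hypothetical process-wide writer lock


def open_connection(path):
    """Open a connection with WAL journaling and a busy timeout."""
    conn = sqlite3.connect(path, check_same_thread=False)
    conn.execute("PRAGMA journal_mode=WAL")   # readers no longer block the writer
    conn.execute("PRAGMA busy_timeout=5000")  # wait up to 5 s on a locked database
    return conn


def serialized_write(conn, sql, params=()):
    """Run one write statement while holding the writer lock."""
    with _write_lock:
        with conn:  # one transaction per call
            conn.execute(sql, params)
```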

Defer background worker startup until after the qBittorrent add request has persisted its client task and job metadata. This prevents the request thread and scheduler worker from contending on the same SQLite rows during the initial add flow, which was surfacing as Sonarr download warnings. Add regression coverage for deferred worker startup in the qBittorrent-compatible API.
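
The ordering can be sketched as "commit first, start the worker second". The `client_task` table and `add_task` helper are hypothetical stand-ins for the qBittorrent-compatible add flow.

```python
import sqlite3
import threading


def add_task(conn, task_id, run_job):
    """Persist the task row, then start the background worker."""
    with conn:  # commit before any worker can touch the row
        conn.execute(
            "INSERT INTO client_task(id, state) VALUES (?, 'queued')",
            (task_id,),
        )
    # Only now hand the job to a worker, so both sides never race on the insert.
    worker = threading.Thread(target=run_job, args=(task_id,), daemon=True)
    worker.start()
    return worker
```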

Normalize legacy provider index rows into the staged readiness model without assuming every persisted status object already has the new stage attributes populated. This keeps health output consistent after the staged catalog migration and prevents bootstrap-ready providers from reporting pending title index state due to missing backfilled fields. Add focused tests for interrupted-state recovery and legacy stage backfill behavior.

Co-authored-by: Copilot <copilot@github.com>

Add a hard timeout around provider direct-link resolution so a stuck upstream host cannot leave a job in downloading forever. Expose the timeout as PROVIDER_DIRECT_LINK_TIMEOUT_SECONDS and update the example environment configuration accordingly. Add regression coverage for timed-out host resolution and config parsing of the new timeout value.
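
A sketch of a hard timeout built on a daemon thread, which matches the approach a later commit in this PR describes. The 20-second fallback, the `DirectLinkTimeout` exception, and the `resolve_with_timeout` helper are assumptions; only the environment variable name comes from the PR.

```python
import os
import threading

# Env var name is from the PR; the 20-second fallback is an assumption.
TIMEOUT = float(os.environ.get("PROVIDER_DIRECT_LINK_TIMEOUT_SECONDS", "20"))


class DirectLinkTimeout(Exception):
    """Raised when the resolver does not finish in time (hypothetical name)."""


def resolve_with_timeout(resolver, url, timeout=TIMEOUT):
    result = {}

    def target():
        try:
            result["value"] = resolver(url)
        except Exception as exc:  # hand the error back to the caller
            result["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        # A thread cannot be killed; it is abandoned and the job moves on.
        raise DirectLinkTimeout(f"no direct link after {timeout} s")
    if "error" in result:
        raise result["error"]
    return result["value"]
```

The abandoned thread may keep running until the upstream host responds, which is why later commits track and bound such workers.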

Move yt-dlp progress persistence out of the callback hot path and flush only the latest per-job snapshot from a single writer. This removes concurrent progress-hook writes for the same job, reduces SQLite write pressure, and prevents repeated database locked errors during active downloads.
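
Coalescing can be sketched as a dictionary holding the latest snapshot per job, drained by one writer. `ProgressCoalescer` and its methods are hypothetical names; the `write` callable stands in for whatever persists progress to SQLite.

```python
import threading


class ProgressCoalescer:
    def __init__(self, write):
        self._write = write  # callable(job_id, snapshot), e.g. a DB update
        self._latest = {}
        self._lock = threading.Lock()

    def report(self, job_id, snapshot):
        """Called from progress hooks: memory only, no database access."""
        with self._lock:
            self._latest[job_id] = snapshot  # newer snapshots overwrite older ones

    def flush(self):
        """Called periodically by the single writer; returns rows written."""
        with self._lock:
            pending, self._latest = self._latest, {}
        for job_id, snapshot in pending.items():
            self._write(job_id, snapshot)
        return len(pending)
```

However often a hook fires between flushes, each job produces at most one write per flush interval.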

Enhance logging messages to provide clearer context during download host resolution and retry attempts. This includes details about preferred and resolved providers, as well as error messages for failed downloads.

Enhance error reporting by capturing exception messages during title indexing failures. This change ensures that the error details are passed correctly to the failure handling functions, improving debugging and logging capabilities.

…ookup

Add functionality to resolve titles from an in-memory index before querying the database. This improves performance by prioritizing faster lookups and reduces reliance on database access when the catalog is ready.

- bypass catalog readiness for empty torznab search test responses
- honor STRM_FILES_MODE=only in indexed torznab item emission
- fall back to default provider languages when episode language rows are absent
- bound provider timeout workers and add cooperative crawl cancellation
- redact resolved direct URLs from download logs
- only mark qBittorrent tasks downloading after worker startup succeeds
- prevent non-flush scheduler shutdown from writing stale progress
- move health snapshot handlers off the event loop

- store provider catalog titles, aliases, episodes, and episode languages per generation
- scope staged replacements to the target generation
- keep live catalog generations intact during failed staged refreshes
- add migration to rebuild provider catalog tables with generation-aware primary keys
- implement real downgrade paths for provider index stage migration
- make generation rollback inserts PK-safe in provider mapping downgrade

BREAKING CHANGE: provider catalog tables now use generation-aware primary keys. Any direct SQL, external tooling, or handwritten migrations that assumed uniqueness on provider/slug or provider/slug/season/episode without indexed_generation must be updated before deploying this schema change.
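
To illustrate what a generation-aware primary key permits, here is a small `sqlite3` sketch. Only the `indexed_generation` column name comes from this PR; the `provider_title` and `provider_live` tables, the other columns, and the helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE provider_title (
        provider TEXT NOT NULL,
        slug TEXT NOT NULL,
        indexed_generation INTEGER NOT NULL,
        title TEXT NOT NULL,
        PRIMARY KEY (provider, slug, indexed_generation)
    );
    CREATE TABLE provider_live (
        provider TEXT PRIMARY KEY,
        live_generation INTEGER NOT NULL
    );
    INSERT INTO provider_live VALUES ('aniworld', 1);
    INSERT INTO provider_title VALUES ('aniworld', 'one-piece', 1, 'One Piece');
    -- Same provider/slug staged for generation 2: allowed only because the
    -- generation is part of the primary key.
    INSERT INTO provider_title VALUES ('aniworld', 'one-piece', 2, 'One Piece (refreshed)');
    """
)


def live_titles(conn, provider):
    """Read only rows that belong to the provider's live generation."""
    return conn.execute(
        """
        SELECT t.slug, t.title
        FROM provider_title AS t
        JOIN provider_live AS l
          ON l.provider = t.provider
         AND l.live_generation = t.indexed_generation
        WHERE t.provider = ?
        ORDER BY t.slug
        """,
        (provider,),
    ).fetchall()


def promote(conn, provider, generation):
    """Switch readers to a fully staged generation in one transaction."""
    with conn:
        conn.execute(
            "UPDATE provider_live SET live_generation = ? WHERE provider = ?",
            (generation, provider),
        )
```

Queries that previously assumed one row per provider/slug now need a generation filter like the join above. A failed refresh simply never calls `promote`, so the live rows stay untouched.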

- stub catalog readiness in low-confidence overlap unit coverage
- initialize schema explicitly for DB-backed alias lookup coverage
- remove dependency on pre-existing local test database state

- pin STRM_FILES_MODE in the torznab hard-cap integration test
- bind title_resolver to the test database engine in alias lookup coverage
- remove CI dependence on ambient module import order and env defaults

- fail closed when the title index writer cannot be stopped cleanly
- avoid marking enrichment stages ready while rows are only deferred by retry windows
- remove implicit commits from provider generation cleanup helpers
- handle missing catalog tables in title resolver DB fallbacks

- pre-register running jobs before executor submit to avoid stale RUNNING entries
- mark qBittorrent client tasks failed when background job startup fails
- guard provider direct-link timeouts with daemon threads instead of the shared executor

- seed enough mapped episodes for torznab hard-cap coverage
- bootstrap providerindexstatus via migrations in title resolver tests
- add regressions for fast-finishing scheduler jobs and deferred stage retries

Isolate timed-out provider detail crawls so stuck tasks cannot starve later title enrichment work. Preserve queued torrent state semantics in the qBittorrent shim, dedupe duplicate episode-language inserts, and ensure title-index failures are still persisted even when writer shutdown also errors. Also block title refresh while enrichment rows are still unfinished and treat OperationalError during indexed title lookups as a catalog-not-ready fallback instead of bubbling the exception.
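
The last point, treating a database error during lookup as "not ready", might look like the following sketch. It uses the standard library's `sqlite3.OperationalError` for illustration; the project may raise a different `OperationalError` from its ORM layer, and the `provider_title` table and `lookup_title` helper are hypothetical.

```python
import sqlite3


def lookup_title(conn, slug):
    """Return the indexed title, or None when the catalog cannot answer yet."""
    try:
        row = conn.execute(
            "SELECT title FROM provider_title WHERE slug = ?", (slug,)
        ).fetchone()
    except sqlite3.OperationalError:
        # For example "no such table": report "catalog not ready", do not crash.
        return None
    return row[0] if row else None
```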

Collapse ambiguous provider episode title matches into a single conflict mapping, filter indexed title generations in SQL, and allow site-scoped DB title lookup before all providers are bootstrapped.

Track timed-out direct-link lookup workers per episode and skip further provider fallback attempts while the original lookup is still running. Clear the catalog indexer stop flag before starting a new scheduler thread so reused interpreter lifecycles can restart indexing.

Query a small batch of indexed title candidates for site-scoped slug resolution and return the first candidate whose rescored title match meets the resolver threshold.
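
The "first candidate above the threshold" rule can be sketched as below. Scoring here uses `difflib.SequenceMatcher` as a stand-in, and the 0.8 threshold is an assumption; the resolver's real scoring function and threshold are not shown in this PR description.

```python
from difflib import SequenceMatcher


def score(query, title):
    """Similarity in [0, 1]; a stand-in for the resolver's own scoring."""
    return SequenceMatcher(None, query.lower(), title.lower()).ratio()


def pick_candidate(query, candidates, threshold=0.8):
    """candidates: (slug, title) pairs, e.g. a small LIMITed batch from the DB."""
    for slug, title in candidates:
        if score(query, title) >= threshold:
            return slug  # first candidate that clears the threshold wins
    return None
```

Checking several candidates avoids rejecting a query just because the top database row happened to be a weaker match.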

Allow provider fallback to continue after a host timeout while suppressing duplicate hung provider attempts. Gate movie Torznab searches on Megakino readiness only and persist row-stage retry deadlines when parking pending catalog enrichment stages.

Scope Torznab searches to ready provider indexes while preserving synthetic validation and optional fallback behavior. Persist download metadata for paused tasks and add qBittorrent-compatible resume handling.

Zzackllack requested a review from Copilot April 29, 2026 23:04

Zzackllack self-assigned this Apr 29, 2026

Zzackllack added the bug, documentation, and enhancement labels Apr 29, 2026

Copilot started reviewing on behalf of Zzackllack April 29, 2026 23:05

coderabbitai Bot added the dependencies label Apr 30, 2026

Zzackllack added 4 commits June 25, 2026 02:57

docs(specs): spec files for adding scheduled provider indexing and no…

a80b556

…rmalization strategy

Zzackllack and others added 26 commits June 25, 2026 02:57

docs(specs): Streaming Persistence and Memory Bounds

eb10ebe

docs(specs): performance considerations for provider catalog indexing

9402e97

refactor(db): add static type hinting mirror for dynamic exports

b675f48

Co-authored-by: Copilot <copilot@github.com>

fix(downloader): improve logging for download host resolution failures

7dc4021

style: run ruff format

bdbcff4

style: run ruff formatter

ec11cc8

test: isolate title resolver DB fallback tests

c529047

test: stabilize torznab and title-resolver CI expectations

65022b7

fix: close scheduler and task state races

06d754c

test: align catalog fixtures with hard-cap expectations

45de487

fix(catalog): prevent ambiguous usable provider mappings

5f8ac4e

fix(resolver): rescore multiple site-scoped DB candidates

337b9a2

Zzackllack force-pushed the provider-catalog-index branch from 8043103 to 3d23b78 June 25, 2026 01:02

fix(api): resolve catalog readiness and paused torrent resume

61351bb

coderabbitai Bot approved these changes Jun 25, 2026
