feat(catalog)!: add staged provider catalog indexing #119
Open
Zzackllack wants to merge 45 commits into
…rmalization strategy
Implement the scheduled provider catalog index and canonical mapping layer for AniBridge. Move provider catalog discovery out of the request path and into a persistent SQLite-backed index with bootstrap gating, provider refresh status tracking, and canonical TV episode mappings. Update Torznab search and tvsearch to serve indexed provider and mapping data instead of triggering request-time title refresh or episode probing. Expose bootstrap visibility through health endpoints and terminal progress output so first-time catalog builds are easier to monitor.
Update integration and migration tests for the catalog-indexed Torznab flow. Cover bootstrap blocking, indexed generic search, canonical tvsearch mapping, special mapping behavior, health progress reporting, and dynamic Alembic head verification.
Use DATA_DIR as the default base for the generated terminal log path instead of the container working directory. This keeps development Compose logs on the mounted /data volume rather than writing them into /app/data inside the container.
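A minimal sketch of the path resolution described above. Only `DATA_DIR` comes from the commit message; the function name, the `./data` fallback, and the `terminal.log` filename are assumptions made for illustration.

```python
import os
from pathlib import Path


def default_terminal_log_path(env=None, filename="terminal.log"):
    """Build the default log path under DATA_DIR.

    `filename` and the "./data" fallback are hypothetical defaults,
    not values taken from AniBridge.
    """
    env = os.environ if env is None else env
    data_dir = Path(env.get("DATA_DIR", "./data"))
    return data_dir / filename
```

With `DATA_DIR=/data` set in the Compose environment, this resolves to a file on the mounted volume rather than a path relative to the container working directory.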
Replace whole-provider buffering with a bounded crawl-to-writer pipeline. Add queue backpressure, batch SQLite persistence, interrupted staging cleanup, richer health progress, and generation-aware provider mappings so partial refresh state never becomes live.
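The crawl-to-writer shape can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the `staged_title` table, the function names, and the queue and batch sizes are all hypothetical. A bounded `queue.Queue` supplies the backpressure, and a single writer thread persists rows in batches.

```python
import queue
import sqlite3
import threading

_DONE = object()  # sentinel telling the writer that the crawl has finished


def _flush(conn, batch):
    """Persist one batch in a single transaction; return 1 if rows were written."""
    if not batch:
        return 0
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO staged_title (provider, slug) VALUES (?, ?)", batch
        )
    batch.clear()
    return 1


def crawl_to_writer(rows, db_path, queue_size=8, batch_size=3):
    """Stream crawled rows to one SQLite writer; return the number of batches."""
    q = queue.Queue(maxsize=queue_size)
    stats = {"batches": 0}

    def writer():
        # The connection is created and used only on the writer thread.
        conn = sqlite3.connect(db_path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS staged_title (provider TEXT, slug TEXT)"
        )
        batch = []
        while True:
            item = q.get()
            if item is _DONE:
                break
            batch.append(item)
            if len(batch) >= batch_size:
                stats["batches"] += _flush(conn, batch)
        stats["batches"] += _flush(conn, batch)  # write the final partial batch
        conn.close()

    thread = threading.Thread(target=writer)
    thread.start()
    for row in rows:
        q.put(row)  # blocks while the queue is full, which throttles the crawl
    q.put(_DONE)
    thread.join()
    return stats["batches"]
```

Because the producer blocks on `q.put` whenever the writer falls behind, memory use is bounded by `queue_size` plus one batch rather than by the size of the provider catalog. A production version would also need to handle writer failures so the producer cannot block forever.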
Redesign provider catalog indexing into durable title, detail, and canonical stages with DB-backed request reads. Add SQLite-safe serialized catalog writes, WAL/busy-timeout configuration, bounded canonical caches, and retry backoff that respects future retry timestamps. Update defaults for safe self-hosted concurrency and add targeted coverage for staged readiness, incremental persistence, cache bounds, and scheduler backoff.
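As a rough sketch of two of the pieces named above, the snippet below enables WAL and a busy timeout on a connection and computes a capped exponential retry time. The function names, the 5000 ms timeout, and the 60 s base and 3600 s cap are assumptions for illustration; the PR's actual defaults are not shown on this page.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def open_catalog_connection(path, busy_timeout_ms=5000):
    """Open a SQLite connection with WAL journaling and a busy timeout."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # readers no longer block the writer
    conn.execute(f"PRAGMA busy_timeout={int(busy_timeout_ms)}")
    return conn


def next_retry_at(now, attempts, base_seconds=60, cap_seconds=3600):
    """Capped exponential backoff: base, 2x base, 4x base, ... up to the cap."""
    delay = min(cap_seconds, base_seconds * 2 ** max(attempts - 1, 0))
    return now + timedelta(seconds=delay)


def is_due(now, retry_at):
    """A row is eligible only once its stored retry timestamp has passed."""
    return retry_at is None or retry_at <= now
```

A scheduler loop built on this would skip any row for which `is_due` is false, which is one way to respect a retry timestamp that still lies in the future.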
Defer background worker startup until after the qBittorrent add request has persisted its client task and job metadata. This prevents the request thread and scheduler worker from contending on the same SQLite rows during the initial add flow, which was surfacing as Sonarr download warnings. Add regression coverage for deferred worker startup in the qBittorrent-compatible API.
Normalize legacy provider index rows into the staged readiness model without assuming every persisted status object already has the new stage attributes populated. This keeps health output consistent after the staged catalog migration and prevents bootstrap-ready providers from reporting pending title index state due to missing backfilled fields. Add focused tests for interrupted-state recovery and legacy stage backfill behavior.
Co-authored-by: Copilot <copilot@github.com>
Add a hard timeout around provider direct-link resolution so a stuck upstream host cannot leave a job in downloading forever. Expose the timeout as PROVIDER_DIRECT_LINK_TIMEOUT_SECONDS and update the example environment configuration accordingly. Add regression coverage for timed-out host resolution and config parsing of the new timeout value.
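One way to put a hard timeout around a blocking resolver is to run it on a daemon thread and stop waiting after the deadline. The sketch below assumes this approach; `resolve_with_timeout`, `timeout_from_env`, and the 30 second fallback are hypothetical, and only the `PROVIDER_DIRECT_LINK_TIMEOUT_SECONDS` name comes from the commit message.

```python
import os
import threading


def resolve_with_timeout(resolver, timeout_seconds):
    """Run `resolver()` on a daemon thread and give up after the timeout."""
    result = {}

    def run():
        try:
            result["value"] = resolver()
        except Exception as exc:  # hand the failure back to the caller
            result["error"] = exc

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    worker.join(timeout_seconds)
    if worker.is_alive():
        # The worker cannot be killed; it is abandoned and the job can fail fast.
        raise TimeoutError(f"direct-link resolution exceeded {timeout_seconds}s")
    if "error" in result:
        raise result["error"]
    return result["value"]


def timeout_from_env(env=None, default=30.0):
    """Parse the timeout setting; the 30 s fallback is an assumed default."""
    env = os.environ if env is None else env
    raw = env.get("PROVIDER_DIRECT_LINK_TIMEOUT_SECONDS")
    return float(raw) if raw else default
```

A timed-out worker keeps running in the background until the upstream call returns, so callers still need to avoid piling up repeated attempts against the same hung host.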
Move yt-dlp progress persistence out of the callback hot path and flush only the latest per-job snapshot from a single writer. This removes concurrent progress-hook writes for the same job, reduces SQLite write pressure, and prevents repeated database locked errors during active downloads.
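The latest-snapshot pattern can be sketched as below. The class and function names are hypothetical, and `persist` stands in for whatever database write the single writer performs; the progress hook only overwrites an in-memory entry and never touches SQLite.

```python
import threading


class LatestProgressBuffer:
    """Keeps only the most recent progress snapshot per job."""

    def __init__(self):
        self._lock = threading.Lock()
        self._latest = {}

    def record(self, job_id, snapshot):
        # Called from the progress hook: cheap, in-memory, no database access.
        with self._lock:
            self._latest[job_id] = snapshot

    def drain(self):
        # Swap the buffer out so the hooks never wait on the database write.
        with self._lock:
            drained, self._latest = self._latest, {}
        return drained


def flush_once(buffer, persist):
    """Write each job's latest snapshot once; return how many jobs were written."""
    drained = buffer.drain()
    for job_id, snapshot in drained.items():
        persist(job_id, snapshot)
    return len(drained)
```

However many hook calls arrive between two flushes, each job produces at most one write per flush, which is what removes the concurrent writes for the same job.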
Enhance logging messages to provide clearer context during download host resolution and retry attempts. This includes details about preferred and resolved providers, as well as error messages for failed downloads.
Enhance error reporting by capturing exception messages during title indexing failures. This change ensures that the error details are passed correctly to the failure handling functions, improving debugging and logging capabilities.
…ookup
Add functionality to resolve titles from an in-memory index before querying the database. This improves performance by prioritizing faster lookups and reduces reliance on database access when the catalog is ready.
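The lookup order can be illustrated with a small sketch. The names `normalize`, `resolve_title`, and `db_lookup` are hypothetical, as is the normalization rule; the point is only that the database callback runs on a memory miss.

```python
def normalize(title):
    """Hypothetical normalization: case-fold and collapse whitespace."""
    return " ".join(title.casefold().split())


def resolve_title(query, memory_index, db_lookup):
    """Try the in-memory index first and fall back to the database lookup."""
    key = normalize(query)
    slug = memory_index.get(key)
    if slug is not None:
        return slug
    return db_lookup(key)  # only reached when the memory index misses
```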
- bypass catalog readiness for empty torznab search test responses
- honor STRM_FILES_MODE=only in indexed torznab item emission
- fall back to default provider languages when episode language rows are absent
- bound provider timeout workers and add cooperative crawl cancellation
- redact resolved direct URLs from download logs
- only mark qBittorrent tasks downloading after worker startup succeeds
- prevent non-flush scheduler shutdown from writing stale progress
- move health snapshot handlers off the event loop

- store provider catalog titles, aliases, episodes, and episode languages per generation
- scope staged replacements to the target generation
- keep live catalog generations intact during failed staged refreshes
- add migration to rebuild provider catalog tables with generation-aware primary keys
- implement real downgrade paths for provider index stage migration
- make generation rollback inserts PK-safe in provider mapping downgrade

BREAKING CHANGE: provider catalog tables now use generation-aware primary keys. Any direct SQL, external tooling, or handwritten migrations that assumed uniqueness on provider/slug or provider/slug/season/episode without indexed_generation must be updated before deploying this schema change.
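To show what the breaking change means for direct SQL, here is a minimal sketch of a generation-aware key. The table name `provider_catalog_title` and its columns other than `indexed_generation` are assumptions; the real schema is defined by the migrations in this PR.

```python
import sqlite3

# Hypothetical table: uniqueness is per (provider, slug, indexed_generation),
# no longer per (provider, slug).
SCHEMA = """
CREATE TABLE provider_catalog_title (
    provider TEXT NOT NULL,
    slug TEXT NOT NULL,
    indexed_generation INTEGER NOT NULL,
    title TEXT NOT NULL,
    PRIMARY KEY (provider, slug, indexed_generation)
);
"""


def stage_title(conn, provider, slug, generation, title):
    """Insert a row into a specific generation without touching other ones."""
    with conn:
        conn.execute(
            "INSERT INTO provider_catalog_title VALUES (?, ?, ?, ?)",
            (provider, slug, generation, title),
        )


def titles_for_generation(conn, provider, generation):
    """Reads must filter on the generation to avoid seeing staged rows."""
    rows = conn.execute(
        "SELECT slug, title FROM provider_catalog_title "
        "WHERE provider = ? AND indexed_generation = ? ORDER BY slug",
        (provider, generation),
    )
    return rows.fetchall()
```

Under a key like this, the same provider/slug pair can exist once per generation, so any query that omits the `indexed_generation` filter may return duplicate rows during a staged refresh.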
- stub catalog readiness in low-confidence overlap unit coverage
- initialize schema explicitly for DB-backed alias lookup coverage
- remove dependency on pre-existing local test database state

- pin STRM_FILES_MODE in the torznab hard-cap integration test
- bind title_resolver to the test database engine in alias lookup coverage
- remove CI dependence on ambient module import order and env defaults

- fail closed when the title index writer cannot be stopped cleanly
- avoid marking enrichment stages ready while rows are only deferred by retry windows
- remove implicit commits from provider generation cleanup helpers
- handle missing catalog tables in title resolver DB fallbacks

- pre-register running jobs before executor submit to avoid stale RUNNING entries
- mark qBittorrent client tasks failed when background job startup fails
- guard provider direct-link timeouts with daemon threads instead of the shared executor

- seed enough mapped episodes for torznab hard-cap coverage
- bootstrap providerindexstatus via migrations in title resolver tests
- add regressions for fast-finishing scheduler jobs and deferred stage retries

Isolate timed-out provider detail crawls so stuck tasks cannot starve later title enrichment work. Preserve queued torrent state semantics in the qBittorrent shim, dedupe duplicate episode-language inserts, and ensure title-index failures are still persisted even when writer shutdown also errors. Also block title refresh while enrichment rows are still unfinished and treat OperationalError during indexed title lookups as a catalog-not-ready fallback instead of bubbling the exception.
Collapse ambiguous provider episode title matches into a single conflict mapping, filter indexed title generations in SQL, and allow site-scoped DB title lookup before all providers are bootstrapped.
Track timed-out direct-link lookup workers per episode and skip further provider fallback attempts while the original lookup is still running. Clear the catalog indexer stop flag before starting a new scheduler thread so reused interpreter lifecycles can restart indexing.
Query a small batch of indexed title candidates for site-scoped slug resolution and return the first candidate whose rescored title match meets the resolver threshold.
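The candidate selection can be sketched as follows. `difflib.SequenceMatcher` is used here purely as a stand-in scorer, and the 0.8 threshold and function names are assumptions; the resolver's real scoring function and threshold are not shown on this page.

```python
from difflib import SequenceMatcher


def score(query, title):
    """Stand-in similarity score in [0, 1]; not the resolver's real scorer."""
    return SequenceMatcher(None, query.casefold(), title.casefold()).ratio()


def first_acceptable(query, candidates, threshold=0.8):
    """Return the slug of the first candidate whose rescored match passes.

    `candidates` is a small batch of (slug, title) pairs, for example the
    rows returned by a LIMIT-ed, site-scoped query.
    """
    for slug, title in candidates:
        if score(query, title) >= threshold:
            return slug
    return None
```

Rescoring a small batch instead of trusting the single top row means one poor first candidate no longer causes the whole lookup to miss.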
Allow provider fallback to continue after a host timeout while suppressing duplicate hung provider attempts. Gate movie Torznab searches on Megakino readiness only and persist row-stage retry deadlines when parking pending catalog enrichment stages.
8043103 to 3d23b78
Scope Torznab searches to ready provider indexes while preserving synthetic validation and optional fallback behavior. Persist download metadata for paused tasks and add qBittorrent-compatible resume handling.
Adds a persistent, SQLite-backed provider catalog index for AniBridge and moves catalog-dependent Torznab/search behavior away from request-time provider probing.
This introduces scheduled provider indexing with staged title, detail, and canonical enrichment so AniBridge can build local provider metadata progressively, expose readiness/progress through health endpoints, and serve searches from indexed database state once the catalog is ready.
Type of Change
Testing
Screenshots (if applicable)
Additional Notes
Breaking changes
What changed
- app.catalog indexing layer for provider catalog discovery, enrichment, readiness checks, and progress reporting.
- /health catalog status data and a dedicated /health/catalog endpoint.
- DATA_DIR logs, and writable runtime home defaults.
Why
The previous request-driven/provider-live behavior made cold searches expensive and unpredictable. Large provider crawls could also retain too much state in memory, especially during first bootstrap or broad provider refreshes.
This PR makes provider catalog data local, durable, progressively refreshed, and observable. It also keeps the default SQLite deployment viable by bounding indexing memory usage, reducing write contention, and avoiding request-path full provider crawls.
Operational notes
Testing
This branch adds or updates coverage for:
Summary by CodeRabbit
New Features
Bug Fixes
Tests