feat(detect): index Jupyter notebooks with kernel-language fenced code blocks #1

Draft
jimwhite with Copilot wants to merge 2 commits into v8 from copilot/fix-markdown-fenced-code-type

Copilot AI commented Jun 28, 2026

Ports notebook indexing from Graphify-Labs#1498 with the core fix: fenced code blocks use the actual kernel language from notebook metadata instead of a hardcoded "code" tag.

Changes

graphify/detect.py

  • NOTEBOOK_EXTENSIONS = {'.ipynb'}: kept separate from CODE_EXTENSIONS/DOC_EXTENSIONS since notebooks go through sidecar conversion
  • ipynb_to_markdown(path): converts cells to markdown; resolves the fence language via metadata.language_info.name → kernelspec.language → "code"; handles source as either str or list[str] (both valid per the Jupyter spec)
  • convert_notebook_file(path, out_dir): sidecar logic mirroring convert_office_file(); uses a content-equality check so output-only re-runs don't churn the sidecar mtime
  • classify_file(): classifies .ipynb as DOCUMENT
  • detect(): converts notebooks to markdown sidecars before indexing

tests/test_detect.py — 14 new tests covering language resolution priority, mixed cells, mtime stability on output-only changes, detect()/detect_incremental() integration, and string vs. list source formats.
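The mtime-stability behaviour those tests cover can be pictured with a small content-equality write. This is a sketch only: write_if_changed is a hypothetical helper name, and the real convert_notebook_file() may structure this differently.

```python
from pathlib import Path


def write_if_changed(target: Path, content: str) -> bool:
    """Write content to target only when it differs; return True if written."""
    if target.exists() and target.read_text(encoding="utf-8") == content:
        return False  # identical content: leave the file and its mtime alone
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return True
```

Because the comparison is on the generated markdown rather than on the notebook file, a re-run that only changes cell outputs yields identical markdown (assuming outputs are not part of it) and the sidecar is not rewritten.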

Language resolution

meta = nb.get("metadata", {})
lang = (
    meta.get("language_info", {}).get("name")   # set by kernel at runtime
    or meta.get("kernelspec", {}).get("language") # set at notebook creation
    or "code"                                     # fallback
)

A standard Python notebook produces ```python blocks; the "code" fallback applies only when neither metadata.language_info.name nor metadata.kernelspec.language is set.
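To make the priority order concrete, the snippet above can be wrapped in a function and run against a few metadata shapes. The function name resolve_fence_language is hypothetical, and the extra `or {}` guards (for metadata fields stored as null) are a defensive assumption that goes beyond the snippet.

```python
def resolve_fence_language(nb: dict) -> str:
    """Pick the fence language: language_info.name, then kernelspec.language, then "code"."""
    meta = nb.get("metadata") or {}
    return (
        (meta.get("language_info") or {}).get("name")
        or (meta.get("kernelspec") or {}).get("language")
        or "code"
    )


# language_info wins when both fields are present
both = {"metadata": {"language_info": {"name": "python"},
                     "kernelspec": {"language": "julia"}}}
# kernelspec is used when language_info is missing
kernel_only = {"metadata": {"kernelspec": {"language": "julia"}}}
# no usable metadata at all
empty = {"metadata": {}}

print(resolve_fence_language(both))         # python
print(resolve_fence_language(kernel_only))  # julia
print(resolve_fence_language(empty))        # code
```

Since the chain uses `or`, an empty-string name is treated the same as a missing one and falls through to the next source.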

Copilot AI added 2 commits June 28, 2026 18:53
… code blocks

- Add NOTEBOOK_EXTENSIONS = {'.ipynb'}
- Add ipynb_to_markdown() using metadata.language_info.name / kernelspec.language
  for fenced code type (falls back to 'code' when metadata absent)
- Add convert_notebook_file() sidecar logic mirroring office files
- classify_file() now classifies .ipynb as DOCUMENT
- detect() converts notebooks to markdown sidecars
- Tests for all new behaviour including language metadata resolution
The Jupyter format allows 'source' to be either a str or list[str].
Also adds a test for the string-source case.
Copilot AI changed the title from "feat(detect): add Jupyter notebook support with language-aware fenced code blocks" to "feat(detect): index Jupyter notebooks with kernel-language fenced code blocks" Jun 28, 2026
Copilot AI requested a review from jimwhite June 28, 2026 18:55