Draft
feat(detect): index Jupyter notebooks with kernel-language fenced code blocks #1
… code blocks
- Add NOTEBOOK_EXTENSIONS = {'.ipynb'}
- Add ipynb_to_markdown() using metadata.language_info.name / kernelspec.language
for fenced code type (falls back to 'code' when metadata absent)
- Add convert_notebook_file() sidecar logic mirroring office files
- classify_file() now classifies .ipynb as DOCUMENT
- detect() converts notebooks to markdown sidecars
- Tests for all new behaviour including language metadata resolution
The Jupyter format allows 'source' to be either a str or list[str]. Also adds a test for the string-source case.
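
A minimal sketch of the `source` normalization described in the commit above, assuming a cell is an already-parsed JSON dict. The helper name `cell_source_text` is hypothetical and is not taken from the PR; it only illustrates accepting both forms.

```python
from typing import Any


def cell_source_text(cell: dict[str, Any]) -> str:
    """Return a cell's source as one string (hypothetical helper, not from the PR)."""
    source = cell.get("source", "")
    if isinstance(source, str):
        # Single-string form: already the full cell text.
        return source
    # List form: each item is one line that keeps its own trailing newline,
    # so the parts are joined with an empty separator.
    return "".join(source)
```

Joining with an empty string rather than a newline avoids doubling line breaks, since the list form stores each line with its newline attached.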
Copilot AI changed the title from "feat(detect): add Jupyter notebook support with language-aware fenced code blocks" to "feat(detect): index Jupyter notebooks with kernel-language fenced code blocks" on Jun 28, 2026
Copilot created this pull request from a session on behalf of jimwhite on June 28, 2026 18:55
Ports notebook indexing from Graphify-Labs#1498 with the core fix: fenced code blocks use the actual kernel language from notebook metadata instead of a hardcoded `code` tag.

Changes

graphify/detect.py
- `NOTEBOOK_EXTENSIONS = {'.ipynb'}`: kept separate from `CODE_EXTENSIONS`/`DOC_EXTENSIONS` since notebooks go through sidecar conversion
- `ipynb_to_markdown(path)`: converts cells to markdown; resolves fence language via `metadata.language_info.name` → `kernelspec.language` → `"code"`; handles `source` as either `str` or `list[str]` (both valid per Jupyter spec)
- `convert_notebook_file(path, out_dir)`: sidecar logic mirroring `convert_office_file()`; uses a content-equality check so output-only re-runs don't churn sidecar mtime
- `classify_file()`: `.ipynb` → `DOCUMENT`
- `detect()`: converts notebooks to markdown sidecars before indexing

tests/test_detect.py
- 14 new tests covering language resolution priority, mixed cells, mtime stability on output-only changes, `detect()`/`detect_incremental()` integration, and string vs. list source formats

Language resolution

A standard Python notebook produces fenced blocks tagged `python`; the `"code"` fallback applies only when the metadata is absent entirely.
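
The resolution order could look roughly like the sketch below. It assumes the notebook has already been loaded as a dict (for example with `json.load`); the function names `resolve_fence_language` and `fence_code` are illustrative and are not the names used in `graphify/detect.py`.

```python
from typing import Any

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def resolve_fence_language(notebook: dict[str, Any]) -> str:
    """Pick the fence tag: language_info.name, then kernelspec.language, then "code"."""
    metadata = notebook.get("metadata") or {}
    language_info = metadata.get("language_info") or {}
    kernelspec = metadata.get("kernelspec") or {}
    # Empty or missing values fall through to the next candidate.
    return language_info.get("name") or kernelspec.get("language") or "code"


def fence_code(source: str, language: str) -> str:
    """Wrap cell source in a fenced block tagged with the resolved language."""
    return f"{FENCE}{language}\n{source}\n{FENCE}"
```

With this ordering, a notebook that carries both fields uses `language_info.name`, and a notebook with no metadata at all gets the generic `code` tag.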