andy-emerson/ParetoLab

Notebook

A browser-based notebook for data analysis. Write Python, R, SQL, and Markdown cells, run them natively in-browser, and save as .md or .qmd files. No server, no build step, no installation — everything is notebook.html.


Design Philosophy

One file. The entire application — styles, embedded Python Web Worker, and all app logic — lives in notebook.html. This is a hard constraint, not a starting point. No bundler, no npm, no external files. All dependencies are CDN-loaded at runtime and pinned to exact versions in <head>.

cells[] is the only source of truth. The DOM is always derived from the cells array, never the reverse. When a notebook loads, rebuildDOM() constructs the UI from scratch. When a cell changes, only its output node is patched. There is no two-way data binding and no sync logic.

Pyodide loads eagerly; WebR loads lazily. Pyodide starts loading immediately on page open — DuckDB lives inside it and SQL cells depend on it being ready. WebR loads on demand when the first R cell is created. maybeTriggerRuntime(type) ensures each runtime initializes at most once per session, using _triggeredRuntimes (a Set) as a guard.

.md or .qmd as the save format. When YAML frontmatter is enabled (Settings → Format), notebooks save as .qmd (Quarto Markdown); when disabled, as plain .md. Format is auto-detected on load by whether the file starts with ---. Both are human-readable, diff-friendly, and render on GitHub. parseQMD() handles nested code fences correctly using a depth counter.

Secrets never touch disk in plaintext. Secrets are encrypted with AES-GCM using a device-specific key stored in IndexedDB. They are injected into the Python Worker's os.environ at runtime and are never written to notebook files.


Architecture

The diagram below shows the major components and how they connect. The Python Worker runs in a separate thread; all communication with it is message-passing. DuckDB and WebR each run in their own worker threads managed by their respective libraries.

flowchart TD
    subgraph State ["State (source of truth)"]
        cells["nb.doc.cells[ ]\ncells[0] = yaml cell"]
        liveState["nb.exec.liveState\n{ qmd, pyState }"]
    end

    subgraph CellMgmt ["Cell Management"]
        actions["add · delete · move · convert"]
        dom["rebuildDOM · createCellEl"]
        cm["CodeMirror instances\nnb.editor.cmEditors Map"]
        pkgs["Package Manager\npkgs.py · pkgs.r · pkgs.sql"]
    end

    subgraph Exec ["Execution"]
        runCell["runCell()"]
        runAll["runAll() · runAllFrom()"]
    end

    subgraph Workers ["Worker Threads"]
        subgraph PyWorker ["Python Worker (Blob URL)"]
            pyodide["Pyodide"]
            duckdb["duckdb (_ql_db)\nshared connection"]
        end
        webr["WebR"]
    end

    subgraph Rendering ["Panel Rendering"]
        cellOut["Cell Output"]
        preview["Preview"]
        schema["Tables"]
        vars["Variables"]
    end

    subgraph Persistence ["IndexedDB"]
        history["history-v2\nqmd · db snapshots"]
        secrets["secrets\nAES-GCM"]
    end

    actions --> cells
    cells --> dom --> cm
    cells --> buildQMD["buildQMD()"]

    runAll --> runCell
    runCell -->|"pyCall('run')"| pyodide
    runCell -->|"pyCall('sql')"| duckdb
    runCell -->|"three-phase IPC"| webr
    webr -->|"pyCall('sqlBridgeQuery/Write')"| duckdb

    runCell --> cellOut & preview & schema & vars
    runCell --> liveState
    liveState["nb.exec.liveState"] -->|Stop / restore| pyodide

    secrets -->|setEnv| pyodide
    pkgs -->|install| pyodide

    cells & duckdb --> history

Modules

Use Ctrl+F for #region <name> to jump to any JS section. The stylesheet has its own table of contents at the top with line references.


Stylesheet (~1,150 lines)

Organized into 9 sections. Navigate with the table of contents comment at line 44.

All colors are CSS custom properties. Functional colors (--fn-execute, --fn-success, --fn-danger) carry semantic meaning and are shared across themes, with separate overrides for dark themes. Aesthetic colors (--bg-*, --text-*, --border-*) are defined per-theme.

The active theme is applied as a data-theme attribute on <html>. CM6 editor colors are driven by CSS variables via the shared EditorView.theme() extension — changing data-theme automatically updates all open editors with no additional JS.


Python Worker

The Worker source is stored in a <script id="py-worker-src" type="text/plain"> block and loaded at runtime as a Blob URL, keeping everything in one file. The main thread communicates with it exclusively through pyCall(type, payload), which returns a Promise keyed on a monotonically incrementing message ID.
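
The ID-keyed Promise pattern described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: createPyCall, the fakeWorker stand-in, and the { id, ok, result, error } reply envelope are all assumptions made for the example.

```javascript
// Hypothetical sketch: createPyCall and the message envelope are assumptions.
function createPyCall(worker) {
  let msgId = 0;               // monotonically increasing message ID
  const pending = new Map();   // id -> { resolve, reject }

  // Route each reply to the Promise registered under the same ID.
  worker.onmessage = (event) => {
    const { id, ok, result, error } = event.data;
    const entry = pending.get(id);
    if (!entry) return;        // unknown or already-settled ID
    pending.delete(id);
    if (ok) entry.resolve(result);
    else entry.reject(new Error(error));
  };

  function pyCall(type, payload = {}) {
    const id = ++msgId;
    return new Promise((resolve, reject) => {
      pending.set(id, { resolve, reject });
      worker.postMessage({ id, type, payload });
    });
  }

  return { pyCall, pending };
}

// Stand-in for a real Worker so the sketch runs outside a browser.
const fakeWorker = {
  sent: [],
  onmessage: null,
  postMessage(msg) { this.sent.push(msg); },
  reply(data) { this.onmessage({ data }); },
};

const { pyCall, pending } = createPyCall(fakeWorker);
const first = pyCall('run', { code: 'print(1)' });
const second = pyCall('sql', { sql: 'SELECT 1' });

// Replies may arrive in any order; the ID decides which Promise settles.
fakeWorker.reply({ id: 2, ok: true, result: { rows: [[1]] } });
fakeWorker.reply({ id: 1, ok: true, result: { segments: [] } });
```

Keying on the ID rather than on call order means a slow run command cannot be confused with a fast sql command that finishes first.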

The Worker exposes these commands:

| Command | What it does |
| --- | --- |
| init | Load Pyodide, load duckdb+pandas, create shared _ql_db connection, patch duckdb.connect() to return it, force AGG matplotlib backend |
| run | Execute code; returns { segments } with stdout, warning, image, and error kinds |
| install | micropip.install(pypi) |
| loadPkg | pyodide.loadPackage(name); faster for Pyodide built-ins |
| checkSpec | importlib.util.find_spec(); test if a module is importable |
| getVersion | importlib.metadata.version() |
| getVars | Serialize user namespace as name\x1ftype\x1fpreview records joined by \x1e |
| getState | pickle + base64 the full user namespace |
| setState | Deserialize and restore a pickled namespace |
| setEnv | Inject key/value pairs into os.environ |
| sql | Execute a SQL statement against _ql_db; returns { columns, types, rows, totalRows } |
| sqlExport | Export all tables as parquet; returns { tables: { name: Uint8Array } } |
| sqlImport | Drop all tables and recreate from a base64 parquet map |
| sqlClear | Drop all tables from _ql_db |
| sqlBridgeQuery | Execute a SQL query for the R DBI bridge; returns a JSON array of row objects |
| sqlBridgeWrite | Write a data frame (JSON-encoded) as a new table, for the R DBI bridge |
| reset | Delete all non-underscore names from __main__ |
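
As an illustration of the getVars wire format, the main thread could decode the delimited string along these lines. parseVarRecords is a hypothetical helper name, and the sample values are made up; only the \x1f and \x1e separators come from the command description.

```javascript
const FIELD_SEP = '\x1f';    // unit separator between name, type, preview
const RECORD_SEP = '\x1e';   // record separator between variables

// Hypothetical decoder for the string returned by getVars.
function parseVarRecords(raw) {
  if (!raw) return [];
  return raw.split(RECORD_SEP).filter(Boolean).map((record) => {
    const [name, type, preview = ''] = record.split(FIELD_SEP);
    return { name, type, preview };
  });
}

// Made-up sample payload with two variables.
const sample =
  ['x', 'int', '42'].join(FIELD_SEP) + RECORD_SEP +
  ['df', 'DataFrame', '3 rows x 2 cols'].join(FIELD_SEP);
const vars = parseVarRecords(sample);
```

Control characters are a reasonable choice of separator here because they are unlikely to appear in variable names or short previews, so no escaping layer is needed.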

Matplotlib capture works in two passes. _QL_MPL_SETUP (injected before user code) patches plt.show() to save the figure to __main__._ql_image instead of displaying it. _QL_MPL_CAPTURE (run after user code) catches figures from cells that never call plt.show().

DuckDB lives in the Worker. init loads the real duckdb Python package (v1.1.3, bundled with Pyodide 0.27.5), creates a shared _ql_db connection, and patches duckdb.connect() so user code gets the shared instance. SQL cells call pyCall('sql'). R DBI calls go through runR()'s three-phase IPC (discovery, pre-fetch, execution), which pre-fetches results via pyCall('sqlBridgeQuery') or pyCall('sqlBridgeWrite') between captureR calls.


§1 · Constants & Configuration

Type display config (BADGE_CLASS, BADGE_LABEL, CELL_DEFAULTS), inline SVG icon strings (ICONS), and the default YAML frontmatter applied to every new notebook.


§2 · State Management

All mutable globals live in named namespace objects:

  • nb.doc — document state: cells, cellCounter, dirty, dbDirty, source, title, frontmatter, lineNumbers
  • nb.exec — execution state: execCounter, runningAll, cancelled, liveState
  • nb.editor — editor state: cmEditors (Map of CM6 EditorView instances), cm (loaded CM6 modules + shared theme/highlight extension, null until first load), sqlSchema (table→columns map used for SQL autocompletion)
  • theme — theme state: brightness / tone, persisted to localStorage
  • py — Python Worker lifecycle: worker, ready, loading, callbacks, msgId, pending
  • r — WebR: instance, loading, callbacks
  • pkgs.py — Python package state: installed, status, versions, userStatus, lastHash, installing, installPromise
  • pkgs.r — R package state: same shape as pkgs.py, minus lastHash
  • pkgs.sql — DuckDB extension state: loaded, status

_triggeredRuntimes (a Set) stays as a standalone global.


§3 · Theme System

Six themes from a brightness × tone matrix:

| | Warm | Cool | Mono |
| --- | --- | --- | --- |
| Light | linen | slate | cloud |
| Dark | ember | dusk | carbon |

theme.brightness (light / dark / auto) and theme.tone (warm / cool / neutral) are persisted to localStorage. The neutral key maps to the Mono column. getThemeName() resolves auto via window.matchMedia('(prefers-color-scheme: dark)') at call time. applyTheme() sets data-theme on <html>. CM6 editors inherit theme colors automatically via CSS variables in the shared EditorView.theme() extension — no per-editor theme updates are needed on theme change.
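
A minimal sketch of the brightness × tone lookup, assuming the matrix above. resolveThemeName and its prefersDark parameter are hypothetical; in the browser that flag would come from window.matchMedia('(prefers-color-scheme: dark)').matches.

```javascript
// Theme names from the brightness x tone matrix; "neutral" is the Mono column.
const THEMES = {
  light: { warm: 'linen', cool: 'slate', neutral: 'cloud' },
  dark: { warm: 'ember', cool: 'dusk', neutral: 'carbon' },
};

// Hypothetical pure version of getThemeName(): the OS preference is passed in
// so the function can run and be tested outside a browser.
function resolveThemeName(theme, prefersDark) {
  const brightness = theme.brightness === 'auto'
    ? (prefersDark ? 'dark' : 'light')
    : theme.brightness;
  return THEMES[brightness][theme.tone];
}

const explicit = resolveThemeName({ brightness: 'light', tone: 'cool' }, true);
const automatic = resolveThemeName({ brightness: 'auto', tone: 'neutral' }, true);
```

Resolving auto at call time, rather than storing the resolved value, means the stored setting stays "auto" and follows the OS preference the next time the theme is applied.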


§4 · Runtime Initialization

maybeTriggerRuntime(type, immediate) gates on _triggeredRuntimes (a Set) so each runtime starts at most once. When immediate is false, the load is deferred to scheduleIdle() (a requestIdleCallback wrapper).

  • Pyodide (ensurePyodide): starts eagerly on page open. Creates the Python Worker from the embedded Blob, sends init (which loads Pyodide, then duckdb+pandas, creates the shared _ql_db connection, and patches duckdb.connect()). On success, kicks off background package installs for imports already detected in Python cells.
  • WebR (ensureWebR): loads lazily when the first R cell is created. Dynamically injects a type="module" script to import WebR from CDN (it cannot be loaded with a regular <script> tag), calls r.instance.init(), installs the DBI and jsonlite packages, and runs the _QL_R_DBI_DRIVER shim to register the stub DBI driver.
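
The once-per-session guard can be sketched as below. The loaders map and the started log are assumptions standing in for ensurePyodide and ensureWebR, and which call sites pass immediate is also assumed for illustration.

```javascript
const _triggeredRuntimes = new Set();

// requestIdleCallback wrapper with a timer fallback for environments without it.
function scheduleIdle(fn) {
  if (typeof requestIdleCallback === 'function') requestIdleCallback(fn);
  else setTimeout(fn, 0);
}

// Hypothetical loaders: the real ones would be ensurePyodide / ensureWebR.
const started = [];
const loaders = new Map([
  ['python', () => started.push('python')],
  ['r', () => started.push('r')],
]);

function maybeTriggerRuntime(type, immediate = false) {
  if (!loaders.has(type) || _triggeredRuntimes.has(type)) return false;
  _triggeredRuntimes.add(type);   // mark first so repeat calls are no-ops
  const load = loaders.get(type);
  if (immediate) load();
  else scheduleIdle(load);
  return true;
}

maybeTriggerRuntime('python', true);   // starts the loader right away
maybeTriggerRuntime('python', true);   // no-op: already triggered
maybeTriggerRuntime('r');              // deferred until the event loop is idle
```

Adding the type to the Set before the loader runs is what makes the guard safe against re-entrant calls made while a load is still in flight.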

§5 · Core Execution

Thin wrappers over the runtimes:

  • runSQL(sql) — awaits ensurePyodide(), calls pyCall('sql', { sql }). The Worker executes against _ql_db and returns { columns, types, rows, totalRows }.
  • runPython(code) — awaits ensurePyodide(), calls pyCall('run', { code }). Returns { segments }.
  • runR(code) — awaits ensureWebR(), then runs the three-phase IPC: Phase 1 (discovery) runs code in local() with the DBI driver in stub mode to collect DB requests; Phase 2 (pre-fetch) executes those requests via pyCall('sqlBridgeQuery/Write') and writes results to WebR's FS outside captureR; Phase 3 (execution) runs the real code with the driver reading from cached files. Always calls shelter.purge() in finally. Returns { segments }.

Output segment model. All three runtimes resolve to { segments: [{ kind, content }] }. Kinds: stdout, warning, error, table, image, md. SQL DDL (CREATE/DROP/ALTER/TRUNCATE) produces empty segments; DML (INSERT/UPDATE/DELETE) produces a single stdout segment with the row count. parseCellOptions(code, type) parses #| key: value (Python/R) or --| key: value (SQL) options from the top of a cell; output: false and warning: false filter segments in the Preview panel.
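
A rough sketch of option parsing and segment filtering under the rules above. The exact behavior is assumed: values are kept as strings, parsing stops at the first non-option line, and output: false is taken to hide every segment. filterSegments is a hypothetical helper name.

```javascript
// Parse "#| key: value" (Python/R) or "--| key: value" (SQL) lines at the top of a cell.
function parseCellOptions(code, type) {
  const prefix = type === 'sql' ? '--|' : '#|';
  const options = {};
  for (const line of code.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed.startsWith(prefix)) break;   // options only count at the top
    const body = trimmed.slice(prefix.length);
    const colon = body.indexOf(':');
    if (colon === -1) continue;               // skip malformed option lines
    options[body.slice(0, colon).trim()] = body.slice(colon + 1).trim();
  }
  return options;
}

// Hypothetical filter: assumes output: false hides all segments.
function filterSegments(segments, options) {
  if (options.output === 'false') return [];
  if (options.warning === 'false') return segments.filter((s) => s.kind !== 'warning');
  return segments;
}

const opts = parseCellOptions('#| warning: false\n#| echo: true\nprint("hi")', 'python');
const shown = filterSegments(
  [{ kind: 'stdout', content: 'hi' }, { kind: 'warning', content: 'deprecated' }],
  opts,
);
```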


§6 · Status & Utilities

  • Status pills (setStatus(which, state, label)): updates the two pills in the status bar (nb, db) by setting a state class (idle / loading / ready / error) and updating the label text. Both pills go loading/ready together during Pyodide init; nb additionally goes loading/ready when WebR loads.
  • Toast (showToast(msg)): displays a transient message for 2.5 seconds, auto-dismissed with a timeout.
  • Helpers: esc(s) for HTML-escaping strings before inserting into innerHTML, textareaResizer (a ResizeObserver that auto-sizes plain textareas), scheduleIdle(fn).
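
For reference, an HTML-escaping helper in the spirit of esc(s) might look like this. The exact set of characters the notebook escapes is not stated, so the five-character table here is an assumption.

```javascript
// Characters that are unsafe inside HTML text or attribute values.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Escape a value before interpolating it into an innerHTML string.
function esc(s) {
  return String(s).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

const safe = esc('<b>"a" & b</b>');
```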

§7 · Rendering

All rendering functions are strictly one-directional: they read from state and write to the DOM.

  • renderMarkdown(src) — runs marked.parse() then renderMathInElement() (KaTeX) on the output element.
  • renderPreview() — walks cells[], renders the frontmatter header (only when nb.doc.frontmatter is true), md cells as inline prose, and code cell output segments in document style. Segments are filtered by cell.options (output: false, warning: false) and kind routing rules before rendering. Reads the echo field from cells[0].code to decide whether to show source code. No-ops if the panel is not active or the right pane is collapsed.
  • renderSchema() — queries information_schema.tables and information_schema.columns via runSQL(). Row counts are lazy-loaded on accordion open.
  • renderVariables() — calls pyCall('getVars') and R's ls() via shelter.captureR(). Both return \x1f-delimited name/type/preview records joined by \x1e.
  • renderTable(data) — interactive table with client-side column sort and pagination. renderStaticTable(data) is the non-interactive version used in Preview.
  • renderCellOutput(cell) — loops over cell.output.segments, rendering each by kind. Warning segments are filtered out if cell.options.warning === 'false'.

Each panel renderer has a schedule*Update() wrapper that coalesces rapid calls with requestAnimationFrame.
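
The coalescing behavior can be illustrated with a small factory. makeScheduler is a hypothetical name, and the frame scheduler is injected so the sketch runs without a browser; in the page it would be requestAnimationFrame.

```javascript
// Wrap a render function so any number of calls within one frame trigger one render.
function makeScheduler(render, raf) {
  let queued = false;
  return function schedule() {
    if (queued) return;   // a frame is already pending
    queued = true;
    raf(() => {
      queued = false;
      render();
    });
  };
}

// Fake frame queue so the sketch can run outside a browser.
const frameQueue = [];
const fakeRaf = (cb) => frameQueue.push(cb);

let renders = 0;
const schedulePreview = makeScheduler(() => { renders += 1; }, fakeRaf);

schedulePreview();
schedulePreview();
schedulePreview();
frameQueue.splice(0).forEach((cb) => cb());   // simulate one animation frame
```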


§8 · Cell Management

Cell class model

Every cell in nb.doc.cells has a type (yaml, md, python, sql, r). Two constants define class membership:

const TEXT_TYPES = new Set(['yaml', 'md']);          // render live, never executed
const CODE_TYPES = new Set(['python', 'sql', 'r']); // CM editor, executed, autosaved

This is not OOP inheritance — it's behavioral composition through shared helpers. All cells share: an id, a type, code, collapsed, and the attachCollapseHandler utility. Text cells additionally share: plain textarea editing, exclusion from runAll/resetOutputs, and the .text-cell CSS class that handles layout. Code cells additionally share: upgradeCellEditor (CodeMirror), runCell, execution counter, and autosave.

Within those classes, type-specific behavior is:

  • yaml — always cells[0], fixed position, no move/delete/type-picker. When nb.doc.frontmatter is true: shown in DOM, serializes as --- frontmatter. When false: hidden (display:none), skipped in serialization — data preserved for toggling back on.
  • md — WYSIWYG (blur renders inline, click edits), moveable, deletable, type-switchable
  • python / sql / r — different CM mode, runtime trigger, package scanner, and bridge to worker threads

createCellEl(cell) dispatches to createYamlCellEl, createMdCellEl, or the code path. convertCellType does a full DOM swap for text↔code transitions (the DOM structures are incompatible) and a fast in-place patch for code↔code. runAll and runAllFrom filter to CODE_TYPES before iterating — text cells are never in the execution loop.
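
The CODE_TYPES filter used by runAll and runAllFrom could be expressed as a small pure helper. cellsToRun is a hypothetical name and the sample cells are made up.

```javascript
const CODE_TYPES = new Set(['python', 'sql', 'r']);

// Hypothetical helper: pick the cells an execution loop would visit,
// optionally starting from a given cell id.
function cellsToRun(cells, startId = null) {
  const start = startId === null ? 0 : cells.findIndex((c) => c.id === startId);
  if (start === -1) return [];
  return cells.slice(start).filter((c) => CODE_TYPES.has(c.type));
}

const demoCells = [
  { id: 'c0', type: 'yaml' },
  { id: 'c1', type: 'md' },
  { id: 'c2', type: 'python' },
  { id: 'c3', type: 'md' },
  { id: 'c4', type: 'sql' },
];
const all = cellsToRun(demoCells).map((c) => c.id);
const fromC3 = cellsToRun(demoCells, 'c3').map((c) => c.id);
```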

§8.1 — Cell Execution

runCell(cell) is the central execution path:

  1. Ensures packages are installed (ensurePackages() / ensureRPackages())
  2. Sets cell.output = { running: true } and calls renderCellOutput()
  3. Starts a live timer that updates the execution time display at 100ms intervals
  4. Dispatches to runSQL, runPython, or runR
  5. On completion: saves nb.exec.liveState, calls renderCellOutput(), schedules panel updates, triggers history auto-save

setExecutionMode(active, current, total) swaps the topbar between its idle toolbar and a progress bar with a fractional fill. runAll() and runAllFrom(startId) set nb.exec.runningAll = true to suppress per-cell UI locking, run cells sequentially in a for loop, and call setExecutionMode() on each iteration.

§8.2 — Package Management

Package state is split by runtime: pkgs.py for Python, pkgs.r for R, pkgs.sql for DuckDB extensions. Both pkgs.py and pkgs.r track installed, status, versions, userStatus, installing, and installPromise. pkgs.py additionally tracks lastHash for change-detection. pkgs.sql tracks loaded and status.

extractImports(code) parses import X and from X import statements from Python source. extractRLibraries(code) scans library() / require() calls. extractDuckDBExtensions(sql) scans for INSTALL / LOAD statements. resolvePackageSync(importName) checks localStorage for user-defined overrides (e.g., PIL → Pillow).
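
The three scanners can be approximated with regular expressions, as in this sketch. The patterns are assumptions and deliberately simple (they do not handle multi-line imports or strings that look like code), and resolvePackage takes a plain overrides object in place of localStorage.

```javascript
// Top-level module names from "import X" and "from X import ..." lines.
function extractImports(code) {
  const names = new Set();
  for (const rawLine of code.split('\n')) {
    const line = rawLine.split('#')[0];   // drop trailing comments
    const from = line.match(/^\s*from\s+([A-Za-z_]\w*)/);
    if (from) { names.add(from[1]); continue; }
    const imp = line.match(/^\s*import\s+(.+)$/);
    if (!imp) continue;
    for (const part of imp[1].split(',')) {
      const m = part.trim().match(/^([A-Za-z_]\w*)/);
      if (m) names.add(m[1]);
    }
  }
  return [...names];
}

// Package names from library(pkg) / require("pkg") calls in R source.
function extractRLibraries(code) {
  const re = /\b(?:library|require)\(\s*["']?([A-Za-z][\w.]*)["']?\s*\)/g;
  return [...new Set([...code.matchAll(re)].map((m) => m[1]))];
}

// Extension names from INSTALL / LOAD statements in SQL.
function extractDuckDBExtensions(sql) {
  const re = /\b(?:INSTALL|LOAD)\s+'?([A-Za-z_]\w*)'?/gi;
  return [...new Set([...sql.matchAll(re)].map((m) => m[1]))];
}

// Hypothetical resolver: map an import name to its install name via overrides.
function resolvePackage(importName, overrides) {
  return overrides[importName] || importName;
}

const pyNames = extractImports('import numpy as np, os\nfrom sklearn.linear_model import Ridge\nfrom PIL import Image');
const rNames = extractRLibraries('library(dplyr)\nrequire("ggplot2")');
const extNames = extractDuckDBExtensions('INSTALL httpfs; LOAD httpfs;');
const pillow = resolvePackage('PIL', { PIL: 'Pillow' });
```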

installDetectedPackage(importName, pypi) tries pyodide.loadPackage() first (faster for Pyodide built-ins), falling back to micropip.install(). discoverImportMappings(pypiName) runs after a user-added install and checks whether any previously-failed import names have become resolvable, updating the override map if so.

R libraries are installed via r.instance.installPackages().

§8.3 — Cell Actions

  • addCell(type, code, index) — splices into cells[], creates a DOM node via createCellEl(), and appends an insert zone after it
  • deleteCell(id) — splices from cells[], calls view.destroy() to tear down the CM6 editor, removes the cell element and its following insert zone
  • swapCells(id, dir) — swaps adjacent entries in cells[] and swaps DOM nodes in place
  • convertCellType(cell, el, newType) — for text↔code transitions, does a full DOM replaceChild (the element structures are incompatible); for code↔code, fast-patches the badge and swaps the CM mode in place
  • createInsertBtn() — returns an insert zone <div> with four typed buttons. No event listeners are attached here. All insert-zone clicks are handled by a single delegated listener on #nb-scroll in §12 that computes the insertion index at click time.
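
The array half of swapCells can be sketched as follows, leaving the DOM swap aside. The numeric dir convention (-1 for up, +1 for down) and the boolean return value are assumptions for the example.

```javascript
// Swap a cell with its neighbor in the cells array.
// dir is assumed to be -1 (move up) or +1 (move down).
function swapCells(cells, id, dir) {
  const i = cells.findIndex((c) => c.id === id);
  const j = i + dir;
  // Index 0 holds the fixed yaml cell, so neither side of a swap may be 0.
  if (i < 1 || j < 1 || j >= cells.length) return false;
  [cells[i], cells[j]] = [cells[j], cells[i]];
  return true;
}

const nbCells = [
  { id: 'yaml', type: 'yaml' },
  { id: 'a', type: 'md' },
  { id: 'b', type: 'python' },
];
const movedDown = swapCells(nbCells, 'a', 1);    // a and b trade places
const blockedUp = swapCells(nbCells, 'b', -1);   // b is now at index 1 and cannot pass the yaml cell
```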

§8.4 — CodeMirror Integration

CM6 is loaded lazily on first use. loadCM6() dynamically imports all CM6 packages via the <script type="importmap"> in <head>, builds a shared EditorView.theme() extension using CSS variables (so editors inherit theme colors automatically), and caches everything in nb.editor.cm.

upgradeCellEditor(el, cell) replaces a <textarea> with a CM6 EditorView. Each editor gets a Compartment stored as view._lang for hot-swapping the language extension. langExtension(type) returns the appropriate CM6 language pack: python(), sql({ schema: nb.editor.sqlSchema }), or a StreamLanguage wrapper for R. Text cells (yaml, md) stay as plain textareas.

updateSQLCompletionSchema() calls fetchSQLSchema() (which queries information_schema) and reconfigures all open SQL editors via view._lang.reconfigure(langExtension('sql')). This runs after any DDL via scheduleSchemaUpdate() and after DB import/export via maybeSaveDbSnapshot().

§8.5 — Cell DOM Creation

createCellEl(cell) dispatches to createYamlCellEl, createMdCellEl, or the code-cell path based on type. Shared helpers — attachDragHandler, attachCollapseHandler, attachBadgeHandler — are called as appropriate. rebuildDOM() destroys all CM6 editors (via view.destroy()), clears nb.editor.cmEditors, clears the scroll container (preserving only the watermark <div>), and rebuilds from cells[].


§9 · QMD Parsing & Building

buildQMD() iterates nb.doc.cells — when nb.doc.frontmatter is true the yaml cell (cells[0]) serializes as --- frontmatter, otherwise it is skipped. md cells become prose, code cells become fenced ```{type} blocks. The file extension follows nb.doc.frontmatter: .qmd or .md.
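
A simplified sketch of the serialization rules above. The function takes cells and the frontmatter flag as parameters instead of reading nb.doc, and the blank-line spacing between blocks is an assumption.

```javascript
const FENCE = '`'.repeat(3);   // three backticks, built here to keep this example fence-safe

// Simplified stand-in for buildQMD(): yaml -> frontmatter, md -> prose, code -> fenced block.
function buildQMD(cells, frontmatter) {
  const parts = [];
  for (const cell of cells) {
    if (cell.type === 'yaml') {
      if (frontmatter) parts.push(`---\n${cell.code.trim()}\n---`);
    } else if (cell.type === 'md') {
      parts.push(cell.code.trim());
    } else {
      parts.push(`${FENCE}{${cell.type}}\n${cell.code}\n${FENCE}`);
    }
  }
  return parts.join('\n\n') + '\n';
}

const docCells = [
  { type: 'yaml', code: 'title: Demo' },
  { type: 'md', code: '# Hi' },
  { type: 'python', code: 'print(1)' },
];
const asQmd = buildQMD(docCells, true);    // would be saved as .qmd
const asMd = buildQMD(docCells, false);    // would be saved as .md
```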

parseQMD(text) splits on lines and uses a depth counter to correctly handle Markdown cells that contain their own fenced code blocks. Always creates a yaml cell as cells[0] (using the default frontmatter if none is present in the file). Returns { name, cells }. loadNotebook sets nb.doc.frontmatter by checking text.trimStart().startsWith('---').
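
One way a depth counter can keep nested fences inside prose is sketched below. This is an assumed reading of the approach, not the notebook's parser: it returns only { cells } (no name), uses a placeholder default frontmatter, and treats any fence with an info string as opening a nested level once a prose fence is open.

```javascript
const FENCE = '`'.repeat(3);              // three backticks
const DEFAULT_YAML = 'title: Untitled';   // placeholder default frontmatter (assumption)

function parseQMD(text) {
  const lines = text.split('\n');
  const cells = [];
  let i = 0;

  // Frontmatter: the yaml cell always exists, even if the file has none.
  let yaml = DEFAULT_YAML;
  if (lines[0] !== undefined && lines[0].trim() === '---') {
    const end = lines.indexOf('---', 1);
    if (end !== -1) { yaml = lines.slice(1, end).join('\n'); i = end + 1; }
  }
  cells.push({ type: 'yaml', code: yaml });

  let md = [];
  let code = null;   // { type, lines } while inside a {type} cell
  let depth = 0;     // nesting depth of plain fences inside prose
  const flushMd = () => {
    const body = md.join('\n').trim();
    if (body) cells.push({ type: 'md', code: body });
    md = [];
  };

  for (; i < lines.length; i++) {
    const line = lines[i];
    const isBare = line.trim() === FENCE;
    if (code) {
      if (isBare) { cells.push({ type: code.type, code: code.lines.join('\n') }); code = null; }
      else code.lines.push(line);
      continue;
    }
    // A {type} fence starts a cell only when no prose fence is open.
    const cellOpen = depth === 0 ? line.match(/^`{3}\{(\w+)\}\s*$/) : null;
    if (cellOpen) { flushMd(); code = { type: cellOpen[1], lines: [] }; continue; }
    if (line.startsWith(FENCE)) depth += (isBare && depth > 0) ? -1 : 1;
    md.push(line);
  }
  if (code) cells.push({ type: code.type, code: code.lines.join('\n') });
  flushMd();
  return { cells };
}

// A prose fence that itself contains a {python} fence, followed by a real python cell.
const sampleText = [
  '---', 'title: Demo', '---', '',
  '# Notes', '',
  FENCE + 'text', FENCE + '{python}', 'x = 1', FENCE, FENCE,
  '',
  FENCE + '{python}', 'print(1)', FENCE,
].join('\n');
const parsed = parseQMD(sampleText).cells;
```

In the sample, the inner {python} fence stays part of the Markdown cell because depth is 1 when it is seen; only the final fence, met at depth 0, becomes a code cell.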

File I/O: saveNotebook() triggers a browser download. loadNotebook(arrayBuffer, filename) decodes, parses, resets nb.doc.cells, calls rebuildDOM(), and immediately calls maybeTriggerRuntime() for each cell type present in the file.


§10 · Database Operations

All DB operations route through the Pyodide Worker — there is no separate DuckDB runtime.

exportDBBundle() calls pyCall('sqlExport'). The Worker issues COPY TO '*.parquet' per table into /tmp, reads each file, base64-encodes it, and returns a JSON map. The main thread decodes it back to { tables: { name: Uint8Array } } — parquet bytes, no encoding at rest in IndexedDB.

importDBBundle(bundle) converts the Uint8Array values to base64 and calls pyCall('sqlImport', { data }). The Worker drops all existing tables, writes each parquet blob to /tmp, and recreates tables via CREATE TABLE AS SELECT * FROM read_parquet(...).
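
The base64 conversion at the Worker boundary might be implemented roughly like this. encodeBundle and decodeBundle are hypothetical names; the sketch relies only on the standard btoa and atob globals.

```javascript
// Uint8Array -> base64 string.
function bytesToBase64(bytes) {
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

// base64 string -> Uint8Array.
function base64ToBytes(b64) {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

// { tables: { name: Uint8Array } } -> { name: base64 } for sending to the Worker.
function encodeBundle(bundle) {
  return Object.fromEntries(
    Object.entries(bundle.tables).map(([name, bytes]) => [name, bytesToBase64(bytes)]),
  );
}

// { name: base64 } -> { tables: { name: Uint8Array } } for storage.
function decodeBundle(data) {
  return {
    tables: Object.fromEntries(
      Object.entries(data).map(([name, b64]) => [name, base64ToBytes(b64)]),
    ),
  };
}

// Four bytes spelling "PAR1" stand in for a real parquet file here.
const bundle = { tables: { sales: new Uint8Array([80, 65, 82, 49]) } };
const wire = encodeBundle(bundle);
const restored = decodeBundle(wire);
```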

importDB(arrayBuffer) is the user-facing file import path. _bytesToBundle(bytes) decodes the legacy base64 JSON format (_ql: 1, detected by first byte {) into a bundle object, then calls importDBBundle.

exportDB() serializes the bundle back to base64 JSON for the .db file download (the only place encoding is needed).


§11 · Panels & Features

§11.1 — Secrets Manager (IIFE)

Self-contained IIFE that exposes window.secretsInit() and window._secretsInjectEnv(). Uses a device-specific AES-GCM-256 key generated once and stored in IndexedDB (notebook-secrets). Both key names and values are encrypted separately per row. injectEnv() calls pyCall('setEnv', { secrets }) to push decrypted pairs into the Worker's os.environ.

§11.2 — Pane Resize (IIFEs)

Two self-contained IIFEs manage the left and right drag handles and collapse tabs. Pane widths are persisted to localStorage. The right pane's max width is computed dynamically as Math.floor((window.innerWidth - leftPane.offsetWidth) / 2). When the right pane is expanded, schedulePreviewUpdate(true) is called with a 50ms delay to let the layout settle before rendering.

§11.3 — History Panel

Two IndexedDB object stores in notebook-history-v2: qmd-snapshots stores the .qmd string; db-snapshots stores a bundle object { tables: { name: Uint8Array } } — parquet bytes per table stored directly, no encoding. Every entry includes a size field (sum of parquet buffer lengths).

autoTrimForCap(newEntrySize) deletes the oldest auto-saves (sorted by ts) until used + newEntrySize <= histStorageCap. Manual saves over cap show a dialog offering to raise the cap, delete oldest entries, or cancel. maybeSaveSnapshot() and maybeSaveDbSnapshot() are no-ops when nb.doc.dirty / nb.doc.dbDirty are false.
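
The trimming rule can be sketched as a pure function over an in-memory list. The entry shape ({ id, ts, size, auto }), the explicit cap parameter, and the returned summary are assumptions; the real function works against IndexedDB.

```javascript
// Delete the oldest auto-saves until the new entry fits under the cap.
// Manual saves (auto: false) are never deleted by this routine.
function autoTrimForCap(entries, newEntrySize, cap) {
  const kept = [...entries];
  const deleted = [];
  let used = kept.reduce((sum, e) => sum + e.size, 0);
  const autos = kept.filter((e) => e.auto).sort((a, b) => a.ts - b.ts);   // oldest first
  for (const entry of autos) {
    if (used + newEntrySize <= cap) break;
    kept.splice(kept.indexOf(entry), 1);
    deleted.push(entry.id);
    used -= entry.size;
  }
  return { kept, deleted, fits: used + newEntrySize <= cap };
}

const historyEntries = [
  { id: 1, ts: 1, size: 40, auto: true },
  { id: 2, ts: 2, size: 40, auto: false },
  { id: 3, ts: 3, size: 40, auto: true },
];
const roomy = autoTrimForCap(historyEntries, 30, 120);   // drops only the oldest auto-save
const tight = autoTrimForCap(historyEntries, 30, 50);    // drops every auto-save and still does not fit
```

When fits comes back false, the caller would fall through to the over-cap handling described above for manual saves.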


§12 · Event Listeners & Initialization

Wires all toolbar buttons, file inputs, drag-and-drop, left/right pane tab switching, and theme buttons. The delegated insert-zone handler on #nb-scroll computes insertion index by finding the clicked zone's position among all .cell-insert elements at click time.

Startup sequence:

  1. rebuildDOM() — build the empty notebook
  2. renderPackagesPanel() · renderUserPackagesPanel() · renderRUserPackagesPanel()
  3. renderHistoryPanel()
  4. Set initial status pills
  5. secretsInit()
  6. upgradeAllCells() — lazy-load CM6, attach EditorView to all code cells
  7. schedulePreviewUpdate(true) — initial preview render

CDN Dependencies

| Library | Version | Purpose |
| --- | --- | --- |
| Pyodide | 0.27.5 | Python runtime + duckdb v1.1.3 (Web Worker, eager load) |
| WebR | 0.4.2 | R runtime (lazy load) |
| CodeMirror | 6 | Cell syntax highlighting (loaded via import map + dynamic import()) |
| marked.js | 9.1.6 | Markdown rendering |
| js-yaml | 4.1.0 | Preamble YAML parsing |
| KaTeX | 0.16.9 | LaTeX math rendering |
| Tabler Icons | 3.19.0 | UI icons |

Development

Open notebook.html in a browser. Edit, refresh, test manually — there is no build step.

To navigate the source, use Ctrl+F for #region <name> in the JS or the section numbers in the CSS table of contents at line 44.

About

A browser-based data notebook that runs Python, R, and SQL natively. No server, no installation, no build step. Everything is a single HTML file. Open it, run code, save your work as markdown documents.
