Skip to content

Redesign migration API around Conversion/Parse result types#30

Merged
gschlager merged 3 commits into
mainfrom
claude/refine-local-plan-dMDie
Jun 11, 2026
Merged

Redesign migration API around Conversion/Parse result types#30
gschlager merged 3 commits into
mainfrom
claude/refine-local-plan-dMDie

Conversation

@gschlager

@gschlager gschlager commented May 5, 2026

Copy link
Copy Markdown
Member

Summary

Reshapes the Markbridge public API to return structured result objects (Parse and Conversion) instead of raw strings, consolidates render-side customization into a single renderer: parameter built via a discourse_renderer(...) factory, and adds AST-mutation hooks plus selective escaper customisation. Eliminates the global configuration singleton and per-process default registries in favor of explicit, reusable renderer instances.

Squashed to a single commit on the current origin/main, plus dependency-update and version-bump commits. Linear history; no merge commits.

Bumps the gem version to 0.2.0 — merging this PR publishes the gem and creates the GitHub release automatically (the publish job releases on push to main whenever version.rb carries a new version).

Key Changes

  • New result types: Parse (parse-only) and Conversion (render) — Data.define value objects wrapping AST, markdown output, format, unknown tags, diagnostics, and errors

    • parse_* methods return Parse; *_to_markdown methods return Conversion
    • Conversion#to_s delegates to markdown for string-coercion contexts; Conversion#parsed exposes the underlying Parse
  • Removed global configuration: Markbridge::Configuration, .configuration, .configure, .reset_defaults!, and all default_* accessors are gone

  • Renderer factory: Markbridge.discourse_renderer(...) builds reusable Renderer instances

    • Accepts: tags:, tag_library:, unregister:, escaper:, escape:, escape_hard_line_breaks:, allow:, postprocessor:, strip_trailing_invisibles:
  • Simplified method signatures: *_to_markdown and convert now accept only handlers:, renderer:, raise_on_error: plus an optional block yielding the parsed AST between parse and render

  • Markbridge.render accepts a Parse or an AST::Node: when given a Parse, the resulting Conversion carries unknown_tags, diagnostics, and source format forward; when given a bare AST node those default to empty and format is nil — there was no source document, so there is no source format to report. A bare non-Document node is wrapped in an AST::Document, so Conversion#ast is always a Document

  • AST traversal/mutation helpers: AST::Element#each_descendant, #descendants(klass = nil), and #replace_child(old, new) support mutating the AST between parse and render

  • AST::Details + DetailsTag: renders Discourse [details=…]…[/details] collapsible sections. No built-in parser produces the node — it exists for custom migration handlers

  • Handler registry #overlay: BBCode, HTML, and TextFormatter registries gain #overlay(name) { |previous| ... } for wrapping defaults

  • Pre-parsed Nokogiri input: the HTML and TextFormatter parsers accept a Nokogiri::XML::Node (DocumentFragment, Document, Element) in addition to Strings, so importers can run their own Nokogiri-driven pre-processing on the live tree without a re-encode/re-parse round-trip

  • TagLibrary#unregister: removes a binding so the renderer falls through to render_children (auto-passthrough for unregistered AST classes)

  • Postprocessor: extracted from Markbridge#cleanup_markdown into Renderers::Discourse::Postprocessor; injectable on the Renderer

  • MarkdownEscaper#allow: for selective passthrough: replaces the subclass-and-override pattern. Recognised keys: :bullet_list, :ordered_list, :atx_heading, :block_quote. Alias :lists expands to [:bullet_list, :ordered_list]. Unknown keys raise. Marker bytes pass through verbatim while inline content after the marker is still escaped.

  • IdentityEscaper + escape: false sugar: new no-op escaper for migration paths where source content is already trusted Markdown. discourse_renderer(escape: false) swaps it in. Mutually exclusive with escape_hard_line_breaks: / allow:; an explicit escaper: always wins.

  • Class-only handlers: BBCode, HTML, and TextFormatter parsers now require objects responding to #process(...). The br/hr lambdas in HTML's default registry became Handlers::SelfClosingHandler; the example file's lambda demos became Handler classes.

    Note on the rebase: main shipped its own VoidHandler (same shape, different name) to fix the related whitespace-trim bug. The rebase unified these — kept this branch's SelfClosingHandler name (covers both <br> and <hr> since they're both self-closing void elements per HTML), kept main's whitespace-trim behavior via produces_block? (now simplified — no Proc gate needed).

  • MediaWiki API consistency: inline_tag_registry: parameter renamed to handlers: for parity with sibling parsers

  • TextFormatter handler signature: handlers accept processor: so they can call back into processor.process_children(xml, ast_node)

Notable Implementation Details

  • Parse and Conversion use Data.define (frozen, structural equality)
  • Renderer instances are designed to be built once and reused across thousands of posts
  • The raise_on_error: false mode allows per-row failure isolation in batch migrations
  • Auto-passthrough for unregistered AST classes eliminates boilerplate "passthrough" Tags
  • MarkdownEscaper#allow: storage is a private Array (≤ 4 elements) — #include? is observably identical to Set#include? at this size, no allocation
  • examples/forum_migration.rb exercises the migration API end-to-end: discourse_renderer factory, tags:, unregister:, allow: :lists, the AST-mutation block, Conversion#errors, raise_on_error: false, and the convert(format:) dispatcher
  • UPGRADING.md covers every breaking change

Migration-side resolution

The migration use case resolves placeholders (uploads, mentions, internal links) at parse time via custom handler subclasses. The handler stores the source-side reference in the converter's upload/user/topic store, gets back a stable identifier, and pins it on the AST node directly. Renderer Tags stay trivial output formatters — no per-post state, no side-channel.

The earlier drafts of this PR shipped an interface.emit / Conversion#emissions API for render-time side data; that has been removed. Resolution is a write, not a read — it should happen exactly once per source attachment regardless of how many times the post is rendered, the lookup table varies per post (defeating "build-once-reuse-many" if state lived in Tag constructors), and re-renders should be pure functions of the AST. With resolution in the handler, every job emit was trying to do is now done by something simpler. The BaseAttachmentHandler template-method pattern (extract source ref → store_or_lookup → AttachmentPlaceholder.new) reuses cleanly across phpBB / vBulletin / SMF / IPB attachment handlers — that layer lives in the converter framework, not Markbridge.

Tests

  • All existing system and unit tests updated for the new result types
  • New tests for discourse_renderer, #overlay, #unregister, Postprocessor, MarkdownEscaper#allow:, IdentityEscaper, polymorphic Markbridge.render(Parse|AST), AST-mutation block form, SelfClosingHandler, AST::Details/DetailsTag, the Element traversal helpers, and pre-parsed Nokogiri input
  • Removed tests for deleted Configuration, interface.emit, Conversion#emissions, with_provisional_emissions, obsolete Proc-handler shape
  • 3322 examples green at HEAD (114095d)
  • Mutation: 100% coverage on every subject changed since main on every local pass

This change is Reviewable

gschlager pushed a commit that referenced this pull request May 5, 2026
Fixes the lint CI failure on PR #30 — six files had drifted from
syntax_tree's expected format. No semantic changes.
gschlager added a commit that referenced this pull request May 6, 2026
Pivot the docs site to the task-oriented "I'm migrating a forum to
Discourse" perspective and bring every page in sync with the migration-
API redesign coming in PR #30.

New top-level "Migrating to Discourse" section with overview,
placeholders, and a full walkthrough that ports examples/forum_migration.rb
end-to-end. The placeholders page makes explicit what was implicit:
Tag return strings are spliced verbatim — only AST::Text goes through
the escaper — so placeholder sigils reach the output untouched.

The old guides/configuration.md is gone; renderer-side knobs move to
a new customization/customizing-renderer.md describing the
Markbridge.discourse_renderer(...) factory. PR30's UPGRADING.md is
ported under reference/upgrading.md.

Existing pages updated for: Conversion return type, removed
Markbridge.configure global, MediaWiki handlers: kwarg rename,
TextFormatter handler processor: arg, interface.emit and html_mode?
on the rendering interface table, the build-once / reuse-many
renderer pattern.

MIGRATION_API_PLAN.md updated to reflect §1-3 implemented in PR30,
§4 (DSL) considered and rejected, and the resolved open questions.

Every code sample in the migration section was verified against PR30's
branch via a sibling worktree before commit.
gschlager pushed a commit that referenced this pull request May 11, 2026
Fixes the lint CI failure on PR #30 — six files had drifted from
syntax_tree's expected format. No semantic changes.
@gschlager gschlager force-pushed the claude/refine-local-plan-dMDie branch from 0aec929 to 29fc9f2 Compare May 11, 2026 10:06
gschlager pushed a commit that referenced this pull request May 11, 2026
Fixes the lint CI failure on PR #30 — six files had drifted from
syntax_tree's expected format. No semantic changes.
@gschlager gschlager force-pushed the claude/refine-local-plan-dMDie branch 3 times, most recently from 7aab7ed to 90648bf Compare May 11, 2026 22:06
gschlager pushed a commit that referenced this pull request May 12, 2026
Fixes the lint CI failure on PR #30 — six files had drifted from
syntax_tree's expected format. No semantic changes.
@gschlager gschlager force-pushed the claude/refine-local-plan-dMDie branch from 90648bf to e850d1b Compare May 12, 2026 19:07
@gschlager gschlager requested a review from Copilot May 31, 2026 14:55

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Reshapes Markbridge’s migration-facing public API to return structured Parse/Conversion result objects and to route render customization through explicit, reusable Renderer instances (via Markbridge.discourse_renderer(...)), replacing prior global/singleton configuration patterns.

Changes:

  • Introduces/standardizes Parse and Conversion result types across parse/convert entrypoints, including raise_on_error: error-collection mode.
  • Consolidates Discourse rendering customization behind a renderer factory and expands renderer/tag capabilities (e.g., new Details support, tag library unregistering, postprocessing).
  • Updates docs and unit specs to cover the new API and behaviors.

Reviewed changes

Copilot reviewed 70 out of 70 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
UPGRADING.md New/expanded upgrade guide describing the redesigned API and migration patterns.
docs/renderers/discourse.md Updates renderer documentation around configuration and usage patterns.
lib/markbridge.rb Public API surface and YARD docs for parse/convert/render entrypoints and error-handling semantics.
lib/markbridge/parsers/html/parser.rb HTML parser input handling for pre-parsed Nokogiri nodes.
lib/markbridge/renderers/discourse/postprocessor.rb Extracted postprocessing/cleanup pipeline for rendered Markdown.
lib/markbridge/parsers/text_formatter/handler_registry.rb Handler registry behavior updates (e.g., overlay, processor plumbing) supporting the new handler API.
spec/unit/markbridge/renderers/discourse/tags/table_tag_spec.rb Adds coverage for “effectively empty table” rendering behavior.
spec/unit/markbridge/renderers/discourse/tags/details_tag_spec.rb Adds coverage for the new Details tag in markdown + html_mode.
spec/unit/markbridge/renderers/discourse/tag_library_spec.rb Adds coverage for TagLibrary#unregister and merge semantics.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread UPGRADING.md
Comment thread UPGRADING.md Outdated
Comment thread docs/renderers/discourse.md Outdated
Comment thread lib/markbridge.rb Outdated
Comment thread lib/markbridge/parsers/html/parser.rb
Comment thread lib/markbridge/renderers/discourse/postprocessor.rb Outdated
gschlager added a commit that referenced this pull request Jun 10, 2026
Add §5 to MIGRATION_API_PLAN.md capturing the design conclusion that
placeholder resolution belongs in custom handler subclasses at parse
time, not in renderer Tags at render time.

Implications, all written out:

  - The converter framework (a layer above Markbridge) owns the
    resolution-aware base handler classes, the placeholder AST
    classes, and the trivial render Tags. Markbridge keeps the parser
    primitives. Per-format converters (phpBB / vBulletin / SMF /
    IPB / XenForo) subclass the bases via template method, only
    overriding source-ref extraction.
  - Resolution is a *write* (store the upload, get the id), not a
    pure lookup. Parse-time resolution gives that side effect a
    natural once-per-attachment home; render-time would re-store on
    every render or need an in-Tag cache.
  - AST nodes carry resolved data; render Tags become trivial.
    `interface.emit` and `Conversion#emissions` (§2) are no longer
    needed and should be removed from PR #30 before merge.
  - Section ends with a concrete file-by-file change list for PR #30
    plus the docs-branch follow-up needed once §2 is removed.

§2 above is now marked obsoleted-by-§5. Working branch section
unchanged.

@gschlager gschlager left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gschlager partially reviewed 71 files and all commit messages, and made 1 comment.
Reviewable status: all files reviewed, 1 unresolved discussion.


lib/markbridge.rb line 109 at r2 (raw file):

      ast = parser.parse(input)

      # stree-ignore

Why is there a # stree-ignore?

Code quote:

# stree-ignore

Reshapes the Markbridge public API to return structured result objects
(Parse and Conversion) instead of raw strings, consolidates render-side
customization into a single renderer: parameter built via a
discourse_renderer(...) factory, and adds AST-mutation hooks plus
selective escaper customization. Eliminates the global configuration
singleton and per-process default registries in favor of explicit,
reusable renderer instances.

Result types and entry points:

* Parse (parse-only) and Conversion (render) — Data.define value
  objects wrapping AST, markdown output, source format, unknown tags,
  diagnostics, and errors. parse_* methods return Parse; *_to_markdown
  methods return Conversion. Conversion#to_s delegates to #markdown
  for string-coercion contexts, and Conversion#parsed exposes the
  underlying Parse.
* Markbridge::Configuration, .configuration, .configure,
  .reset_defaults!, and all default_* accessors are removed.
* Markbridge.convert(format:) dispatches to the per-format methods;
  *_to_markdown and convert accept only handlers:, renderer:,
  raise_on_error:, plus an optional block yielding the parsed AST
  between parse and render.
* Markbridge.render accepts a Parse or an AST::Node. Given a Parse,
  the Conversion carries unknown_tags, diagnostics, and source format
  forward. Given a bare AST node, those default to empty and format
  is nil (there is no source document); a non-Document node is wrapped
  in an AST::Document so Conversion#ast is always a Document.
* raise_on_error: false collects render errors on Conversion#errors
  instead of raising, allowing per-row failure isolation in batch
  migrations.

Renderer customization:

* Markbridge.discourse_renderer(...) builds reusable Renderer
  instances, designed to be created once and shared across thousands
  of posts. Accepts tags:, tag_library:, unregister:, escaper:,
  escape:, escape_hard_line_breaks:, allow:, postprocessor:, and
  strip_trailing_invisibles:.
* TagLibrary#unregister removes a binding so the renderer falls
  through to render_children — auto-passthrough for unregistered AST
  classes, eliminating boilerplate passthrough Tags.
* Markdown cleanup is extracted from Markbridge#cleanup_markdown into
  Renderers::Discourse::Postprocessor, injectable on the Renderer.
* MarkdownEscaper#allow: gives selective marker passthrough, replacing
  the subclass-and-override pattern. Recognized keys: :bullet_list,
  :ordered_list, :atx_heading, :block_quote; :lists expands to the
  first two; unknown keys raise. Marker bytes pass through verbatim
  while inline content after the marker is still escaped.
* IdentityEscaper is a no-op escaper for migration paths where source
  content is already trusted Markdown; discourse_renderer(escape:
  false) swaps it in. Mutually exclusive with
  escape_hard_line_breaks:/allow:; an explicit escaper: always wins.

Parser-side extensibility:

* BBCode, HTML, and TextFormatter handler registries gain
  #overlay(name) { |previous| ... } for wrapping default handlers.
* The HTML and TextFormatter parsers accept pre-parsed Nokogiri input
  (DocumentFragment, Document, Element) in addition to Strings, so
  importers can run their own Nokogiri-driven pre-processing on the
  live tree without a re-encode/re-parse round-trip.
* Handlers must be objects responding to #process(...); Proc handler
  support is dropped. The br/hr lambdas in HTML's default registry
  became Handlers::SelfClosingHandler, which subsumes main's
  VoidHandler (same shape) while keeping its whitespace-trim behavior
  via produces_block?.
* TextFormatter handlers receive processor: so they can call back into
  processor.process_children(xml, ast_node).
* The MediaWiki parser's inline_tag_registry: parameter is renamed to
  handlers: for parity with the sibling parsers.

AST additions:

* AST::Details + DetailsTag render Discourse
  [details=…]…[/details] collapsible sections; no built-in parser
  produces the node — it exists for custom migration handlers.
* AST::Element gains #each_descendant, #descendants(klass = nil), and
  #replace_child(old, new) to support AST mutation between parse and
  render.

The migration use case resolves placeholders (uploads, mentions,
internal links) at parse time via custom handler subclasses that pin
stable identifiers on AST nodes; renderer Tags stay trivial output
formatters with no per-post state. The earlier interface.emit /
Conversion#emissions API for render-time side data is removed:
resolution is a write that should happen exactly once per source
attachment regardless of render count, and re-renders should be pure
functions of the AST.

examples/forum_migration.rb exercises the migration API end to end,
and UPGRADING.md covers every breaking change. 100% mutation coverage
on every subject changed since main.
@gschlager gschlager force-pushed the claude/refine-local-plan-dMDie branch 2 times, most recently from 1f1cf3e to 9ca20ef Compare June 11, 2026 23:14
@gschlager gschlager force-pushed the claude/refine-local-plan-dMDie branch from 9ca20ef to 114095d Compare June 11, 2026 23:17
@gschlager gschlager marked this pull request as ready for review June 11, 2026 23:18
@gschlager gschlager merged commit bb97c8a into main Jun 11, 2026
7 checks passed
@gschlager gschlager deleted the claude/refine-local-plan-dMDie branch June 11, 2026 23:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants