Redesign migration API around Conversion/Parse result types#30
Conversation
Fixes the lint CI failure on PR #30 — six files had drifted from syntax_tree's expected format. No semantic changes.
Pivot the docs site to the task-oriented "I'm migrating a forum to Discourse" perspective and bring every page in sync with the migration- API redesign coming in PR #30. New top-level "Migrating to Discourse" section with overview, placeholders, and a full walkthrough that ports examples/forum_migration.rb end-to-end. The placeholders page makes explicit what was implicit: Tag return strings are spliced verbatim — only AST::Text goes through the escaper — so placeholder sigils reach the output untouched. The old guides/configuration.md is gone; renderer-side knobs move to a new customization/customizing-renderer.md describing the Markbridge.discourse_renderer(...) factory. PR30's UPGRADING.md is ported under reference/upgrading.md. Existing pages updated for: Conversion return type, removed Markbridge.configure global, MediaWiki handlers: kwarg rename, TextFormatter handler processor: arg, interface.emit and html_mode? on the rendering interface table, the build-once / reuse-many renderer pattern. MIGRATION_API_PLAN.md updated to reflect §1-3 implemented in PR30, §4 (DSL) considered and rejected, and the resolved open questions. Every code sample in the migration section was verified against PR30's branch via a sibling worktree before commit.
Fixes the lint CI failure on PR #30 — six files had drifted from syntax_tree's expected format. No semantic changes.
0aec929 to
29fc9f2
Compare
Fixes the lint CI failure on PR #30 — six files had drifted from syntax_tree's expected format. No semantic changes.
7aab7ed to
90648bf
Compare
Fixes the lint CI failure on PR #30 — six files had drifted from syntax_tree's expected format. No semantic changes.
90648bf to
e850d1b
Compare
There was a problem hiding this comment.
Pull request overview
Reshapes Markbridge’s migration-facing public API to return structured Parse/Conversion result objects and to route render customization through explicit, reusable Renderer instances (via Markbridge.discourse_renderer(...)), replacing prior global/singleton configuration patterns.
Changes:
- Introduces/standardizes
ParseandConversionresult types across parse/convert entrypoints, includingraise_on_error:error-collection mode. - Consolidates Discourse rendering customization behind a renderer factory and expands renderer/tag capabilities (e.g., new
Detailssupport, tag library unregistering, postprocessing). - Updates docs and unit specs to cover the new API and behaviors.
Reviewed changes
Copilot reviewed 70 out of 70 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| UPGRADING.md | New/expanded upgrade guide describing the redesigned API and migration patterns. |
| docs/renderers/discourse.md | Updates renderer documentation around configuration and usage patterns. |
| lib/markbridge.rb | Public API surface and YARD docs for parse/convert/render entrypoints and error-handling semantics. |
| lib/markbridge/parsers/html/parser.rb | HTML parser input handling for pre-parsed Nokogiri nodes. |
| lib/markbridge/renderers/discourse/postprocessor.rb | Extracted postprocessing/cleanup pipeline for rendered Markdown. |
| lib/markbridge/parsers/text_formatter/handler_registry.rb | Handler registry behavior updates (e.g., overlay, processor plumbing) supporting the new handler API. |
| spec/unit/markbridge/renderers/discourse/tags/table_tag_spec.rb | Adds coverage for “effectively empty table” rendering behavior. |
| spec/unit/markbridge/renderers/discourse/tags/details_tag_spec.rb | Adds coverage for the new Details tag in markdown + html_mode. |
| spec/unit/markbridge/renderers/discourse/tag_library_spec.rb | Adds coverage for TagLibrary#unregister and merge semantics. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Add §5 to MIGRATION_API_PLAN.md capturing the design conclusion that
placeholder resolution belongs in custom handler subclasses at parse
time, not in renderer Tags at render time.
Implications, all written out:
- The converter framework (a layer above Markbridge) owns the
resolution-aware base handler classes, the placeholder AST
classes, and the trivial render Tags. Markbridge keeps the parser
primitives. Per-format converters (phpBB / vBulletin / SMF /
IPB / XenForo) subclass the bases via template method, only
overriding source-ref extraction.
- Resolution is a *write* (store the upload, get the id), not a
pure lookup. Parse-time resolution gives that side effect a
natural once-per-attachment home; render-time would re-store on
every render or need an in-Tag cache.
- AST nodes carry resolved data; render Tags become trivial.
`interface.emit` and `Conversion#emissions` (§2) are no longer
needed and should be removed from PR #30 before merge.
- Section ends with a concrete file-by-file change list for PR #30
plus the docs-branch follow-up needed once §2 is removed.
§2 above is now marked obsoleted-by-§5. Working branch section
unchanged.
gschlager
left a comment
There was a problem hiding this comment.
@gschlager partially reviewed 71 files and all commit messages, and made 1 comment.
Reviewable status: all files reviewed, 1 unresolved discussion.
lib/markbridge.rb line 109 at r2 (raw file):
ast = parser.parse(input) # stree-ignore
Why is there a # stree-ignore?
Code quote:
# stree-ignoreReshapes the Markbridge public API to return structured result objects
(Parse and Conversion) instead of raw strings, consolidates render-side
customization into a single renderer: parameter built via a
discourse_renderer(...) factory, and adds AST-mutation hooks plus
selective escaper customization. Eliminates the global configuration
singleton and per-process default registries in favor of explicit,
reusable renderer instances.
Result types and entry points:
* Parse (parse-only) and Conversion (render) — Data.define value
objects wrapping AST, markdown output, source format, unknown tags,
diagnostics, and errors. parse_* methods return Parse; *_to_markdown
methods return Conversion. Conversion#to_s delegates to #markdown
for string-coercion contexts, and Conversion#parsed exposes the
underlying Parse.
* Markbridge::Configuration, .configuration, .configure,
.reset_defaults!, and all default_* accessors are removed.
* Markbridge.convert(format:) dispatches to the per-format methods;
*_to_markdown and convert accept only handlers:, renderer:,
raise_on_error:, plus an optional block yielding the parsed AST
between parse and render.
* Markbridge.render accepts a Parse or an AST::Node. Given a Parse,
the Conversion carries unknown_tags, diagnostics, and source format
forward. Given a bare AST node, those default to empty and format
is nil (there is no source document); a non-Document node is wrapped
in an AST::Document so Conversion#ast is always a Document.
* raise_on_error: false collects render errors on Conversion#errors
instead of raising, allowing per-row failure isolation in batch
migrations.
Renderer customization:
* Markbridge.discourse_renderer(...) builds reusable Renderer
instances, designed to be created once and shared across thousands
of posts. Accepts tags:, tag_library:, unregister:, escaper:,
escape:, escape_hard_line_breaks:, allow:, postprocessor:, and
strip_trailing_invisibles:.
* TagLibrary#unregister removes a binding so the renderer falls
through to render_children — auto-passthrough for unregistered AST
classes, eliminating boilerplate passthrough Tags.
* Markdown cleanup is extracted from Markbridge#cleanup_markdown into
Renderers::Discourse::Postprocessor, injectable on the Renderer.
* MarkdownEscaper#allow: gives selective marker passthrough, replacing
the subclass-and-override pattern. Recognized keys: :bullet_list,
:ordered_list, :atx_heading, :block_quote; :lists expands to the
first two; unknown keys raise. Marker bytes pass through verbatim
while inline content after the marker is still escaped.
* IdentityEscaper is a no-op escaper for migration paths where source
content is already trusted Markdown; discourse_renderer(escape:
false) swaps it in. Mutually exclusive with
escape_hard_line_breaks:/allow:; an explicit escaper: always wins.
Parser-side extensibility:
* BBCode, HTML, and TextFormatter handler registries gain
#overlay(name) { |previous| ... } for wrapping default handlers.
* The HTML and TextFormatter parsers accept pre-parsed Nokogiri input
(DocumentFragment, Document, Element) in addition to Strings, so
importers can run their own Nokogiri-driven pre-processing on the
live tree without a re-encode/re-parse round-trip.
* Handlers must be objects responding to #process(...); Proc handler
support is dropped. The br/hr lambdas in HTML's default registry
became Handlers::SelfClosingHandler, which subsumes main's
VoidHandler (same shape) while keeping its whitespace-trim behavior
via produces_block?.
* TextFormatter handlers receive processor: so they can call back into
processor.process_children(xml, ast_node).
* The MediaWiki parser's inline_tag_registry: parameter is renamed to
handlers: for parity with the sibling parsers.
AST additions:
* AST::Details + DetailsTag render Discourse
[details=…]…[/details] collapsible sections; no built-in parser
produces the node — it exists for custom migration handlers.
* AST::Element gains #each_descendant, #descendants(klass = nil), and
#replace_child(old, new) to support AST mutation between parse and
render.
The migration use case resolves placeholders (uploads, mentions,
internal links) at parse time via custom handler subclasses that pin
stable identifiers on AST nodes; renderer Tags stay trivial output
formatters with no per-post state. The earlier interface.emit /
Conversion#emissions API for render-time side data is removed:
resolution is a write that should happen exactly once per source
attachment regardless of render count, and re-renders should be pure
functions of the AST.
examples/forum_migration.rb exercises the migration API end to end,
and UPGRADING.md covers every breaking change. 100% mutation coverage
on every subject changed since main.
1f1cf3e to
9ca20ef
Compare
9ca20ef to
114095d
Compare
Summary
Reshapes the Markbridge public API to return structured result objects (
ParseandConversion) instead of raw strings, consolidates render-side customization into a singlerenderer:parameter built via adiscourse_renderer(...)factory, and adds AST-mutation hooks plus selective escaper customisation. Eliminates the global configuration singleton and per-process default registries in favor of explicit, reusable renderer instances.Squashed to a single commit on the current
origin/main, plus dependency-update and version-bump commits. Linear history; no merge commits.Bumps the gem version to 0.2.0 — merging this PR publishes the gem and creates the GitHub release automatically (the publish job releases on push to
mainwheneverversion.rbcarries a new version).Key Changes
New result types:
Parse(parse-only) andConversion(render) —Data.definevalue objects wrapping AST, markdown output, format, unknown tags, diagnostics, and errorsparse_*methods returnParse;*_to_markdownmethods returnConversionConversion#to_sdelegates tomarkdownfor string-coercion contexts;Conversion#parsedexposes the underlyingParseRemoved global configuration:
Markbridge::Configuration,.configuration,.configure,.reset_defaults!, and alldefault_*accessors are goneRenderer factory:
Markbridge.discourse_renderer(...)builds reusableRendererinstancestags:,tag_library:,unregister:,escaper:,escape:,escape_hard_line_breaks:,allow:,postprocessor:,strip_trailing_invisibles:Simplified method signatures:
*_to_markdownandconvertnow accept onlyhandlers:,renderer:,raise_on_error:plus an optional block yielding the parsed AST between parse and renderMarkbridge.renderaccepts aParseor anAST::Node: when given a Parse, the resulting Conversion carriesunknown_tags,diagnostics, and sourceformatforward; when given a bare AST node those default to empty andformatisnil— there was no source document, so there is no source format to report. A bare non-Document node is wrapped in anAST::Document, soConversion#astis always a DocumentAST traversal/mutation helpers:
AST::Element#each_descendant,#descendants(klass = nil), and#replace_child(old, new)support mutating the AST between parse and renderAST::Details+DetailsTag: renders Discourse[details=…]…[/details]collapsible sections. No built-in parser produces the node — it exists for custom migration handlersHandler registry
#overlay: BBCode, HTML, and TextFormatter registries gain#overlay(name) { |previous| ... }for wrapping defaultsPre-parsed Nokogiri input: the HTML and TextFormatter parsers accept a
Nokogiri::XML::Node(DocumentFragment, Document, Element) in addition to Strings, so importers can run their own Nokogiri-driven pre-processing on the live tree without a re-encode/re-parse round-tripTagLibrary#unregister: removes a binding so the renderer falls through torender_children(auto-passthrough for unregistered AST classes)Postprocessor: extracted from
Markbridge#cleanup_markdownintoRenderers::Discourse::Postprocessor; injectable on the RendererMarkdownEscaper#allow:for selective passthrough: replaces the subclass-and-override pattern. Recognised keys::bullet_list,:ordered_list,:atx_heading,:block_quote. Alias:listsexpands to[:bullet_list, :ordered_list]. Unknown keys raise. Marker bytes pass through verbatim while inline content after the marker is still escaped.IdentityEscaper+escape: falsesugar: new no-op escaper for migration paths where source content is already trusted Markdown.discourse_renderer(escape: false)swaps it in. Mutually exclusive withescape_hard_line_breaks:/allow:; an explicitescaper:always wins.Class-only handlers: BBCode, HTML, and TextFormatter parsers now require objects responding to
#process(...). Thebr/hrlambdas in HTML's default registry becameHandlers::SelfClosingHandler; the example file's lambda demos became Handler classes.Note on the rebase: main shipped its own
VoidHandler(same shape, different name) to fix the related whitespace-trim bug. The rebase unified these — kept this branch'sSelfClosingHandlername (covers both<br>and<hr>since they're both self-closing void elements per HTML), kept main's whitespace-trim behavior viaproduces_block?(now simplified — no Proc gate needed).MediaWiki API consistency:
inline_tag_registry:parameter renamed tohandlers:for parity with sibling parsersTextFormatter handler signature: handlers accept
processor:so they can call back intoprocessor.process_children(xml, ast_node)Notable Implementation Details
ParseandConversionuseData.define(frozen, structural equality)raise_on_error: falsemode allows per-row failure isolation in batch migrationsMarkdownEscaper#allow:storage is a private Array (≤ 4 elements) —#include?is observably identical toSet#include?at this size, no allocationexamples/forum_migration.rbexercises the migration API end-to-end:discourse_rendererfactory,tags:,unregister:,allow: :lists, the AST-mutation block,Conversion#errors,raise_on_error: false, and theconvert(format:)dispatcherUPGRADING.mdcovers every breaking changeMigration-side resolution
The migration use case resolves placeholders (uploads, mentions, internal links) at parse time via custom handler subclasses. The handler stores the source-side reference in the converter's upload/user/topic store, gets back a stable identifier, and pins it on the AST node directly. Renderer Tags stay trivial output formatters — no per-post state, no side-channel.
The earlier drafts of this PR shipped an
interface.emit/Conversion#emissionsAPI for render-time side data; that has been removed. Resolution is a write, not a read — it should happen exactly once per source attachment regardless of how many times the post is rendered, the lookup table varies per post (defeating "build-once-reuse-many" if state lived in Tag constructors), and re-renders should be pure functions of the AST. With resolution in the handler, every job emit was trying to do is now done by something simpler. TheBaseAttachmentHandlertemplate-method pattern (extract source ref → store_or_lookup → AttachmentPlaceholder.new) reuses cleanly across phpBB / vBulletin / SMF / IPB attachment handlers — that layer lives in the converter framework, not Markbridge.Tests
discourse_renderer,#overlay,#unregister,Postprocessor,MarkdownEscaper#allow:,IdentityEscaper, polymorphicMarkbridge.render(Parse|AST), AST-mutation block form,SelfClosingHandler,AST::Details/DetailsTag, theElementtraversal helpers, and pre-parsed Nokogiri inputConfiguration,interface.emit,Conversion#emissions,with_provisional_emissions, obsolete Proc-handler shape114095d)mainon every local passThis change is