sinabarimd/reputation-engine

Reputation Engine

A multi-site automated publishing system for entity-first SEO, built by Dr. Sina Bari, MD.

Reputation Engine is the system I built to take control of my professional online presence. It coordinates AI-powered content generation, multi-site publishing, structured data optimization, and SERP monitoring across four owned domains - all orchestrated by autonomous agents running on n8n.

📖 Read the full article: How I Built a Personal Reputation Engine with AI Agents

▶️ Watch the walkthrough on YouTube

Latest Changes (June 13, 2026)

  • Narrative-led editorial voice for drsinabari.com (Jun 13) -- Long-form essays are now written in the Malcolm Gladwell tradition: scene-first openings, withheld thesis, counterintuitive reframes, one coined named concept per essay (2-4 words) that recurs, and stacked cases that converge on a single principle. The AEO answer block is suppressed on this site only; extraction value moves to the coined concept and the end-of-essay FAQ. The other three sites are unchanged.
  • Pre-push scan hardened (Jun 13) -- Adds OpenAI/HuggingFace/AWS/Google API key patterns, generic JWT detection, an X-Voice-Key literal check, and a generalized high-entropy base64-ish blob scanner with a # SAFE-B64 allowlist for known-safe content hashes.
  • measure.py SSL hardening (Jun 13) -- Removed insecure CERT_NONE SSL context; cert verification now enforced on the BrightData / GSC measurement script.
  • sync_pending_actions label updates (Jun 13) -- Daily Todo reconciler detects when a tracked todo's label text has drifted against the same todo_id and updates the line in place rather than leaving stale wording.

See CHANGELOG.md for the full changelog.
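
The high-entropy blob scan mentioned in the pre-push item above could look roughly like the sketch below. Only the `# SAFE-B64` marker comes from the changelog entry; the regex, the minimum length, the entropy threshold, and the function names are assumptions chosen for illustration, not the values used by the real hook.

```python
import math
import re
from collections import Counter

# Hypothetical tuning values; the real scanner's thresholds are not documented here.
MIN_LENGTH = 32
ENTROPY_THRESHOLD = 4.0  # bits per character
B64_RUN = re.compile(r"[A-Za-z0-9+/=_-]{%d,}" % MIN_LENGTH)
ALLOW_MARKER = "# SAFE-B64"


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def find_suspect_blobs(source: str) -> list[tuple[int, str]]:
    """Return (line_number, blob) pairs for high-entropy base64-like runs."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if ALLOW_MARKER in line:
            continue  # line explicitly allowlisted as a known-safe hash
        for match in B64_RUN.finditer(line):
            blob = match.group(0)
            if shannon_entropy(blob) >= ENTROPY_THRESHOLD:
                hits.append((lineno, blob))
    return hits
```

A pre-push hook built on this idea would run `find_suspect_blobs` over each staged file and abort the push when the list is non-empty. Long repeated-character runs score near zero entropy and pass, which keeps padding and separators from triggering false alarms.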


Why This Exists

Physicians and professionals often discover that Google results for their name include outdated, inaccurate, or context-free information. You can't remove those results, but you can build enough high-quality, authoritative content to occupy the visible SERP yourself.

That's what Reputation Engine does - systematically.

Instead of a single website competing for one slot, I run four purpose-built domains, each targeting a different facet of my professional identity. An autonomous agent pipeline researches topics, generates content, validates SEO quality, publishes articles, and measures the impact - on a weekly schedule, with human oversight at every stage.


Architecture Overview

┌─────────────────────────────────────────────────────────┐
│                 Portfolio Orchestrator                  │
│      (scheduling, cadence, dispatch, auto-publish)      │
└──────────┬───────────┬──────────┬────────────┬──────────┘
           │           │          │            │
     ┌─────▼────┐ ┌────▼────┐ ┌───▼───┐ ┌──────▼─────┐
     │ Research │ │ Content │ │  QA   │ │ Publisher  │
     │  Agent   │ │  Agent  │ │ Agent │ │   Agent    │
     └─────┬────┘ └────┬────┘ └───┬───┘ └──────┬─────┘
           │           │          │            │
     ┌─────▼────┐ ┌────▼────┐ ┌───▼───┐ ┌──────▼─────┐
     │   SEO    │ │  Media  │ │ Meas. │ │ Technical  │
     │ Research │ │ Ingest  │ │ Agent │ │ SEO Agent  │
     └──────────┘ └─────────┘ └───────┘ └────────────┘

The Four Domains

| Domain | Role | Content Focus |
|---|---|---|
| sinabarimd.com | Canonical identity hub | Bio, work, media, selected writing |
| sinabari.net | Healthcare AI authority | Healthcare AI analysis, health tech, digital health |
| drsinabari.com | Editorial node | Medicine & technology essays, clinical ethics, healthcare policy |
| sinabariplasticsurgery.com | Specialty node | Aesthetics, aging, rejuvenation, surgery |

The Agent System (10 Workflows)

Every agent is a standalone n8n workflow with a single responsibility:

  1. Portfolio Orchestrator - The scheduling brain. Runs per-site cron jobs, checks publishing cadence, auto-publishes approved drafts, and dispatches content generation when the queue is empty.

  2. Content Research Agent - Runs weekly topic scouting (Phase 1) using web search APIs, then deep research (Phase 2) when an operator selects a topic. Supports file attachments (PDFs, papers) for research context.

  3. Content Generator - Takes a research brief and site profile, generates a structured draft via LLM, and stores it for human review. Enforces per-site word counts, tone, and forbidden topics.

  4. Content Publisher - A 20-node pipeline that fetches the approved draft, renders the article page with full SEO metadata and structured data, updates the homepage, generates sitemaps, deploys via a deterministic file-sync service, and triggers QA.

  5. SEO QA Agent - Three-level validation (article, domain, portfolio). Checks structured data, meta tags, internal linking, content quality. Runs automatically after every publish.

  6. SEO Research Agent - Weekly intelligence brief analyzing SERP trends, competitor movements, and keyword opportunities across all four domains.

  7. Technical SEO Implementer - Converts SEO research briefs into actionable tasks with an approve/dismiss/execute workflow.

  8. Media Ingestion Agent - Monitors the web for mentions of my name, classifies them, and queues relevant items for the press page.

  9. Measurement Agent - Tracks SERP positions using residential proxy searches and Google Search Console data. Monitors for negative results and generates alerts.

  10. Site Refresh - Operator-triggered full page regeneration for design updates (used carefully - it's a destructive operation).


Tech Stack

| Component | Technology | Why |
|---|---|---|
| Orchestration | n8n (self-hosted) | Visual workflow builder, webhook-native, good API |
| Content Generation | OpenClaw (self-hosted LLM gateway) | Full control over prompts, model swapping, no vendor lock-in, all data stays on-prem |
| Web Research | Tavily API | Purpose-built for AI research, good relevance |
| SERP Monitoring | BrightData residential SERP API | Accurate residential-IP search results |
| Search Analytics | Google Search Console API | First-party click/impression data |
| Hosting | Static HTML + nginx + Traefik | Fast, simple, deterministic deploys |
| Deploy | Custom Python deploy service (port 9911) | Full-file-sync model, atomic deploys |
| Text Extraction | Custom Python service (port 9913) | PDF/DOCX/TXT → plain text for research attachments |
| Site Design | Google Stitch via MCP | AI-generated site designs, connected through Model Context Protocol |
| Development - Design | Claude Cowork | Architecture planning, spec writing, brainstorming |
| Development - Build | Claude Code | Live API calls, coding, deployment, debugging |
| Infrastructure | Single VPS + Docker | n8n in container, host services via Docker bridge |

The "Open Server" Pattern

n8n runs inside Docker and can't execute host commands directly. The solution: lightweight Python HTTP services on the host, managed by systemd, firewalled to only accept connections from the Docker bridge subnet. Each is a single Python file using http.server. When n8n needs host-level capabilities (file deploy, text extraction, etc.), it makes an HTTP call to host.docker.internal:{port}.
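
A minimal sketch of one such host service, assuming a JSON-over-POST contract. The handler name, the `handle_action` placeholder, the response shape, and port 9999 are hypothetical; the real services (deploy on 9911, text extraction on 9913) define their own payloads.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_action(payload: dict) -> dict:
    """Hypothetical host-side action; a real service would deploy files or extract text."""
    return {"success": True, "echo": payload}


class HostServiceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            status, body = 200, handle_action(payload)
        except json.JSONDecodeError:
            status, body = 400, {"success": False, "error": "invalid JSON"}
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; production logging is out of scope here


def make_server(host: str = "0.0.0.0", port: int = 9999) -> HTTPServer:
    """Build the server; access control is left to the host firewall, as described above."""
    return HTTPServer((host, port), HostServiceHandler)
```

Calling `make_server().serve_forever()` from a systemd unit would start the service, and an n8n HTTP Request node would then POST JSON to `host.docker.internal` on the chosen port. The sketch does no authentication of its own, so it relies entirely on the firewall rule that limits traffic to the Docker bridge subnet.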

Claude as Development Partner

This entire system was designed in Claude Cowork and built with Claude Code. Cowork handles the thinking - architecture, specs, strategy, brainstorming. Claude Code handles the doing - live n8n API calls, writing code, deploying changes, debugging production issues. They share the same project folder, so a spec file drafted in Cowork is immediately available for Code to implement.

A 500-line CLAUDE.md file in the project root acts as institutional memory - complete API reference, workflow IDs, webhook endpoints, architectural rules, and deployment procedures. Every Claude Code session reads it automatically, so each session starts with full system context.

RLAiF Content Quality Loop

The system uses a form of Reinforcement Learning from AI Feedback (RLAiF) to improve published content quality over time. It works in two layers:

Layer 1 (Deterministic) runs a rules-based check on every published article, scanning for known AI content tells: banned generic phrases, em-dashes, missing first-person clinical voice, weak specificity signals, insufficient outbound authority links, and structural tells like hedge openers. This runs as a Code node in the SEO QA Agent, costs nothing, and executes in milliseconds.
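
To make the Layer 1 idea concrete, here is a rough sketch of a rules-based scan. The production check runs as a JavaScript Code node inside n8n; this version is written in Python for readability, and the phrase lists, the link threshold, and the function name are all assumptions rather than the real rule set. Specificity signals are left out for brevity.

```python
import re

# Hypothetical rule values; the real banned-phrase list and thresholds live in the QA agent.
BANNED_PHRASES = ["in today's fast-paced world", "it is important to note", "delve into"]
HEDGE_OPENERS = ("while it is true", "it could be argued", "in many ways")
MIN_OUTBOUND_LINKS = 2
FIRST_PERSON = re.compile(r"\b(I|my|me)\b")
OUTBOUND_LINK = re.compile(r'href="https?://([^/"]+)', re.IGNORECASE)


def layer1_check(html: str, own_domain: str) -> list[str]:
    """Return human-readable findings; an empty list means no tells were found."""
    findings = []
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag strip, enough for a sketch
    lowered = text.lower()

    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            findings.append(f"banned phrase: {phrase!r}")
    if "\u2014" in text:  # U+2014 is the em-dash
        findings.append("em-dash present")
    if not FIRST_PERSON.search(text):
        findings.append("no first-person voice")
    if lowered.strip().startswith(HEDGE_OPENERS):
        findings.append("hedge opener")

    # Count links that point away from the site being checked.
    hosts = [h.lower() for h in OUTBOUND_LINK.findall(html)]
    outbound = [h for h in hosts if not h.endswith(own_domain)]
    if len(outbound) < MIN_OUTBOUND_LINKS:
        findings.append(f"only {len(outbound)} outbound link(s)")
    return findings
```

Because every rule is a string or regex test, the whole scan is deterministic and cheap to run on each publish, and two runs over the same article always produce the same findings.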

Layer 2 (Model-Based) sends each article through a 5-dimension editorial rubric via three independent LLM passes, then aggregates scores with confidence tracking. The dimensions - first_hand_expertise, information_gain, specificity_evidence, depth_substance, voice_authenticity - evaluate the holistic signals that deterministic checks can't measure. This is advisory only and never gates a deploy.
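
One plausible way to aggregate the three passes is sketched below. The dimension names come from the rubric above; the numeric scale, the use of a simple mean, and the spread-based confidence rule are assumptions, since the actual aggregation method is not described here.

```python
from statistics import mean

DIMENSIONS = (
    "first_hand_expertise",
    "information_gain",
    "specificity_evidence",
    "depth_substance",
    "voice_authenticity",
)

# Hypothetical cut-off: passes that disagree by more than this are flagged low-confidence.
MAX_SPREAD_FOR_HIGH_CONFIDENCE = 1.0


def aggregate_passes(passes: list[dict[str, float]]) -> dict[str, dict]:
    """Combine independent rubric passes into a mean score and confidence per dimension."""
    result = {}
    for dim in DIMENSIONS:
        scores = [p[dim] for p in passes]
        spread = max(scores) - min(scores)
        result[dim] = {
            "score": round(mean(scores), 2),
            "spread": round(spread, 2),
            "confidence": "high" if spread <= MAX_SPREAD_FOR_HIGH_CONFIDENCE else "low",
        }
    return result
```

Tracking the spread alongside the mean lets the dashboard distinguish a dimension the graders agree on from one where a single pass pulled the average up or down.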

The feedback loop: articles are published, automatically graded by both layers, and the results surface in the operator dashboard with per-dimension scores and suggested fixes. The operator rewrites weak articles using the grading feedback, redeploys, and regrades. Each cycle produces measurable score deltas that identify which editorial tactics have the highest impact per dimension.

After two rewrite cycles across 12 articles, the system produced a ranked playbook of editorial interventions. The highest-impact tactics: named citations with quantitative findings (+1.3-2.4 on specificity), opening clinical anecdotes with patient-specific detail (+2.0 on expertise), and quoted patient dialogue (+1.0-1.4 on expertise/voice). These findings feed back into the Content Generator's prompt engineering, closing the loop between evaluation and generation.
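
The ranking step can be sketched as a small calculation over before/after gradings. This assumes each rewrite is tagged with a single tactic, which may be a simplification of the real process; the function names and the tuple layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def score_deltas(before: dict, after: dict) -> dict:
    """Per-dimension change between two gradings of the same article."""
    return {dim: after[dim] - before[dim] for dim in before if dim in after}


def rank_tactics(experiments) -> list:
    """Rank (tactic, dimension) pairs by mean score improvement.

    experiments: iterable of (tactic, before_scores, after_scores) tuples.
    Returns (tactic, dimension, mean_delta) tuples, largest improvement first.
    """
    buckets = defaultdict(list)
    for tactic, before, after in experiments:
        for dim, delta in score_deltas(before, after).items():
            buckets[(tactic, dim)].append(delta)
    ranked = [(t, d, round(mean(v), 2)) for (t, d), v in buckets.items()]
    return sorted(ranked, key=lambda row: row[2], reverse=True)
```

Feeding it one tuple per rewrite yields a table of the same shape as the playbook described above, with each tactic's average effect listed per rubric dimension.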

The approach treats content quality as an empirical optimization problem rather than a subjective editorial judgment. Every rewrite is an experiment with a measurable outcome.

Example: Scrolling News Ticker. The sinabarimd.com homepage has a scrolling news ticker showing recent media mentions. It went from idea → design spec (Cowork) → working component deployed to production (Claude Code) in a single session. That's the kind of iteration speed this workflow enables for a non-engineer.


Key Design Decisions

Why Static HTML Instead of WordPress/CMS

The deploy service does a full file sync: every deploy lists exactly which files should exist, and anything not in the list is removed. This makes deploys deterministic, so you always know exactly what's live. There is no database and no plugins, which leaves a much smaller attack surface than a CMS. The tradeoff is that you need a rendering pipeline, which the Content Publisher handles.

Why Separate Domains Instead of Subdomains

Each domain builds its own authority and competes for its own SERP slot. Subdomains of a single domain would consolidate ranking power but only occupy one result. The goal is to own as many page-one results as possible for branded queries.

Why Human-in-the-Loop

Every draft goes through human review before publishing. The operator can edit titles, excerpts, full content, and even reroute articles to a different site. Auto-publish only fires for drafts that have been explicitly approved. This is a reputation system - accuracy matters more than speed.

Why Per-Agent Isolation

Each agent has exactly one job. The Content Generator doesn't know about SEO scores. The QA Agent doesn't generate content. The Measurement Agent doesn't publish anything. This makes the system debuggable, testable, and safe to modify - changing one agent never breaks another.


Repository Structure

reputation-engine/
├── README.md                    # You are here
├── LICENSE                      # MIT
├── workflows/                   # All 10 n8n workflow JSONs (sanitized)
│   ├── portfolio-orchestrator.json      # Scheduling brain (54 nodes)
│   ├── content-research-agent.json      # Topic scout + deep research (43 nodes)
│   ├── content-generator.json           # Draft generation via LLM (30 nodes)
│   ├── content-publisher.json           # 20-node article pipeline
│   ├── seo-qa-agent.json               # 3-level SEO validation (28 nodes)
│   ├── seo-research-agent.json          # Weekly intelligence brief (14 nodes)
│   ├── technical-seo-implementer.json   # Brief-to-tasks pipeline (22 nodes)
│   ├── media-ingestion-agent.json       # Media monitoring (18 nodes)
│   ├── measurement-agent.json           # SERP + GSC tracking (28 nodes)
│   └── site-refresh.json               # Full page regen (35 nodes)
├── dashboard.html               # Operator dashboard (3,350 lines, 8 tabs)
├── deploy/
│   └── deploy_service.py        # Deterministic file-sync deploy service
├── services/
│   └── deep-researcher-api.py   # Async academic paper research + n8n callback
├── scripts/
│   └── backup.sh                # Full system backup (workflows + sites + state)
├── profiles/
│   ├── sinabarimd_com.yaml      # Site profile - canonical hub
│   ├── sinabari_net.yaml        # Site profile - healthcare AI
│   ├── drsinabari_com.yaml      # Site profile - editorial
│   └── sinabariplasticsurgery_com.yaml  # Site profile - specialty
├── qa/
│   └── qa_checks.js             # SEO QA validation logic (n8n Code node)
├── schema/
│   ├── homepage_person.json     # Person+Physician structured data
│   ├── article_schema.json      # Article page structured data template
│   └── faq_extractor.js         # Auto-extracts FAQPage schema from HTML
├── templates/
│   └── article_meta.html        # Article page SEO meta template
├── measurement/
│   └── measure.py               # GSC data collection via service account
└── docs/
    ├── architecture.md          # Detailed architecture documentation
    └── publishing-pipeline.md   # The 20-node publish flow

Example: Site Profile

Each domain has a YAML profile that controls content generation, publishing cadence, and SEO settings:

site_id: sinabari_net
domain: sinabari.net
name: "Sina Bari, MD - Healthcare AI Analysis"
role: "Healthcare AI authority site"
author:
  name: "Dr. Sina Bari, MD"
  url: "https://sinabarimd.com/about"
content:
  allowed_topics:
    - healthcare AI
    - medical technology
    - digital health
    - precision medicine
  forbidden_topics:
    - plastic surgery
    - reconstructive surgery
    - generic AI
  default_word_count: 1200
  tone: "analytical, evidence-based, first-person clinical perspective"
publishing:
  min_days_between_publishes: 3
  pipeline_section: "ANALYSIS"
  cron_days: [tuesday, friday]
seo:
  schema_type: "WebSite"
  canonical_hub_link: true
  author_id: "https://sinabarimd.com/#sinabari"
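
As an illustration of how the `publishing` block might drive the orchestrator's cadence check, the sketch below combines `cron_days` and `min_days_between_publishes` into one decision. The profile is written as a Python dict to avoid a YAML dependency, and `may_publish` plus the exact rule (scheduled day AND minimum gap) are assumptions about behavior that the n8n workflow actually implements.

```python
from datetime import date
from typing import Optional

# Subset of the profile above, as a dict so the sketch needs no YAML library.
PROFILE = {
    "site_id": "sinabari_net",
    "publishing": {
        "min_days_between_publishes": 3,
        "cron_days": ["tuesday", "friday"],
    },
}

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def may_publish(profile: dict, today: date, last_published: Optional[date]) -> bool:
    """True when today is a scheduled day and the minimum gap has elapsed."""
    publishing = profile["publishing"]
    if WEEKDAYS[today.weekday()] not in publishing["cron_days"]:
        return False
    if last_published is None:
        return True  # nothing published yet, so no gap to enforce
    gap = (today - last_published).days
    return gap >= publishing["min_days_between_publishes"]
```

With the values shown, a site that published on a Tuesday becomes eligible again on the following Friday, since the three-day gap and the scheduled day line up.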

Example: Deploy Service

The deploy service is a simple Python HTTP server that receives a file manifest and atomically syncs a site directory:

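# Simplified - see deploy/deploy_service.py for the full implementation.
# This version writes files in place and does not show an atomic swap.
import os

def walk_directory(root):
    """Yield every file under root as a path relative to root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.relpath(os.path.join(dirpath, name), root)

def handle_deploy(request):
    payload = request.json
    domain = payload['domain']  # site being deployed; unused in this simplified version
    deploy_path = os.path.realpath(payload['deployPath'])
    files = payload['files']

    # Write all files from the manifest
    for file_entry in files:
        path = os.path.realpath(os.path.join(deploy_path, file_entry['path']))
        # Refuse manifest entries that would escape the site directory
        if not path.startswith(deploy_path + os.sep):
            raise ValueError(f"unsafe path: {file_entry['path']}")
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, 'w', encoding='utf-8') as f:
            f.write(file_entry['content'])

    # Remove any files NOT in the manifest (full sync)
    manifest_paths = {os.path.normpath(f['path']) for f in files}
    for existing in walk_directory(deploy_path):
        if existing not in manifest_paths:
            os.remove(os.path.join(deploy_path, existing))

    return {"success": True, "files_written": len(files)}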

Results

After 5 weeks of operation (as of late April 2026):

  • 4 owned domains ranking on page 1 for branded queries
  • 10+ articles published across all sites with automated QA
  • Structured data (Person, Physician, Article, FAQPage) deployed on every page
  • Web 2.0 syndication across 15+ platforms with no-repeat tracking
  • Zero manual deploys -- everything goes through the pipeline
  • Deep academic research -- papers indexed and synthesized into content briefs
  • Operator dashboard -- single-page control plane with 8 tabs, daily todos, inline actions

Running Your Own

This repository is a reference implementation. To adapt it for your own use:

  1. Set up n8n - self-hosted instance with API access
  2. Define your domains - what facets of your identity do you want to represent?
  3. Create site profiles - YAML configs that control content and publishing rules
  4. Set up the deploy service - or adapt to your hosting (Netlify, Vercel, S3, etc.)
  5. Connect an LLM - OpenClaw, OpenAI, Anthropic, or any compatible API
  6. Build incrementally - start with one site, add agents as you go

Author

Dr. Sina Bari, MD
Physician · Healthcare AI · Medical Technology


License

MIT - see LICENSE for details.

This is a reference implementation of the system described in How I Built a Personal Reputation Engine with AI Agents.

About

Automated multi-site SEO publishing system using n8n, AI agents, and structured data. 10 autonomous workflows manage content research, generation, publishing, QA, and SERP measurement across four domains.
