Current System
Example full system prompt:
https://gist.github.com/sprice/a915695ade4ebe7c05a4d9cfb25e9957
# copy bill to clipboard
npm run prompt -- C-244
System prompt in codebase:
https://github.com/BuildCanada/BillsTracker/blob/main/src/prompt/summary-and-vote-prompt.ts
Goals
- Quality & Correctness - Produce LLM judgements and assessments that meet or exceed our standard of quality and correctness
- Separation of concerns - Decouple application UI/business logic from LLM system
- Improved prompt engineering - Establish foundation for objective/systematic prompt improvements
- Enhanced metadata - Provide valuable contextual information for each bill and judgement
Phase 0
Tighten up code-formatting/CI
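Move Judgement Logic to LLM
- Change neutral to abstain to align with parliamentary voting terminology
- Update the prompt to judge yes, no, or abstain (#61)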
Infrastructure & Observability
- Development environment - Create dedicated environment for judgement/metadata system improvements (not a general dev environment)
- Prompt tracing - Implement trace tracking (recommend Langfuse)
- Logging improvements
  - Replace console.* statements with the debug package
  - Log prompts and prompt arguments in production; log everything in development
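The production/development logging split could be expressed through the namespace patterns that the debug package reads from the DEBUG environment variable. The following is a minimal sketch, not the project's implementation: the `bills:*` namespaces are hypothetical, and `isEnabled` is a simplified stand-in for the package's own wildcard matching, included only to show which namespaces each pattern would switch on.

```typescript
type Env = "production" | "development";

// Hypothetical namespaces for illustration; the project would pick its own.
const NAMESPACES = ["bills:prompt", "bills:prompt:args", "bills:db", "bills:http"];

// Value to assign to the DEBUG environment variable for each environment.
// The debug package accepts comma-separated namespaces with `*` wildcards.
function debugPatternFor(env: Env): string {
  return env === "production" ? "bills:prompt*" : "bills:*";
}

// Escape regex metacharacters in a literal fragment.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Simplified stand-in for the package's wildcard matching, used here only to
// show which namespaces a pattern switches on.
function isEnabled(namespace: string, pattern: string): boolean {
  return pattern.split(",").some((part) => {
    const source = part.trim().split("*").map(escapeRegExp).join(".*");
    return new RegExp(`^${source}$`).test(namespace);
  });
}

// Namespaces that would produce output in the given environment.
function enabledNamespaces(env: Env): string[] {
  const pattern = debugPatternFor(env);
  return NAMESPACES.filter((ns) => isEnabled(ns, pattern));
}
```

Under these assumptions, production would run with `DEBUG=bills:prompt*` (prompts and prompt arguments only) and development with `DEBUG=bills:*` (everything).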
Metadata Requirements
Define initial required metadata fields:
Separate LLM Calls
- Judgement call - System prompt + bill → vote decision + analysis
- Metadata call(s) - System prompt + bill + judgement/analysis → metadata extraction
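The two-call split can be sketched as pure request builders plus a strict parser for the judgement response. Everything named here is an assumption made for illustration: the `LlmRequest` shape, the JSON response format, and the function names are hypothetical, and the vote values are taken to be yes, no, and abstain. The actual LLM client call is deliberately left out.

```typescript
type Vote = "yes" | "no" | "abstain";

interface Judgement {
  vote: Vote;
  analysis: string;
}

// Hypothetical request shape; a real client would have its own.
interface LlmRequest {
  system: string;
  user: string;
}

function isVote(value: unknown): value is Vote {
  return value === "yes" || value === "no" || value === "abstain";
}

// Call 1: system prompt + bill -> vote decision + analysis.
function buildJudgementRequest(systemPrompt: string, billText: string): LlmRequest {
  return { system: systemPrompt, user: billText };
}

// Parse the model's reply, assuming it was asked to answer with a JSON object.
function parseJudgement(raw: string): Judgement {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("judgement must be a JSON object");
  }
  const { vote, analysis } = data as Record<string, unknown>;
  if (!isVote(vote)) throw new Error(`unexpected vote: ${String(vote)}`);
  if (typeof analysis !== "string") throw new Error("analysis must be a string");
  return { vote, analysis };
}

// Call 2: system prompt + bill + judgement/analysis -> metadata extraction.
function buildMetadataRequest(
  systemPrompt: string,
  billText: string,
  judgement: Judgement,
): LlmRequest {
  const user = ["BILL:", billText, "", "JUDGEMENT:", JSON.stringify(judgement)].join("\n");
  return { system: systemPrompt, user };
}
```

Keeping the builders and parser free of network calls means each step can be unit-tested and traced independently of whichever LLM client is used.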
Future Phases
Problem: Are we collecting as much data as we want for each bill?
Problem: Are our prompts as good as they can be?
- Prompt evaluations - Create eval framework for testing prompt performance
- Manual prompt refinement - Iteratively improve system prompts
- Automated prompt optimization - Explore programmatic prompt training/tuning (DSPy)
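As a starting point for prompt evaluations, a prompt version could be scored against a small set of bills with human-agreed votes. This is only a sketch under assumptions: the `EvalCase` and `EvalReport` shapes and the `scorePrompt` name are hypothetical, predictions are assumed to have been collected beforehand, and accuracy is just one possible metric.

```typescript
type Vote = "yes" | "no" | "abstain";

// A bill with the vote reviewers agreed on (hypothetical shape).
interface EvalCase {
  billId: string;
  expected: Vote;
}

interface EvalFailure {
  billId: string;
  expected: Vote;
  actual: Vote | undefined; // undefined when the prompt produced no vote
}

interface EvalReport {
  accuracy: number;
  failures: EvalFailure[];
}

// Compare one prompt version's predictions with the expected votes.
function scorePrompt(
  cases: EvalCase[],
  predictions: Record<string, Vote | undefined>,
): EvalReport {
  const failures = cases
    .map((c) => ({ billId: c.billId, expected: c.expected, actual: predictions[c.billId] }))
    .filter((r) => r.actual !== r.expected);
  const accuracy = cases.length === 0 ? 0 : (cases.length - failures.length) / cases.length;
  return { accuracy, failures };
}
```

Running the same cases against two prompt versions and comparing the reports gives an objective basis for deciding whether a manual or automated change was an improvement.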
Problem: Many bills are changes to existing bills. We need the context of those bills.
- Get other legislation
- Build database of other bills?
- Create LLM tool to fetch contents of other bills?
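If the tool route is chosen, it might look roughly like the sketch below. The tool name `fetch_bill`, its parameter schema, and the in-memory `Map` standing in for a database of other bills are all hypothetical; a real version would follow the tool-calling format of whichever LLM API is used.

```typescript
// Hypothetical tool description in the JSON-schema style many LLM APIs accept.
const fetchBillTool = {
  name: "fetch_bill",
  description: "Return the text of a bill given its number, for example C-244.",
  parameters: {
    type: "object",
    properties: { billNumber: { type: "string" } },
    required: ["billNumber"],
  },
};

type FetchBillResult =
  | { found: true; billNumber: string; text: string }
  | { found: false; billNumber: string };

// Handle a tool call. `store` stands in for the database of other bills.
function handleFetchBill(store: Map<string, string>, args: unknown): FetchBillResult {
  if (typeof args !== "object" || args === null) {
    throw new Error("arguments must be an object");
  }
  const raw = (args as Record<string, unknown>).billNumber;
  if (typeof raw !== "string" || raw.trim() === "") {
    throw new Error("billNumber must be a non-empty string");
  }
  // Normalise so that " c-244 " and "C-244" resolve to the same bill.
  const billNumber = raw.trim().toUpperCase();
  const text = store.get(billNumber);
  return text === undefined
    ? { found: false, billNumber }
    : { found: true, billNumber, text };
}
```

Returning an explicit `found: false` result, rather than throwing, lets the model see that the referenced bill is unavailable and say so in its analysis.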
Problem: The context needed to understand the issues behind a bill is often found in Parliamentary debate and in commentary/criticism from topic experts.
- Hansard context database
  - Build database of Hansard content?
  - Create LLM tool to fetch specific Hansard content?
- Topic Expert context - Build LLM tools to search for and fetch related commentary/criticism of bills from topic experts.