The Vision
MRS enables AI-assisted composition and orchestration of full scores (symphonies, vocal arrangements, film cues) through iterative collaboration with a human arranger/orchestrator. The objective is Dorico-level semantic detail without “information ceilings.” The system must be able to represent and edit the same kinds of decisions a professional score needs: instruments and doublings, voicing, notation semantics, phrasing and expression, and evolving form. This is not a simplified demo. It targets the hardest case: creating and revising large, multi-movement works from scratch, iteratively, while keeping the score structurally safe and human-auditable.
The Core Problem
AI cannot currently compose or edit full orchestral scores reliably. Three fundamental challenges block progress:
1. The Scale Problem
A full orchestral score is too large for any AI context window:
| Score Type | Parts | Measures | Events | Tokens |
|---|---|---|---|---|
| Lead sheet | 1 | 32 | ~200 | ~2K |
| Piano sonata | 2 | 300 | ~5,000 | ~40K |
| String quartet | 4 | 400 | ~8,000 | ~65K |
| Full orchestra | 90 | 1000 | ~200,000 | ~1.5M |
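
As a rough check on the table, the sketch below derives tokens per event from the approximate figures above and compares each score against a context window. The 200K-token window is an assumption chosen for illustration only, not a figure from this document, and the function names are hypothetical.

```python
# Figures copied from the table above: (events, tokens), both approximate.
SCORES = {
    "Lead sheet": (200, 2_000),
    "Piano sonata": (5_000, 40_000),
    "String quartet": (8_000, 65_000),
    "Full orchestra": (200_000, 1_500_000),
}

# Assumption for illustration only: a 200K-token context window.
ASSUMED_WINDOW_TOKENS = 200_000


def tokens_per_event(name: str) -> float:
    """Average encoding cost of one event for the named score type."""
    events, tokens = SCORES[name]
    return tokens / events


def windows_needed(name: str, window: int = ASSUMED_WINDOW_TOKENS) -> float:
    """How many context windows the whole score would occupy."""
    _, tokens = SCORES[name]
    return tokens / window


for name in SCORES:
    print(f"{name}: {tokens_per_event(name):.1f} tokens/event, "
          f"{windows_needed(name):.2f} windows")
```

Under that assumed window size, only the full orchestral score exceeds a single window, which is why the scale problem is specific to large ensembles.
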
2. The Context Problem
Musical decisions require understanding surrounding material. Writing a clarinet countermelody requires knowing:
- What melodic material it responds to
- The harmonic context (what chords are sounding)
- Where phrase boundaries fall
- What comes before and after
3. The Reliability Problem
Existing approaches ask AI to emit complete, structurally valid score fragments. This creates high failure rates:
- Accidental omissions: Agent returns less content than given → content silently deleted
- Calculation errors: Miscomputed absolute positions, duration sums that don’t add up
- Hallucinated references: Agent references IDs that don’t exist in the score
- Format violations: Malformed syntax, missing required fields
Why Existing Solutions Fail
MusicXML / MEI
- Interchange-first, not editing-first: Excellent for export/import, but not designed for repeated, local, semantically-safe edits.
- Too verbose for agent interaction: Token cost is high; diffs are noisy.
- Fragile references: Many references are positional or implicit, and break under structural edits.
ABC Notation / LilyPond
- Human-authoring optimized: Powerful for humans, but heavy on implicit state and shorthand.
- No stable identity: Difficult to reference “that exact note” after insertions/deletions.
MIDI
- Performance data, not score semantics: Great for capturing a performance or sketch, but it does not reliably encode notation intent (spelling, voices, phrasing, structure).
In practice, a professional workflow uses both: MIDI as input/idea capture and a notation model for the authoritative score. MRS is about the score semantics and safe editing; MIDI can be an input and a preview output, not the canonical truth.
The MRS Solution
MRS solves these problems through architectural separation:
1. Separate Storage from Mutation
- MRS-S (storage): Complete, archival-quality encoding with stable UUIDs. Agents READ this.
- MRS-Ops (mutation): Typed operation protocol with orchestrator-derived fields. Agents WRITE this.
This separation prevents:
- Accidental deletions (explicit create/update/delete operations)
- Calculation errors (orchestrator computes `:at`, `:beat-start`)
- Hallucinated references (orchestrator validates all refs before application)
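
The idea of orchestrator-derived fields can be sketched as follows. This is a minimal illustration, not the MRS-Ops format: the `AppendNote` operation, the `StoredNote` record, the fixed 4/4 meter, and the reading of `at` as an offset from the start of the score are all assumptions made for the example. The point is only that the agent states intent and the orchestrator does the arithmetic.

```python
from dataclasses import dataclass, field
from fractions import Fraction

# Assumption for this sketch: every measure is 4/4 (one whole note long).
MEASURE_LENGTH = Fraction(1)


@dataclass(frozen=True)
class AppendNote:
    """Hypothetical op: the agent states intent only, never positions."""
    measure: int        # 1-based measure number
    pitch: str
    duration: Fraction  # fraction of a whole note


@dataclass(frozen=True)
class StoredNote:
    pitch: str
    duration: Fraction
    beat_start: Fraction  # offset within the measure (derived)
    at: Fraction          # offset from the start of the score (derived)


@dataclass
class Orchestrator:
    measures: dict = field(default_factory=dict)  # measure number -> notes

    def apply(self, op: AppendNote) -> StoredNote:
        notes = self.measures.get(op.measure, [])
        # Derived by the orchestrator from what is already stored.
        beat_start = sum((n.duration for n in notes), Fraction(0))
        if beat_start + op.duration > MEASURE_LENGTH:
            raise ValueError(f"measure {op.measure} would overflow")
        at = (op.measure - 1) * MEASURE_LENGTH + beat_start
        note = StoredNote(op.pitch, op.duration, beat_start, at)
        self.measures[op.measure] = notes + [note]
        return note


orchestrator = Orchestrator()
first = orchestrator.apply(AppendNote(2, "C4", Fraction(1, 4)))
second = orchestrator.apply(AppendNote(2, "E4", Fraction(1, 2)))
print(second.beat_start, second.at)
```

Because a rejected operation raises before any state changes, a duration that does not fit leaves the stored measure untouched.
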
2. Orchestrator Authority
The orchestrator is the sole authority for:
- UUID minting (agents use temporary IDs)
- Derived field computation
- Canonical state management
- Validation and application
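
A minimal sketch of UUID minting, assuming agents label new objects with temporary IDs such as `tmp-1` and the orchestrator swaps them for real UUIDs at application time. The `IdRegistry` class and its method names are hypothetical; only the standard-library `uuid.uuid4()` call is real.

```python
import uuid


class IdRegistry:
    """Hypothetical registry: only the orchestrator creates permanent IDs."""

    def __init__(self):
        self._known = set()  # every UUID minted so far

    def mint(self, tmp_ids):
        """Map each agent-chosen temporary ID to a freshly minted UUID."""
        mapping = {}
        for tmp in tmp_ids:
            if tmp in mapping:
                raise ValueError(f"duplicate temporary ID: {tmp}")
            mapping[tmp] = str(uuid.uuid4())
        self._known.update(mapping.values())
        return mapping

    def resolve(self, ref, mapping):
        """Turn a reference into a permanent ID, or fail loudly."""
        if ref in mapping:      # temporary ID from the current batch
            return mapping[ref]
        if ref in self._known:  # already a permanent ID
            return ref
        raise KeyError(f"unknown reference: {ref}")


registry = IdRegistry()
mapping = registry.mint(["tmp-1", "tmp-2"])
permanent = registry.resolve("tmp-1", mapping)
print(permanent)
```

A reference that is neither a temporary ID from the batch nor a known UUID raises instead of being silently accepted, which is how a hallucinated ID would surface.
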
3. Task-Adaptive Context Views
Instead of fixed “near/far” reduction rings, agents receive context views tailored to their specific task:
| Task Type | Context Views |
|---|---|
| Countermelody | Melodic reference, harmonic context, phrase structure |
| Orchestration | Orchestration map, texture density, dynamics profile |
| Dynamics pass | Phrase structure, existing dynamics, climax points |
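
The table can be read as a lookup from task type to the views an agent should receive. The sketch below encodes it that way; the `Task` enum, the snake_case view names, and the provider functions are hypothetical stand-ins, since the document does not define how views are computed.

```python
from enum import Enum


class Task(Enum):
    COUNTERMELODY = "countermelody"
    ORCHESTRATION = "orchestration"
    DYNAMICS_PASS = "dynamics pass"


# Direct transcription of the table above.
CONTEXT_VIEWS = {
    Task.COUNTERMELODY: ("melodic_reference", "harmonic_context", "phrase_structure"),
    Task.ORCHESTRATION: ("orchestration_map", "texture_density", "dynamics_profile"),
    Task.DYNAMICS_PASS: ("phrase_structure", "existing_dynamics", "climax_points"),
}


def build_context(task, providers):
    """Call only the view providers that the given task needs."""
    return {name: providers[name]() for name in CONTEXT_VIEWS[task]}


# Placeholder providers; real ones would analyse the score.
all_view_names = {name for views in CONTEXT_VIEWS.values() for name in views}
providers = {name: (lambda n=name: f"<{n} data>") for name in all_view_names}

context = build_context(Task.COUNTERMELODY, providers)
print(list(context))
```
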
4. Professional Orchestration Semantics
The Player → Instrument → Staff model supports real-world orchestration:
- Instrument doubling (Flute 2 / Piccolo)
- Mid-score instrument changes
- Transposition handling across changes
- Percussion kits
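
One way to picture the Player → Instrument → Staff model is a player who holds a sequence of instruments, each with its own transposition and staves. The class layout and field names below are assumptions for illustration, not the MRS schema. The example uses the Flute 2 / Piccolo doubling from the list above, relying on the fact that piccolo sounds an octave above its written pitch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Staff:
    clef: str


@dataclass(frozen=True)
class Instrument:
    name: str
    transposition: int  # semitones added to written pitch to get sounding pitch
    staves: tuple = (Staff("treble"),)


@dataclass
class Player:
    name: str
    changes: list = field(default_factory=list)  # (first_measure, Instrument) pairs

    def instrument_at(self, measure: int) -> Instrument:
        """Return the instrument this player holds in the given measure."""
        held = [(start, inst) for start, inst in self.changes if start <= measure]
        if not held:
            raise LookupError(f"{self.name} has no instrument at measure {measure}")
        return max(held, key=lambda change: change[0])[1]

    def sounding_pitch(self, measure: int, written_midi: int) -> int:
        """Apply whichever transposition is in force at that measure."""
        return written_midi + self.instrument_at(measure).transposition


FLUTE = Instrument("Flute", 0)
PICCOLO = Instrument("Piccolo", 12)  # sounds an octave above written pitch

# The player switches from flute to piccolo at measure 33 (made-up measure number).
flute_2 = Player("Flute 2 / Piccolo", [(1, FLUTE), (33, PICCOLO)])
print(flute_2.sounding_pitch(10, 72), flute_2.sounding_pitch(40, 72))
```

Keeping the transposition on the instrument rather than the player is what lets a mid-score change alter sounding pitch without touching the written notes.
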
5. Progressive Validation
Operations are validated in stages:
- Syntax: Operation is well-formed
- References: All IDs exist (or are valid tmp-ids)
- Permissions: Operation within granted lanes/scope
- Musical rules: Constraints satisfied (range, parallel fifths, etc.)
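
The staged checks above can be sketched as an ordered pipeline that stops at the first failure, so cheap structural checks run before musical ones. The `Op` and `Context` shapes, the lane names, and the single range rule are assumptions for illustration; a real rule set would be far larger.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Op:
    kind: str                       # "create", "update", or "delete"
    target: str                     # permanent ID or temporary ID
    lane: str                       # e.g. "pitch", "dynamics"
    midi_pitch: Optional[int] = None


@dataclass(frozen=True)
class Context:
    known_ids: frozenset
    tmp_ids: frozenset
    granted_lanes: frozenset
    pitch_range: tuple              # inclusive (low, high) MIDI numbers


def check_syntax(op, ctx):
    if op.kind not in ("create", "update", "delete"):
        return f"unknown operation kind {op.kind!r}"
    return None


def check_references(op, ctx):
    if op.target not in ctx.known_ids and op.target not in ctx.tmp_ids:
        return f"unknown ID {op.target!r}"
    return None


def check_permissions(op, ctx):
    if op.lane not in ctx.granted_lanes:
        return f"lane {op.lane!r} not granted"
    return None


def check_musical_rules(op, ctx):
    low, high = ctx.pitch_range
    if op.midi_pitch is not None and not low <= op.midi_pitch <= high:
        return f"pitch {op.midi_pitch} outside range {low}-{high}"
    return None


# Order matters: it mirrors the stages listed above.
STAGES = (
    ("syntax", check_syntax),
    ("references", check_references),
    ("permissions", check_permissions),
    ("musical rules", check_musical_rules),
)


def validate(op, ctx):
    """Return (stage, message) for the first failing stage, or None."""
    for stage, check in STAGES:
        message = check(op, ctx)
        if message is not None:
            return stage, message
    return None


# Illustrative context; the range is not a real instrument's limits.
ctx = Context(frozenset({"n1"}), frozenset({"tmp-1"}), frozenset({"pitch"}), (50, 90))
print(validate(Op("update", "ghost", "pitch", 60), ctx))
```
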
Why This Approach is Feasible
The Scale Problem → Solved by Scoped Extraction
Working Set Envelopes reduce a full orchestral score to a focused fragment (often 5–50K tokens) plus task-appropriate context, so the agent is not forced to “hold the whole piece.”
The Context Problem → Solved by Task-Adaptive Views
Context views supply what the task actually needs (melodic references, harmonic context, phrase structure, orchestration maps) instead of “more measures at reduced detail.”
The Reliability Problem → Solved by Typed Operations + Validation
| Failure Mode | What Prevents It |
|---|---|
| Accidental deletion | Explicit create/update/delete operations (no implicit replacement) |
| Calculation errors | Orchestrator computes derived fields (no agent arithmetic) |
| Hallucinated references | Reference validation before application |
| Drift beyond scope | Scope + lane permissions enforced by the orchestrator |
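
Scoped extraction and scope enforcement can be sketched together: the same scope that selects the working set also decides which targets an agent may touch. The `Event` and `Scope` shapes and the helper names are hypothetical, and a real Working Set Envelope would also carry context views, which are omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    id: str
    part: str
    measure: int


@dataclass(frozen=True)
class Scope:
    parts: frozenset
    first_measure: int
    last_measure: int  # inclusive

    def contains(self, event: Event) -> bool:
        return (event.part in self.parts
                and self.first_measure <= event.measure <= self.last_measure)


def extract_working_set(score, scope):
    """Keep only the events the agent is allowed to see and edit."""
    return [event for event in score if scope.contains(event)]


def out_of_scope_targets(target_ids, working_set):
    """List the operation targets that fall outside the working set."""
    allowed = {event.id for event in working_set}
    return [target for target in target_ids if target not in allowed]


# Toy score: three parts, eight measures, one event per measure.
score = [
    Event(f"{part}-m{measure}", part, measure)
    for part in ("flute", "clarinet", "viola")
    for measure in range(1, 9)
]
scope = Scope(frozenset({"clarinet"}), 3, 4)
working_set = extract_working_set(score, scope)
print([event.id for event in working_set])
print(out_of_scope_targets(["clarinet-m3", "viola-m3"], working_set))
```
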
Similarities to Large Codebases (and Why Music is Harder)
At a systems level, this looks like how teams safely change large codebases:
- You don’t “rewrite the repository.” You make bounded changes, review diffs, and run validations.
- The orchestrator plays the role of a gate: validate → apply → record.
Music is harder in two respects:
- There is no universal compiler for “good music.” A score can be syntactically correct but stylistically wrong. Validation can catch “illegal” changes (out of range, broken ties, duration overflow), but taste remains human.
- Musical coupling is highly non-local. A local orchestration change affects balance, texture, phrasing, and form perception. The system must supply the right musical context, not just the nearby measures.
The Hardest Version of the Problem
MRS is designed for the most demanding case: building and revising a large score from scratch, iteratively, with AI assistance and human review. This means supporting:
- Structural creation: Insert measures, sections, and movements, not just edit existing content
- Incomplete states: Early-stage work has placeholders (“strings texture TBD”, “orchestrate later”)
- Professional orchestration: Player/instrument model with doubling, switching, condensing
- Cross-boundary spans: Hairpins, slurs, pedal marks that cross edit boundaries
- Human-in-the-loop checkpoints: Lock form, lock harmony, lock orchestration at appropriate stages
- Few repair loops: Typed operations + progressive validation keep iteration fast
What Makes This Achievable
1. The Orchestrator is the Product
The core IP is not the surface syntax; it’s the orchestrator + validation + typed operations. MRS-S is a human-auditable representation of the score semantics. Implementations may store canonical state in a different form (for example, a database/graph-backed model) as long as they preserve the same semantics and can reliably:
- extract Working Sets,
- validate and apply MRS-Ops,
- and export MRS-S for interchange/audit when needed.
2. AI Quality Will Improve
MRS doesn’t require AI to be perfect. It provides:
- Safety rails: Progressive validation catches errors before they propagate
- Bounded scope: Mistakes are contained to the working set
- Human review gates: Checkpoints for approval before proceeding
3. Human-in-the-Loop is a Feature
Professional composition is inherently collaborative. MRS supports:
- Human approval at structural checkpoints
- Semantic diffs for reviewing AI changes (operations, not text noise)
- Playback preview integration
- The ability to lock decisions (“form is final”) and iterate on details
Feasibility Summary
| Challenge | Solution | Status |
|---|---|---|
| Scale (scores too large) | Working Set Envelopes with scoped extraction | Solved |
| Context (decisions need surroundings) | Task-adaptive context views | Solved |
| Reliability (agents make errors) | MRS-Ops + orchestrator authority + progressive validation | Solved |
| Structural fragility (references break) | UUID-first identity | Solved |
| Professional orchestration | Player-Instrument-Staff model | Solved |
| Safe iteration (edits conflict) | Single-writer orchestrator + lane bundles | Solved |
| From-scratch composition | Draft states + checkpoints | Solved |
Next Steps
The following specifications complete the architecture for professional-scale AI composition:
- Orchestrator Contract: Validation, transactions, conflict detection
- Working Set Envelope: Scoped extraction with context
- MRS-Ops Protocol: Typed operations for agent mutations
- Design Principles: Core philosophy
- Architecture Overview: Component relationships