
The Vision

MRS enables AI-assisted composition and orchestration of full scores—symphonies, vocal arrangements, film cues—through iterative collaboration with a human arranger/orchestrator. The objective is Dorico-level semantic detail without “information ceilings.” The system must be able to represent and edit the same kinds of decisions a professional score needs: instruments and doublings, voicing, notation semantics, phrasing and expression, and evolving form. This is not a simplified demo. It targets the hardest case: creating and revising large, multi-movement works from scratch, iteratively, while keeping the score structurally safe and human-auditable.

The Core Problem

AI cannot currently compose or edit full orchestral scores reliably. Three fundamental challenges block progress:

1. The Scale Problem

A full orchestral score is too large for any AI context window:
Score Type        Parts   Measures   Events      Tokens
Lead sheet        1       32         ~200        ~2K
Piano sonata      2       300        ~5,000      ~40K
String quartet    4       400        ~8,000      ~65K
Full orchestra    90      1,000      ~200,000    ~1.5M
No practical LLM context window holds a full orchestral score. More importantly, it shouldn’t have to: editing measure 847 should not require loading measures 1–846.

2. The Context Problem

Musical decisions require understanding surrounding material. Writing a clarinet countermelody requires knowing:
  • What melodic material it responds to
  • The harmonic context (what chords are sounding)
  • Where phrase boundaries fall
  • What comes before and after
Simple “slice some measures as text” approaches fail because music is structurally entangled: meter, phrasing, harmony, and spans (slurs, hairpins, pedal) all interact across measure boundaries.

3. The Reliability Problem

Existing approaches ask AI to emit complete, structurally valid score fragments. This creates high failure rates:
  • Accidental omissions: Agent returns less content than given → content silently deleted
  • Calculation errors: Miscomputed absolute positions, duration sums that don’t add up
  • Hallucinated references: Agent references IDs that don’t exist in the score
  • Format violations: Malformed syntax, missing required fields
Even with good models, these failures require multiple repair loops, destroying productivity.

Why Existing Solutions Fail

MusicXML / MEI

  • Interchange-first, not editing-first: Excellent for export/import, but not designed for repeated, local, semantically-safe edits.
  • Too verbose for agent interaction: Token cost is high; diffs are noisy.
  • Fragile references: Many references are positional or implicit, and break under structural edits.

ABC Notation / LilyPond

  • Human-authoring optimized: Powerful for humans, but heavy on implicit state and shorthand.
  • No stable identity: Difficult to reference “that exact note” after insertions/deletions.

MIDI

  • Performance data, not score semantics: Great for capturing a performance or sketch, but it does not reliably encode notation intent (spelling, voices, phrasing, structure).
In practice, a professional workflow uses both: MIDI as input/idea capture and a notation model for the authoritative score. MRS is about the score semantics and safe editing; MIDI can be an input and a preview output, not the canonical truth.

The MRS Solution

MRS solves these problems through architectural separation:

1. Separate Storage from Mutation

  • MRS-S (storage): Complete, archival-quality encoding with stable UUIDs. Agents READ this.
  • MRS-Ops (mutation): Typed operation protocol with orchestrator-derived fields. Agents WRITE this.
Agent reads:  Working Set Envelope (MRS-S fragment + context views)
Agent writes: MRS-Ops (typed operations with tmp-ids)
Orchestrator: Validates → Maps tmp-ids to UUIDs → Computes derived fields → Applies
This eliminates:
  • Accidental deletions (explicit create/update/delete operations)
  • Calculation errors (orchestrator computes :at, :beat-start)
  • Hallucinated references (orchestrator validates all refs before application)
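To make this concrete, here is a minimal sketch of a single agent-written operation, in the same s-expression style MRS-S uses. The operation name and fields are illustrative assumptions, not the normative MRS-Ops vocabulary:

(op :create-note
  :tmp-id   tmp-note-1    ;; temporary ID chosen by the agent
  :staff    clarinet-1    ;; reference the orchestrator validates before applying
  :voice    1
  :measure  112
  :pitch    "D5"
  :duration 1/4)

The agent states intent against references it was handed in the Working Set Envelope; it never computes :at or :beat-start and never mints a UUID.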

2. Orchestrator Authority

The orchestrator is the sole authority for:
  • UUID minting (agents use temporary IDs)
  • Derived field computation
  • Canonical state management
  • Validation and application
Agents cannot create invalid references or miscalculate derived fields—the orchestrator handles it.
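Continuing the sketch above (field names and values are again illustrative placeholders), the orchestrator's response to a successful apply might look like:

(apply-result
  :status  :ok
  :id-map  {tmp-note-1 #uuid "00000000-0000-0000-0000-0000000000a1"}  ;; UUID minted by the orchestrator
  :derived {tmp-note-1 {:at 445 :beat-start 1}})                      ;; derived fields computed, never agent-supplied

The agent only ever receives the mapping back; it never has to guess canonical identifiers or positions.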

3. Task-Adaptive Context Views

Instead of fixed “near/far” reduction rings, agents receive context views tailored to their specific task:
Task Type        Context Views
Countermelody    Melodic reference, harmonic context, phrase structure
Orchestration    Orchestration map, texture density, dynamics profile
Dynamics pass    Phrase structure, existing dynamics, climax points
This provides what’s actually needed, not just “nearby measures at reduced detail.”
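As a sketch of what the countermelody row above could translate to in practice (view names and parameters are assumptions), the agent's envelope might carry views such as:

(views
  (view :melodic-reference :source violin-1 :measures [112 127])
  (view :harmonic-context  :granularity :beat :measures [112 127])
  (view :phrase-structure  :measures [104 135]))

Each view is a compact, read-only digest; the agent edits only the MRS-S fragment inside its scope.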

4. Professional Orchestration Semantics

The Player → Instrument → Staff model supports real-world orchestration:
(player woodwind-2
  :name "Flute 2 / Piccolo"
  :instruments [flute-2 piccolo]
  :default flute-2)
  • Instrument doubling (Flute 2 / Piccolo)
  • Mid-score instrument changes
  • Transposition handling across changes
  • Percussion kits
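A mid-score switch for the player defined above could be expressed along these lines (the form name and fields are illustrative, not the defined syntax):

(instrument-change
  :player     woodwind-2
  :to         piccolo
  :at-measure 64)
;; From measure 64 the staff is notated for piccolo; the orchestrator keeps
;; the written-vs-sounding transposition consistent across the change.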

5. Progressive Validation

Operations are validated in stages:
  1. Syntax: Operation is well-formed
  2. References: All IDs exist (or are valid tmp-ids)
  3. Permissions: Operation within granted lanes/scope
  4. Musical rules: Constraints satisfied (range, parallel fifths, etc.)
Errors are caught before any state mutation, with specific feedback and no silent corruption.
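A minimal sketch of how such a staged pipeline could be composed, assuming each stage is a function of the operation and orchestrator context that returns {:ok true} or {:ok false :error ...} (the stage names are hypothetical):

(defn validate
  "Run validation stages in order, stopping at the first failure so the
   agent receives one specific error instead of a cascade."
  [op ctx stages]
  (reduce (fn [_ stage]
            (let [result (stage op ctx)]
              (if (:ok result) result (reduced result))))
          {:ok true}
          stages))

;; Usage (check-syntax, check-references, check-permissions and
;; check-musical-rules are hypothetical stage functions):
;; (validate op ctx [check-syntax check-references check-permissions check-musical-rules])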

Why This Approach is Feasible

The Scale Problem → Solved by Scoped Extraction

Working Set Envelopes reduce a full orchestral score to a focused fragment (often 5–50K tokens) plus task-appropriate context, so the agent is not forced to “hold the whole piece.”
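A Working Set Envelope for the countermelody task might be shaped roughly like this (field names and the token budget are assumptions):

(working-set-envelope
  :task     :countermelody
  :scope    {:players [clarinet-1] :measures [112 127]}
  :fragment clarinet-fragment        ;; the editable MRS-S slice for that scope
  :views    [:melodic-reference :harmonic-context :phrase-structure]
  :budget   {:max-tokens 30000})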

The Context Problem → Solved by Task-Adaptive Views

Context views supply what the task actually needs—melodic references, harmonic context, phrase structure, orchestration maps—instead of “more measures at reduced detail.”

The Reliability Problem → Solved by Typed Operations + Validation

Failure Mode               What Prevents It
Accidental deletion        Explicit create/update/delete operations (no implicit replacement)
Calculation errors         Orchestrator computes derived fields (no agent arithmetic)
Hallucinated references    Reference validation before application
Drift beyond scope         Scope + lane permissions enforced by the orchestrator

Similarities to Large Codebases (and Why Music is Harder)

At a systems level, this looks like how teams safely change large codebases:
  • You don’t “rewrite the repository.” You make bounded changes, review diffs, and run validations.
  • The orchestrator plays the role of a gate: validate → apply → record.
But music is harder than code in two key ways:
  1. There is no universal compiler for “good music.” A score can be syntactically correct but stylistically wrong. Validation can catch “illegal” changes (out of range, broken ties, duration overflow), but taste remains human.
  2. Musical coupling is highly non-local. A local orchestration change affects balance, texture, phrasing, and form perception. The system must supply the right musical context, not just the nearby measures.
This is why MRS separates canonical semantics (what the score is), task-adaptive context (what an agent needs to make a good suggestion), and human review (what makes it musically final).

The Hardest Version of the Problem

MRS is designed for the most demanding case: building and revising a large score from scratch, iteratively, with AI assistance and human review. This means supporting:
  1. Structural creation: Insert measures, sections, movements—not just edit existing content
  2. Incomplete states: Early-stage work has placeholders (“strings texture TBD”, “orchestrate later”)
  3. Professional orchestration: Player/instrument model with doubling, switching, condensing
  4. Cross-boundary spans: Hairpins, slurs, pedal marks that cross edit boundaries
  5. Human-in-the-loop checkpoints: Lock form, lock harmony, lock orchestration at appropriate stages
  6. Low repair loops: Typed operations + progressive validation keep iteration fast
If MRS handles symphonies-from-scratch, it handles everything simpler too.
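For the first two requirements in the list above, structural creation and explicit placeholders could look something like this (operation names and fields are illustrative assumptions):

(op :insert-measures
  :after-measure  240
  :count          16
  :time-signature [3 4])

(op :mark-placeholder
  :scope {:section :strings :measures [241 256]}
  :note  "strings texture TBD")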

What Makes This Achievable

1. The Orchestrator is the Product

The core IP is not the surface syntax—it’s the orchestrator + validation + typed operations. MRS-S is a human-auditable representation of the score semantics. Implementations may store canonical state in a different form (for example, a database/graph-backed model) as long as they preserve the same semantics and can reliably:
  • extract Working Sets,
  • validate and apply MRS-Ops,
  • and export MRS-S for interchange/audit when needed.
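In Clojure terms, that contract could be sketched as a protocol; the names are hypothetical, and any storage backend can implement it as long as the semantics hold:

(defprotocol Orchestrator
  (extract-working-set [this task scope]
    "Build a Working Set Envelope (MRS-S fragment + context views) for a task.")
  (apply-ops [this ops]
    "Validate, map tmp-ids to UUIDs, compute derived fields, and apply.")
  (export-mrs-s [this scope]
    "Emit MRS-S for interchange or audit."))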

2. AI Quality Will Improve

MRS doesn’t require AI to be perfect. It provides:
  • Safety rails: Progressive validation catches errors before they propagate
  • Bounded scope: Mistakes are contained to the working set
  • Human review gates: Checkpoints for approval before proceeding
As AI improves, the system captures that improvement. As AI makes errors, the system contains them.

3. Human-in-the-Loop is a Feature

Professional composition is inherently collaborative. MRS supports:
  • Human approval at structural checkpoints
  • Semantic diffs for reviewing AI changes (operations, not text noise)
  • Playback preview integration
  • The ability to lock decisions (“form is final”) and iterate on details
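A checkpoint that locks form while leaving detail work open might be recorded along these lines (names and fields are illustrative):

(checkpoint :form-locked
  :locks       [:movement-structure :section-order :measure-count]
  :still-open  [:orchestration :dynamics :articulation]
  :approved-by "human arranger")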

Feasibility Summary

Challenge                                 Solution                                                     Status
Scale (scores too large)                  Working Set Envelopes with scoped extraction                 Solved
Context (decisions need surroundings)     Task-adaptive context views                                  Solved
Reliability (agents make errors)          MRS-Ops + orchestrator authority + progressive validation    Solved
Structural fragility (references break)   UUID-first identity                                          Solved
Professional orchestration                Player-Instrument-Staff model                                Solved
Safe iteration (edits conflict)           Single-writer orchestrator + lane bundles                    Solved
From-scratch composition                  Draft states + checkpoints                                   Solved
The approach is feasible. The core architectural decisions are sound. What remains is disciplined implementation of the orchestrator semantics—validation pipeline, operation application, context generation—which are well-understood problems with clear solutions.

Next Steps

The following specifications complete the architecture for professional-scale AI composition: