Skip to content

Strata — Database Archaeology Pipeline

Strata is the sibling methodology to Project Phoenix. Where Phoenix extracts intent from application logic and rebuilds from zero, Strata excavates database complexity, classifies what it finds, and produces a migration brief.


Why Strata Exists

Phoenix assumes a transformation target — rebuild a system in a modern stack. That assumption holds for application code. It breaks on enterprise databases.

A SQL Server installation is not one thing. It is layered:

LayerContents
SchemaTables, views, constraints, indexes, relationships
LogicStored procedures, functions, triggers
JobsSQL Agent jobs, schedules, step dependencies
DependenciesLinked servers, synonyms, external aliases, cross-database references

Each layer has a different relationship to transformation. Some objects rebuild. Some preserve. Some retire. Forcing all four layers through a single rebuild pipeline produces the wrong outputs for three of them.

Strata keeps the concerns separated.


What Strata Is Not

  • Not a rebuild pipeline — Strata does not produce application code
  • Not a schema migrator — it classifies, it does not execute migration
  • Not a subset of Phoenix — it is a parallel methodology with different agents, different artifacts, and a different terminal decision
  • Not limited to SQL Server — the methodology applies to any enterprise database with layered complexity

The Five Agents

S-00  Schema Signal        →  data model map
S-01  Logic Extractor      →  routine traces
S-02  Job Mapping          →  operational layer
S-03  Dependency Tracer    →  external surface
S-04  Classification       →  terminal brief
AgentNameReadsProduces
S-00Schema SignalTable definitions, ERD, constraintsschema artifacts — one per domain
S-01Logic ExtractorStored procedures, functions, triggersroutine artifacts — one per routine
S-02Job MapperSQL Agent jobs, step definitions, schedulesjob artifacts — one per job
S-03Dependency TracerLinked servers, synonyms, cross-db refs, aliasesdependency artifacts — one per external surface
S-04ClassifierAll prior artifactsdecision artifact — transform / preserve / retire per object

No build pass. No certifier. The terminal artifact is the classification map and migration brief — not a production codebase.


The Non-Transformative Scope

The key distinction in Strata is the non-transformative classification.

Not every database object is a rebuild candidate. SQL Agent Jobs, linked servers, and synonyms are operational infrastructure — they schedule, they route, they abstract. The question for these objects is not how to rebuild them but what to do with them:

CONSTRUCT  decision
ID         billing.nightly-rollup
VERSION    1
─────────────────────────────────────────
object:    job
layer:     sql-agent
decision:  preserve
reason:
  Runs nightly billing rollup — no application equivalent.
  Must remain operational during migration.
  Reconsider after Phase 2 app layer is stable.

A decision construct with preserve status says: this object is accounted for, its scope is understood, and the decision is explicit. It does not fall through the cracks. It does not get rebuilt by accident. It does not get forgotten.


EMBER Construct Types

Strata introduces five new construct types to EMBER:

ConstructProduced byPurpose
schemaS-00Data model signal — tables, relationships, constraints per domain
routineS-01Stored procedure or function trace — inputs, outputs, dependencies, business logic
jobS-02SQL Agent job definition — steps, schedule, what it touches
dependencyS-03External surface map — linked server, synonym, alias, cross-db reference
decisionS-04Classification result — transform / preserve / retire with explicit reasoning

All five use the standard EMBER header and separator. The decision construct is the only one unique to Strata — the others could theoretically appear in a Phoenix engagement too.


Example Artifacts

Schema signal — one per domain:

CONSTRUCT  schema
ID         billing.core
VERSION    1
─────────────────────────────────────────
tables:
  invoices        → 2.3M rows, partitioned by year
  invoice_lines   → 18.4M rows, FK to invoices
  payment_events  → 4.1M rows, append-only

relationships:
  invoices → invoice_lines  (1:many, CASCADE DELETE)
  invoices → payment_events (1:many, no cascade)

constraints:
  invoices.status   CHECK (status IN ('draft','sent','paid','void'))
  invoice_lines.qty CHECK (qty > 0)

indexes:
  invoices (customer_id, created_at) — high-read composite
  payment_events (invoice_id, event_type) — covering

confidence: high

Routine trace — one per stored procedure:

CONSTRUCT  routine
ID         billing.sp_generate_invoice
VERSION    1
─────────────────────────────────────────
type:      stored-procedure
entry:     EXEC billing.sp_generate_invoice @order_id, @issued_by
reads:     orders, order_lines, customers, tax_rates
writes:    invoices, invoice_lines, audit_log

logic:
  - Calculates line totals before tax
  - Applies customer-specific tax rate if exists, else default
  - Inserts invoice header then lines in single transaction
  - Writes audit entry on completion

boundaries:
  → linked server REPORTING for audit mirror (write-through)

gaps:
  - No rollback on audit write failure — silent data loss possible
  - Tax rate lookup has no fallback if customer record missing

confidence: high

Job definition — one per SQL Agent job:

CONSTRUCT  job
ID         billing.nightly-rollup
VERSION    1
─────────────────────────────────────────
schedule:  daily 02:00 UTC
steps:
  1 → exec billing.sp_close_day_invoices
  2 → exec reporting.sp_refresh_billing_summary
  3 → exec dbo.sp_archive_processed_events (linked server)

dependencies:
  → sp_close_day_invoices must complete before step 2
  → step 3 requires REPORTING linked server available

outputs:
  billing_summary table refreshed
  processed_events archived to REPORTING

failure-behaviour:
  step 1 failure → alert DBA, abort
  step 2 failure → continue, flag in log
  step 3 failure → silent (non-critical archive)

decision:  preserve

Decision — the terminal classification:

CONSTRUCT  decision
ID         billing.classification
VERSION    1
─────────────────────────────────────────
scope:     billing domain — 4 tables, 12 routines, 3 jobs, 1 linked server

transform:
  sp_generate_invoice    → extract to application service layer
  sp_apply_credit        → extract to application service layer
  sp_void_invoice        → extract to application service layer

preserve:
  nightly-rollup job     → operational, no app equivalent
  REPORTING linked server → archive target, retain during migration

retire:
  sp_legacy_batch_import → superseded by API ingestion (2019)
  sp_fix_tax_rounding    → one-time correction, never called post-2021

open:
  sp_calculate_arrears   → business logic unclear, requires domain expert review

How Strata Feeds Phoenix

Strata and Phoenix are siblings. On a full engagement — an application backed by an enterprise database — both run. The sequence:

Strata S-00 → S-04   (database excavation, 1–2 days)

   decision artifact  →  feeds Phoenix A-04 (Solution Architect)
                      →  informs Phoenix A-05 (Builder) on data access layer
                      →  scopes what A-06 (Certifier) needs to verify

The Strata decision artifact answers the question Phoenix A-04 would otherwise have to guess: what does the database layer require, what can change, and what must stay intact. Without it, the architect is making assumptions. With it, the assumptions are named decisions.


What Strata Does Not Produce

  • No production code — classification only, not implementation
  • No schema migration scripts — that is execution, not archaeology
  • No application rebuild — use Phoenix for the application layer
  • No runtime — Strata-runtime is on the roadmap; the methodology ships first

Relationship to Phoenix

Phoenix                         Strata
──────────────────────────────  ──────────────────────────────────
Application layer               Database layer
Transformation pipeline         Archaeology pipeline
Terminal artifact: codebase     Terminal artifact: classification map
7 agents                        5 agents
Constructs: signal → cert       Constructs: schema → decision
Rebuild from zero               Classify and decide

Both use EMBER. Both produce .sil artifacts. Both are read by humans and agents without a manual.


Status

Strata is in specification. The methodology is defined. The runtime (strata-runtime) is on the roadmap.

  • Methodology: this document
  • Runtime: not yet built
  • EMBER constructs: schema, routine, job, dependency, decision — specified, not yet in @semanticintent/ember

Strata v0.1 — Project Phoenix family — phoenix.semanticintent.dev/strata/