MLIR as a Rewritable Graph Database

Oct 4, 2025

1. Concept Overview

MLIR (Multi-Level Intermediate Representation) can be understood not just as a compiler IR framework, but as a rewritable, typed graph database.

At its core, MLIR stores structured, typed, and interconnected data (operations and values) that can be queried, transformed, and restructured, much as a database manages records through a schema and queries.

| Database Concept | MLIR Equivalent | Description |
| --- | --- | --- |
| Schema | Operation definitions (via TableGen) | Define what nodes (ops) can exist and what fields (operands, attributes) they have. |
| Records | Operation instances in IR | Concrete nodes representing program fragments. |
| Query / Update | Rewrite patterns (DRR or C++) | Declarative or imperative transformations on subsets of the IR graph. |
| Query Engine | Pass manager / pattern driver | Applies rewrites until a fixpoint or an iteration limit is reached, keeping the IR consistent. |
| Materialized View | .mlir textual snapshot | Human-readable dump of the current IR database state. |

So MLIR ≈ a database of program operations, where dialects define schemas, rewrites define queries/updates, and passes execute those transformations.
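
To make the mapping concrete, here is a minimal sketch of the "ops as records" view in plain C++. It is a toy model under stated assumptions: `OpRecord`, `IRStore`, `selectByName`, and `exampleModule` are hypothetical names invented for illustration, not MLIR classes or functions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for an IR "database"; none of these types are MLIR classes.
struct OpRecord {
  std::string name;                          // e.g. "tf.LeakyRelu"
  std::vector<int> operands;                 // ids of the values this op consumes
  std::map<std::string, double> attributes;  // e.g. {"alpha", 0.2}
  int result;                                // id of the value this op defines
};

using IRStore = std::vector<OpRecord>;

// Roughly "SELECT * FROM ops WHERE name = ?": indices of matching records.
std::vector<std::size_t> selectByName(const IRStore &ir,
                                      const std::string &name) {
  std::vector<std::size_t> hits;
  for (std::size_t i = 0; i < ir.size(); ++i)
    if (ir[i].name == name)
      hits.push_back(i);
  return hits;
}

// A two-op module: %1 = tf.LeakyRelu(%0), %2 = tf.Add(%1, %0).
IRStore exampleModule() {
  return {
      {"tf.LeakyRelu", {0}, {{"alpha", 0.2}}, 1},
      {"tf.Add", {1, 0}, {}, 2},
  };
}
```

Calling `selectByName(exampleModule(), "tf.LeakyRelu")` plays the role of a query. In MLIR itself the closest counterpart is walking the IR and filtering by op type.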


2. Why "Database", Not Just "Graph"

MLIR’s IR is a typed, modular, and queryable graph, not a raw untyped DAG (Directed Acyclic Graph). Here’s the key distinction:

| Aspect | Plain Graph | MLIR as Graph Database |
| --- | --- | --- |
| Structure | Unstructured nodes/edges | Nodes (ops) and edges (values) with rich typing. |
| Schema | None | Defined in ODS/TableGen dialects. |
| Queries | Manual graph traversal | Pattern-matching and rewrite rules. |
| Transactions | Not defined | Patterns mutate the IR only through PatternRewriter, so the driver can track every change; the IR is expected to be valid after each rewrite. |
| Namespaces | Flat | Modular dialects (isolated schemas). |
| Persistence | Optional | .mlir provides textual serialization. |

Thus, the term database highlights the presence of:

  • Schema-driven structure (dialects + ops)
  • Rule-driven updates (rewrites)
  • Transaction-like consistency (rewrites are expected to preserve SSA form, type correctness, and dominance)
  • Query semantics (find + replace via patterns)
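
The consistency property can be illustrated with a small check. The sketch below assumes a single straight-line block, where dominance reduces to "defined earlier in the block"; `Op` and `verifyBlock` are hypothetical names, and MLIR's real verifier also handles regions and control flow.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy single-block IR: each op consumes value ids and defines one new id.
struct Op {
  std::vector<int> operands;
  int result;
};

// Returns true if every operand is a block argument or an earlier result,
// and no value id is defined twice (the single-assignment part of SSA).
bool verifyBlock(const std::vector<int> &blockArgs,
                 const std::vector<Op> &ops) {
  std::set<int> defined(blockArgs.begin(), blockArgs.end());
  for (const Op &op : ops) {
    for (int operand : op.operands)
      if (defined.count(operand) == 0)
        return false;  // use before definition
    if (!defined.insert(op.result).second)
      return false;  // value defined twice
  }
  return true;
}
```

A rewrite that leaves the block in a state where `verifyBlock` returns false would correspond to a transaction that breaks an integrity constraint.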

3. The Core Mapping

Ops = Schema

Define what data (ops) can exist in the IR:

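// Uses the core ODS base class Op<dialect, mnemonic, traits>.
def TFL_LeakyReluOp : Op<TFL_Dialect, "leaky_relu", [NoMemoryEffect, SameOperandsAndResultType]> {
  let arguments = (ins F32Tensor:$x, F32Attr:$alpha);  // one operand, one attribute
  let results   = (outs F32Tensor:$y);                 // one result
}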

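This is loosely analogous to the following table definition, with the caveat that x and y are tensors rather than scalar floats: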

CREATE TABLE LeakyRelu (x FLOAT, alpha FLOAT, y FLOAT);

Each operation instance is a record conforming to this schema.
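
As a rough sketch of what "conforming to the schema" means, the following toy code checks a record against a catalog of op schemas. `OpSchema`, `OpRecord`, `Catalog`, and `verify` are hypothetical names chosen for this example; in MLIR, ODS instead generates C++ op classes with their own verifiers.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy types; not part of MLIR.
struct OpSchema {
  std::size_t numOperands;               // how many operands the op takes
  std::set<std::string> requiredAttrs;   // attribute names that must be present
};

struct OpRecord {
  std::string name;
  std::vector<int> operands;
  std::map<std::string, double> attributes;
};

// The "catalog": one schema per registered op name.
using Catalog = std::map<std::string, OpSchema>;

// A record is valid if its op is registered, it has the declared operand
// count, and it carries every required attribute.
bool verify(const Catalog &catalog, const OpRecord &op) {
  auto it = catalog.find(op.name);
  if (it == catalog.end())
    return false;  // unknown op: no schema to check against
  const OpSchema &schema = it->second;
  if (op.operands.size() != schema.numOperands)
    return false;
  for (const std::string &attr : schema.requiredAttrs)
    if (op.attributes.count(attr) == 0)
      return false;
  return true;
}
```

Registering `"tfl.leaky_relu"` with one operand and a required `"alpha"` attribute mirrors the TableGen definition above, minus the type constraints.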

Patterns = Rewrite Rules

Declarative patterns (DRR, Declarative Rewrite Rules) or imperative C++ patterns act like database update queries:

def : Pat<(TF_LeakyReluOp $x, F32Attr:$a), (TFL_LeakyReluOp $x, $a)>;

Equivalent to:

UPDATE ops SET dialect = 'tfl' WHERE op = 'tf.LeakyRelu';

They match subgraphs and replace them with new equivalent forms.
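
The match-then-replace structure of such a pattern can be sketched in plain C++. This is a toy model, not the code mlir-tblgen generates: `OpRecord`, `legalizeLeakyRelu`, and `applyToAll` are hypothetical names, and the rewrite edits a record in place where a real pattern would create a new op and replace the old one.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy op record; not an MLIR type.
struct OpRecord {
  std::string name;
  std::vector<int> operands;
  double alpha;
};

// Match step: is this record a tf.LeakyRelu?
// Rewrite step: turn it into the TFLite op, keeping operand and alpha.
bool legalizeLeakyRelu(OpRecord &op) {
  if (op.name != "tf.LeakyRelu")
    return false;  // no match, leave the record untouched
  op.name = "tfl.leaky_relu";
  return true;
}

// Apply the pattern to every record and report how many were rewritten.
int applyToAll(std::vector<OpRecord> &ir) {
  int rewritten = 0;
  for (OpRecord &op : ir)
    if (legalizeLeakyRelu(op))
      ++rewritten;
  return rewritten;
}
```

The return value of `applyToAll` is the analogue of the "rows affected" count an UPDATE statement reports.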

Passes = Query Engine

Passes orchestrate which rewrites run and in what order:

  • Apply rewrites until a fixpoint is reached (no pattern matches).
  • Maintain consistency (SSA, type legality).
  • Optimize or lower IR progressively.

Example:

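#include "mlir/Pass/Pass.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"
struct LegalizeTFToTFLPass
    : public PassWrapper<LegalizeTFToTFLPass, OperationPass<ModuleOp>> {
  void runOnOperation() override {
    RewritePatternSet patterns(&getContext());
    populateWithGenerated(patterns);  // DRR patterns emitted by mlir-tblgen
    // Greedy driver entry point in recent MLIR releases; older releases
    // name it applyPatternsAndFoldGreedily.
    if (failed(applyPatternsGreedily(getOperation(), std::move(patterns))))
      signalPassFailure();  // report non-convergence instead of ignoring it
  }
};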
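
The "apply until fixpoint" behavior can be sketched without MLIR. The toy driver below is an assumption-laden simplification: `Op`, `Pattern`, `applyGreedily`, and `examplePatterns` are hypothetical names, the IR is just a list of op names, and the real greedy driver uses a worklist instead of full sweeps.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy IR: a flat list of op names. A pattern returns true if it changed the op.
using Op = std::string;
using Pattern = std::function<bool(Op &)>;

// Sweep all patterns over all ops until a full sweep changes nothing,
// or until maxIterations sweeps have run. Returns true on convergence.
bool applyGreedily(std::vector<Op> &ir, const std::vector<Pattern> &patterns,
                   int maxIterations = 10) {
  for (int iter = 0; iter < maxIterations; ++iter) {
    bool changed = false;
    for (Op &op : ir)
      for (const Pattern &pattern : patterns)
        changed |= pattern(op);
    if (!changed)
      return true;  // fixpoint: no pattern matched anywhere
  }
  return false;  // gave up; the patterns may be cycling
}

// Two hypothetical lowering steps: "a" -> "b" -> "c".
std::vector<Pattern> examplePatterns() {
  return {
      [](Op &op) { if (op != "a") return false; op = "b"; return true; },
      [](Op &op) { if (op != "b") return false; op = "c"; return true; },
  };
}
```

The iteration cap matters: a pattern set that keeps undoing its own work never reaches a fixpoint, so the driver has to stop and report failure, which is why the pass above checks the result.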

4. Related Ideas and Prior Work

a. Compilers as Databases

  • Martin Odersky (Scala creator): described compilers as in-memory databases that store program facts and derive new ones.
  • Reddit and Hacker News discussions: reinforce the view of compilers as structured data systems that can be queried and transformed.

b. MLIR Research

  • MLIR LangRef: describes the IR as a graph of operations and values.
  • MLIR for Graph Algorithms (LLVM Doc): formalizes MLIR as a graph manipulation substrate.
  • Relational MLIR Dialects: academic work modeling relational algebra as MLIR dialects for query optimization (e.g., PVLDB 2022, Jungmair et al.).
  • Equality Saturation (Egg/MLIR integration): uses e-graphs (equivalence graphs) as rewrite databases for exploring all possible transformations simultaneously.

c. Database-style Compilers

  • Soufflé Datalog Compiler: represents program facts as relations; transformations and analyses as queries.
  • Flix, QL, CodeQL: treat program analysis as querying a database of IR facts.
  • Egglog / E-graphs: treat rewrites as fact insertions and equivalence updates.

5. Where the Analogy Fails

While the conceptual mapping is strong, MLIR isn’t a literal database.

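| Database Property | MLIR Reality | Why it breaks |
| --- | --- | --- |
| Persistence & durability (ACID) | In-memory IR, no transaction logs | Greedy rewrites mutate the IR in place; nothing is logged, and a pass cannot be rolled back once it has run. |
| Relational query language | No general SQL/Datalog layer | Rewrites are specialized, not general-purpose. |
| Concurrency & isolation | One thread per op being mutated | The pass manager can process isolated ops (e.g. separate functions) in parallel, but there are no multi-user or concurrent transactions on the same IR. |
| Indexing and query optimization | Manual traversal | Pattern matching is structural; patterns carry a static benefit, but there is no cost-based query planner. |
| General data model | Program IR only | Can't represent arbitrary user data. |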

So the analogy fails when expecting full DBMS features. MLIR is database-like in structure and semantics, not in engineering.


6. Why the Analogy Still Matters

Despite its limits, the framing of MLIR as a graph database of program operations is:

  • Pedagogically powerful: clarifies dialects as schemas, rewrites as queries, and passes as engines.
  • Architecturally descriptive: explains why MLIR scales through modular schemas, composable transformations, and structured consistency.
  • Future-oriented: MLIR could evolve toward persistent or incremental IR stores, especially for IDEs or incremental compilers.

In short:

MLIR bridges compiler theory and data systems. Thinking of it as a rewritable graph database highlights how its modular, queryable, schema-driven architecture goes beyond a traditional IR graph.



TL;DR:

MLIR is not just a graph of operations; it behaves like a typed, schema-driven, queryable graph database for program transformations. Dialects define the schema, rewrites define the queries, and passes execute them to evolve the IR toward a hardware-specific or optimized form.
