SysMoBench

Evaluating AI on Formally Modeling Complex Real-World Systems

About SysMoBench

SysMoBench is a comprehensive benchmark for assessing AI's capacity to create formal specifications of complex, real-world computing systems. While recent advances in generative AI show promise in generating certain forms of specifications, existing work mostly targets small code fragments rather than complete systems. SysMoBench bridges this gap by focusing on concurrent and distributed systems using TLA+, the predominant specification language for these domains.
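For readers new to TLA+, here is a minimal sketch of what such a specification looks like. This is a hypothetical two-process spinlock written for illustration, not one of the benchmark artifacts; the module and action names are our own.

```
---- MODULE SimpleSpinlock ----
EXTENDS Naturals
VARIABLE lock  \* 0 = free, otherwise the id of the holding process

Init == lock = 0

\* A test-and-set acquire succeeds only when the lock is free.
Acquire(p) == lock = 0 /\ lock' = p

\* Only the current holder may release.
Release(p) == lock = p /\ lock' = 0

Next == \E p \in {1, 2} : Acquire(p) \/ Release(p)

Spec == Init /\ [][Next]_lock

\* Safety: at most one process ever holds the lock (trivially, by typing).
TypeOK == lock \in {0, 1, 2}
====
```

A model checker such as TLC can then exhaustively explore the states reachable under `Spec` and verify invariants like `TypeOK`.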

Key Features

📊 Automated Evaluation

Four-phase assessment pipeline: syntax validation, runtime analysis, behavioral conformance, and property verification.

🔄 Real-World Systems

Nine carefully selected artifacts spanning synchronization primitives, consensus protocols, and distributed replication.

🔧 Extensible

Add new systems with only system code, trace instrumentation, and invariant templates — no reference models needed.

Evaluation Pipeline

Phase 1 (Syntax): SANY parser validation
Phase 2 (Runtime): TLC model checking
Phase 3 (Conformance): Execution trace verification
Phase 4 (Properties): Safety & liveness checks
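The first two phases map directly onto the standard TLA+ tools: SANY (the parser) and TLC (the model checker), both shipped in tla2tools.jar. As a hypothetical sketch of how a harness might drive them, the helper below only builds the command lines; the jar path, spec filename, and the exact flags the benchmark uses are assumptions.

```python
# Sketch of a pipeline driver for phases 1-2. The jar location and
# spec name are placeholders, not paths from the benchmark itself.
TOOLS_JAR = "tla2tools.jar"


def phase_commands(spec: str, tools: str = TOOLS_JAR) -> dict[str, list[str]]:
    """Build tool invocations for the syntax and runtime phases.

    SANY (tla2sany.SANY) parses and level-checks the module; TLC
    (tlc2.TLC) model-checks it. Phases 3-4 also run TLC, but against
    a trace-validation spec and property configuration.
    """
    return {
        "syntax": ["java", "-cp", tools, "tla2sany.SANY", spec],
        # -deadlock disables TLC's deadlock check, which trace specs
        # typically trip on intentionally.
        "runtime": ["java", "-cp", tools, "tlc2.TLC", "-deadlock", spec],
    }
```

Each command list can be handed to `subprocess.run`, with a nonzero exit status failing that phase.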

Benchmark Systems

Synchronization: Asterinas Spinlock, Asterinas Mutex, Asterinas Rwmutex
Consensus: Etcd Raft, Redis Raft, PGo Raftkvs
Replication: Xline CURP
Distributed: PGo Lock Server, PGo Dist. Queue