I'm happy to announce that our paper "Intrinsic Verification of
Parsers and Formal Grammar Theory in Dependent Lambek Calculus" has
been accepted at PLDI 2025. Authors are Steven Schaefer (my PhD
student @stschaefSteven Schaefer), Nathan Varner (an undergrad here at
UM) and Pedro Amorim (@pamorimPedro Amorim). The extended version
of the paper is already up on arxiv (arxiv.org/abs/2504.03995)
and the code is available on github
(github.com/maxsnew/grammars-an).

The main idea of the paper is to define formal grammars as types in an
ordered linear logic (aka Lambek calculus). Then the terms are a kind
of intrinsically sound parse transformer, with intrinsically verified
parsers as a special case. The idea is that you write a parser as a
function from strings to parse trees, but the syntactic discipline of
Lambek calculus ensures that the output tree is a parse of the input
string.

Then formalisms like regular expressions and context-free grammars are
just inductive data types in Lambek calculus, and with the addition of
type dependency on non-linear types we are also able to express
automata formalisms. So far, we have an intrinsically verified parser
for regular expressions (using the classic Regex->NFA->DFA pipeline
using the powerset construction) as well as some hand-written parsers
for context-free grammars.

We give a simple denotational model where a grammar is defined as a
family of sets indexed by strings Σ* -> Set, a proof relevant version
of the definition of a formal language as a predicate Σ* -> Prop. This
category is monoidal closed and bicomplete, so supports all the
constructions of dependent Lambek calculus. The denotational semantics
is used as the basis for a shallow embedding of the system in Agda,
where we have formally verified all of our examples.

0
0
0

If you have a fediverse account, you can quote this note from your own instance. Search https://types.pl/users/maxsnew/statuses/114354430893166614 on your instance and quote it. (Note that quoting is not supported in Mastodon.)