Type-safe printf in Coq
Many languages provide mechanisms for formatted output, with C's
printf undoubtedly being the most influential one. Some of these
functions allow a format to be specified using a concise and intuitive
syntax, which is probably part of the reason for them being so
popular. For printf, for instance, a format is described using just
a plain string.
Unfortunately, this convenience often comes with a price. In C, a
mismatch between the output format and the other arguments results in
incorrect behavior. This non-trivial dependency can't be expressed in
the language's type system and requires additional compiler checks to
be enforced. Other languages suffer from similar problems. Haskell's
standard printf also causes a run-time error when a format mismatch
occurs. OCaml is able to enforce that format and arguments are
compatible at compile-time, but at the cost of extending the language
with an ad-hoc format type that is also represented as
strings. Other approaches solve the problem by adopting different
representations for the output format, which can make it slightly less
convenient to specify. In Functional
Unparsing, Olivier Danvy showed how to implement an analogue of
printf using formatting combinators. More recently, Oleg Kiselyov
used delimited control operators to implement his own type-safe
version of printf in Haskell,
an idea that has also been ported to Coq
by Matthieu Sozeau.
It would be a shame if we had to extend our language in an ad-hoc
manner just to get safe and convenient formatting. We will see how we
can use Coq's expressive type system to describe the dependency
between a format string and the arguments it requires, and implement a
version of sprintf that doesn't suffer from the aforementioned
issues.
Let's begin by defining some useful notations for lists and
strings.
Notation "[ ]" := nil : list_scope.
Notation "[ x , .. , y ]" := (cons x .. (cons y nil) ..) : list_scope.
Notation "x ::: y" := (String x y)
(at level 60, right associativity) : string_scope.
Open Scope list_scope.
Open Scope char_scope.
Directives and format
Inductive directive : Type :=
| DLit : ascii -> directive
| DNum : option nat -> directive
| DBool : directive
| DString : directive
| DChar : directive.
Directive DLit c outputs the literal character c, while DNum
s, DBool, DString, and DChar take an argument of the
corresponding type and print it. The s field of DNum s controls
how its argument should be printed. If s = Some n, then we output
exactly the n least-significant digits of the number, padding it
with zeros if necessary. Otherwise, if s = None, we just print the
whole number. Thus, the number 4 should be printed as 4 using the
DNum None, but as 04 using DNum (Some 2).
With the directive type in hand, defining format is
straightforward:
Definition format := list directive.
Relating format and arguments
Definition directiveType (dir : directive) :=
match dir with
| DLit _ => None
| DNum _ => Some nat
| DBool => Some bool
| DString => Some string
| DChar => Some ascii
end.
Now, formatType itself is just a direct translation of what we
stated above.
Fixpoint formatType (f : format) : Type :=
match f with
| [] => string
| dir :: dirs =>
match directiveType dir with
| Some T => T -> formatType dirs
| None => formatType dirs
end
end.
We can check if this definition makes sense on simple examples.
Example formatTypeTest1 : formatType [DBool, DString] =
(bool -> string -> string).
Proof. reflexivity. Qed.
Example formatTypeTest2 : formatType [DLit "a", DLit "b", DNum None] =
(nat -> string).
Proof. reflexivity. Qed.
The implementation
Fixpoint sprintf (f : format) : formatType f :=
match f with
| [] => ""
| dir :: dirs =>
match dir with
| DLit c => c ::: sprintf dirs
(* ... *)
end
end.
Module Internal.
Fixpoint sprintf (f : format) (k : string -> string) : formatType f :=
match f with
| [] => k ""
| dir :: dirs =>
match dir return formatType (dir :: dirs) with
| DLit c => sprintf dirs (fun res => k (c ::: res))
| DNum o =>
fun n =>
let k' := match o with
| Some size => fun res => k (writeNatSize size n res)
| None => fun res => k (writeNat n res)
end in
sprintf dirs k'
| DBool =>
fun b =>
sprintf dirs (fun res => k (writeBool b ++ res))
| DString =>
fun s =>
sprintf dirs (fun res => k (s ++ res))
| DChar =>
fun c =>
sprintf dirs (fun res => k (c ::: res))
end
end.
Most directives generate an additional argument to sprintf by
wrapping the recursive call with an anonymous function. Also, notice
the type annotation on the inner match: return formatType (dir ::
dirs). As one could hope, this mysterious expression is telling Coq
which type is being returned on each branch of the match. Dependent
types require more sophisticated type inference, and in some cases it
is necessary to provide these annotations explicitly.
To use sprintf, we just have to pass it the identity continuation
fun res => res, which will receive the value built by sprintf and
return it as-is.
Example sprintfTest1 :
sprintf [DNum None, DString] (fun res => res)
42 "This is a string" = "42This is a string".
Proof. reflexivity. Qed.
Example sprintfTest2 :
sprintf [DNum (Some 2), DLit "/", DNum (Some 2)] (fun res => res)
2 4 = "02/04".
Proof. reflexivity. Qed.
End Internal.
Strings as format
Definition addDir (o : option format) (dir : directive) : option format :=
match o with
| Some f => Some (dir :: f)
| None => None
end.
Fixpoint parseFormat (s : string) : option format :=
match s with
| "" => Some []
| "%" ::: s' =>
match s' with
| "%" ::: s'' => addDir (parseFormat s'') (DLit "%")
| "b" ::: s'' => addDir (parseFormat s'') DBool
| "s" ::: s'' => addDir (parseFormat s'') DString
| "c" ::: s'' => addDir (parseFormat s'') DChar
| "d" ::: s'' => addDir (parseFormat s'') (DNum None)
| _ => parseFormatSize s' 0
end
| c ::: s' =>
addDir (parseFormat s') (DLit c)
end
with parseFormatSize (s : string) (acc : nat) : option format :=
match s with
| "" => None
| "d" ::: s' => addDir (parseFormat s') (DNum (Some acc))
| c ::: s' =>
match digitToNat c with
| Some n => parseFormatSize s' (10 * acc + n)
| None => None
end
end.
We can test our function in some simple cases.
Example parseFormatTest1 :
parseFormat "%d%4da" = Some [DNum None, DNum (Some 4), DLit "a"].
Proof. reflexivity. Qed.
Example parseFormatTest2 :
parseFormat "%ca%s%" = None.
Proof. reflexivity. Qed.
Example parseFormatTest3 :
parseFormat "%s%b" = Some [DString, DBool].
Proof. reflexivity. Qed.
Putting the pieces together
Inductive printfError := InvalidFormat.
Definition sprintfOpt (o : option format) : match o with
| Some f => formatType f
| None => printfError
end :=
match o with
| Some f => Internal.sprintf f (fun res => res)
| None => InvalidFormat
end.
Definition sprintf (s : string) := sprintfOpt (parseFormat s).
Despite its intricate type, using our function is simple, as the
examples below show.
Definition greet name y m d : string :=
sprintf "Hello %s, today is %d/%2d/%2d" name y m d.
Example greetTest1 : greet "readers" 2013 4 19 =
"Hello readers, today is 2013/04/19".
Proof. reflexivity. Qed.
Definition tableRow name value : string :=
sprintf "<tr><td>%s</td><td>%b</td></tr>" name value.
Example tableRowTest1 : tableRow "x1" true =
"<tr><td>x1</td><td>true</td></tr>".
Proof. reflexivity. Qed.
Trying to pass the wrong number of arguments to sprintf, or
giving it arguments of the wrong type, will result in a type error.
(* Example greetTest2 : string := greet 2013 4 19 "readers". *)
(* Error: The term "2013" has type "nat" while it is expected to have type
"string". *)
(* Example tableRowTest2 : string := tableRow "x1". *)
(* Error: The term "tableRow "x1"" has type "bool -> string"
while it is expected to have type "string". *)