Where the Coq lives

| Comments GitHub

Reading and writing numbers in Coq

Many programming languages have built-in support for string processing. Coq is no exception. The standard library provides us with its own definition of strings. Unlike other languages, though, strings in Coq are not fundamentally different from other data types: they are defined as an inductive type.

Print string.

(* Inductive string : Set :=
      EmptyString : string | String : ascii -> string -> string *)

As we can see, strings are much like the list type, but contain ascii elements instead of elements of an arbitrary type. asciis, on the other hand, are just eightuples of bools.

Print ascii.

(* Inductive ascii : Set :=
      Ascii : bool ->
              bool -> bool -> bool -> bool -> bool -> bool -> bool -> ascii *)

Sure enough, if we had to use constructors explicitly for building strings, using them in Coq wouldn't be very practical. Fortunately, Coq provides a convenient notation for strings and ascii, much like the built-in notation for numbers. They are defined in string_scope and char_scope, respectively.

Open Scope string_scope.
Example stringEx : string := "This is a string".

Open Scope char_scope.
Example asciiEx : ascii := "a".

Let's see what kind of string-processing functions we can write. One could certainly hope that we'd be able to write a function to read numbers. To do this, we will need a function to convert asciis to nats: if the character is a digit, we return the corresponding number. Otherwise, the whole parsing should fail. As in other functional programming languages, we model this by making our function return an option instead ─ in this case, option nat.

Definition digitToNat (c : ascii) : option nat :=
  match c with
    | "0" => Some 0
    | "1" => Some 1
    | "2" => Some 2
    | "3" => Some 3
    | "4" => Some 4
    | "5" => Some 5
    | "6" => Some 6
    | "7" => Some 7
    | "8" => Some 8
    | "9" => Some 9
    | _ => None

We can now use this function to read numbers. To make it more efficient, we can add an acc parameter to store the intermediate results of the computation ─ i.e., the number we've read so far.

Open Scope string_scope.

Fixpoint readNatAux (s : string) (acc : nat) : option nat :=
  match s with
    | "" => Some acc
    | String c s' =>
      match digitToNat c with
        | Some n => readNatAux s' (10 * acc + n)
        | None => None

Definition readNat (s : string) : option nat :=
  readNatAux s 0.

We can write some unit tests to make sure our function behaves as expected.

Example readNat1 : readNat "1234" = Some 1234.
Proof. reflexivity. Qed.

Example readNat2 : readNat "asdf" = None.
Proof. reflexivity. Qed.

Since we have a function for reading numbers, we should now be able to write one for printing them. Again, as a first step, let's write a function that converts nats to their corresponding digits.

Open Scope char_scope.

Definition natToDigit (n : nat) : ascii :=
  match n with
    | 0 => "0"
    | 1 => "1"
    | 2 => "2"
    | 3 => "3"
    | 4 => "4"
    | 5 => "5"
    | 6 => "6"
    | 7 => "7"
    | 8 => "8"
    | _ => "9"

In all rigor, natToDigit should return an option ascii instead of a plain ascii, just like we did in our previous digitToNat function. After all, it doesn't make any sense to associate any digit to, say, 10. However, if we make sure that we only use this function on numbers less than 10, we don't have to do an explicit conversion from option ascii to ascii later on, and our function will still be correct. As a matter of fact, we can now prove a theorem that tells us when it's safe to use natToDigit.

Theorem digitToNatNatToDigit : forall n : nat,
  n < 10 ->
  digitToNat (natToDigit n) = Some n.
  intros n H.

  (* H ensures that we only have to check the cases from n = 0 to
     9. We can do this very easily by using a "repeat match" where we
     do a case analysis on n until the H hypothesis becomes
     contradictory. *)

  repeat match goal with
          | n : nat |- _ =>
            destruct n; [reflexivity|try lia]


In particular, we get the following consequence, which will be particularly useful later:

Theorem digitToNatNatToDigitMod : forall n : nat,
  digitToNat (natToDigit (n mod 10)) = Some (n mod 10).
  intros n. apply digitToNatNatToDigit.
  apply Nat.mod_upper_bound.

Now that we can convert numbers to digits, let's see how we could write our printing function. One idea would be to add a parameter to our function to accumulate the paritially printed number:

    Fixpoint writeNatAux (n : nat) (acc : string) : string :=
      let acc' := String (natToDigit (n mod 10)) acc in
      match n / 10 with
        | 0 => acc'
        | n' => writeNatAux n' acc'
The algorithm is straightforward. We print the least signigicant digit of the number, adding it to the string we've printed so far (recall that the String constructor adds a character to the front of a string). Then, we divide the number by 10 and print it recursively, until we reach zero.
Unfortunately, Coq doesn't accept this definition, since the recursive call is not done on structurally smaller terms. There are lots of ways to make this definition work, such as using the Program command. We will use the somewhat simpler, but standard, trick of adding an explicit "timeout" parameter to our function.

Fixpoint writeNatAux (time n : nat) (acc : string) : string :=
  let acc' := String (natToDigit (n mod 10)) acc in
  match time with
    | 0 => acc'
    | S time' =>
      match n / 10 with
        | 0 => acc'
        | n' => writeNatAux time' n' acc'

At each step, we do a pattern match on the timeout parameter. If we reach 0, our computation must stop, and we just give back the string we've printed so far. Otherwise, we have a structurally smaller argument we can use to do a recursive call, which will make Coq accept the definition.
Now, in order to effectively print a number, we need to find a timeout value that is big enough to perform the entire computation. In this case, as it turns out, we can use the very the number we're printing for the timeout parameter.

Open Scope string_scope.

Definition writeNat (n : nat) : string :=
  writeNatAux n n "".

Let's test our definition on some simple examples.

Example writeNat1 : writeNat 12 = "12".
Proof. reflexivity. Qed.

Example writeNat2 : writeNat 0 = "0".
Proof. reflexivity. Qed.

It seems clear that readNat is indeed the inverse of writeNat, but can we prove this rigorously in Coq? Formally, we can state the theorem we want as

Definition readNatWriteNatStatement :=
  forall n, readNat (writeNat n) = Some n.

Since both functions are defined in terms of auxiliary fixpoints, it is better to prove something about them first and then use that fact to get the result we want as a consequence. Here's one such possibility:

Definition readNatAuxWriteNatAuxStatement :=
  forall time n acc,
    n <= time ->
    readNatAux (writeNatAux time n acc) 0 =
    readNatAux acc n.

Notice the precondition n <= time in the above statement. As mentioned before, we must give enough time to writeNatAux for it to function correctly. This precondition is just stating this fact explicitly.
It is easy to prove that this lemma will suffice.

Theorem readNatWriteNat' :
  readNatAuxWriteNatAuxStatement ->
  unfold readNatAuxWriteNatAuxStatement, readNatWriteNatStatement.
  unfold readNat, writeNat.
  intros H n.
  rewrite H; trivial.

Now, we are ready to prove our intermediate lemma. As usual, since writeNatAux is defined by recursion on time, trying the proof by induction on time seems a good guess. Let's begin with a simple lemma that will be useful later. It states that if the precondition for our lemma is satisfied, then it'll also be satisfied for the recursive call, which will allow us to apply the induction hypothesis.

Lemma div_10_le : forall n time,
  n <= S time -> n / 10 <= time.
  intros [|n] time H. simpl. lia.
  assert (S n / 10 < S n); try lia.
  apply Nat.div_lt; lia.

We can proceed with our proof.

Lemma readNatAuxWriteNatAux : readNatAuxWriteNatAuxStatement.

  intros time.
  induction time as [|time' IHtime]; intros n acc H.

  (* The base case is where the inductive hypothesis is crucial, since it
     forces n to be 0. *)

  - simpl. inversion H. reflexivity.

  (* The inductive case is slightly trickier, but not too hard. We
     start by applying our previous lemma. *)

  - apply div_10_le in H. unfold writeNatAux.

    (* To simplify the goal, we need to do some case analysis on (n /
       10). *)

    destruct (n / 10) as [|n'] eqn:En';

    (* We do a recursive call in the non-zero case, and that's where we will
       need our induction hypothesis. *)

    [|rewrite IHtime; auto]; unfold readNatAux;

    (* Finally, we need the digitToNatNatToDigitMod lemma to show that
       readNatAux doesn't fail, and the div_mod theorem (defined in the
       standard library) to express n in terms of (n / 10) and (n mod 10).
       lia and congruence can easily take care of the end. *)

    try rewrite digitToNatNatToDigitMod;
    try rewrite (div_mod n 10) at 2; try lia;


The final result follows easily.

Theorem readNatWriteNat : readNatWriteNatStatement.
  apply readNatWriteNat'.
  apply readNatAuxWriteNatAux.
comments powered by Disqus