-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathfinite-dtypes.lagda
392 lines (357 loc) · 14.1 KB
/
finite-dtypes.lagda
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
\documentclass[sigconf]{acmart}
\usepackage{amsfonts}
\usepackage{ucs}
\usepackage[utf8x]{inputenc}
\usepackage{verbatim}
\usepackage{url}
\usepackage{finite-dtypes}
\begin{document}
\title{Finite Dependent Types}
\subtitle{Fancy Types For Systems Metaprogramming And Dependency Management}
\author{Alan Jeffrey}
\orcid{0000-0001-6342-0318}
\affiliation{Mozilla Research}
\email{[email protected]}
\acmConference{Submitted for publication}{November 2017}{}
\acmDOI{}
\acmISBN{}
\acmYear{2018}
\copyrightyear{%
\includegraphics[height=1.5ex]{cc-by-88x31.png}
\url{https://creativecommons.org/licenses/by/4.0/}\\
This work is licensed under a Creative Commons Attribution 4.0 International License.\\
This paper~\cite{self} is written in Literate Agda, and typechecked with Agda~v2.4.2.5%
}
\setcopyright{none}
\settopmatter{printacmref=false}
\urlstyle{tt}
% Set up the basic definitions.
\begin{comment}
\begin{code}
{-# OPTIONS --type-in-type #-} -- WARNING, CLAXONS!
open import prelude -- POSTULATES LIVE HERE!
\end{code}
\end{comment}
\maketitle
\section{Introduction}
Applications such as web browsers often have issues of scale and
complexity of the code base. For example, the Servo~\cite{servo.org}
next generation web engine contains more than 250KLoC:
\begin{tinyverb}
$ loc servo/components/
--------------------------------------------------------------------------------
Language Files Lines Blank Comment Code
--------------------------------------------------------------------------------
Rust 811 323350 28930 35208 259212
...
--------------------------------------------------------------------------------
Total 961 353834 33406 37472 282956
--------------------------------------------------------------------------------
\end{tinyverb}
That is just the Servo codebase itself. Servo also makes use of the
Cargo~\cite{crates.io} software ecosystem, and has over 200
transitive dependencies with more than a 1MLoC in Rust,
and 9MLoC in total:
\begin{tinyverb}
$ loc servo/.cargo/
...
Rust 2274 1541124 65910 111065 1364149
...
--------------------------------------------------------------------------------
Total 58295 11784796 1274681 1179475 9330640
--------------------------------------------------------------------------------
\end{tinyverb}
Building Servo generates even more source code:
\begin{tinyverb}
$ loc servo/target/
...
Rust 611 893414 74200 13194 806020
...
--------------------------------------------------------------------------------
Total 3901 1660507 174703 107729 1378075
--------------------------------------------------------------------------------
\end{tinyverb}
Much of this generated code comes from the \texttt{script} component,
which generates Rust bindings from Web~IDL~\cite{webidl}:
\begin{tinyverb}
$ loc servo/target/debug/build/script-*/
...
Rust 579 781977 63352 6424 712201
...
--------------------------------------------------------------------------------
Total 592 800055 66936 9849 723270
--------------------------------------------------------------------------------
\end{tinyverb}
The code generator itself is 20KLoC in Python:
\begin{tinyverb}
$ loc servo/components/script/dom/bindings/codegen/
...
Python 80 26919 3903 2112 20904
...
--------------------------------------------------------------------------------
Total 81 26932 3904 2112 20916
--------------------------------------------------------------------------------
\end{tinyverb}
There should be a more principled approach to dependency management and metaprogramming.
\section{Metaprogramming}
Fortunately, metaprogramming is a well-explored area,
notably in the Racket~\cite{racket-lang.org} programming
language's \texttt{\#lang} ecosystem.
Much metaprogramming relies on dynamic checks,
since the host language's type system
is not usually expressive enough to encode object
language types at compile time.
A notable exception is the use of \emph{dependent types}
(as implemented in, for example, Coq~\cite{coq}, Agda~\cite{agda}
or Idris~\cite{idris})
which allow
the compile-time computation of types which depend on data,
Dependent types have already been proposed
for low-level programming~\cite{CHAGN07}, generic programming~\cite{AM03}
and metaprogramming~\cite{Chl10}.
% The Agda types of things defined in this figure.
\begin{comment}
\begin{code}
infixr 5 _/times/_ _/rightarrow/_
_/times/_ : ∀ {j k} {m : 𝔹 ↑ j} {n : 𝔹 ↑ k} → (FSet m) -> (FSet n) -> (FSet(m + n))
_/rightarrow/_ : ∀ {j k} {m : 𝔹 ↑ j} {n : 𝔹 ↑ k} → (A : FSet m) -> (FSet n) -> (FSet (n /ll/ m))
\end{code}
\end{comment}
\begin{figure}[t]
\begin{code}
(A /times/ B) = & (/prod/ x /in/ A /cdot/ B) \\
(A /rightarrow/ B) = & (/sum/ x /in/ A /cdot/ B) \\ \\
%nothing =
/nothing/ &/in/ /FSet/(/zerop/) \\
%unit =
/unit/ &/in/ /FSet/(/zerop/) \\
%bool =
/bool/ &/in/ /FSet/(/one/) \\
%prod = \ j k (m : 𝔹 ↑ j) (n : 𝔹 ↑ k) ->
(/prod/ x /in/ A /cdot/ B(x)) &/in/ /FSet/(m + n) \\
&/WHEN/ B /in/ (A /rightarrow/ /FSet/(n))
/AND/ A /in/ /FSet/(m) \\
%sum = \ j k (m : 𝔹 ↑ j) (n : 𝔹 ↑ k) ->
(/sum/ x /in/ A /cdot/ B(x)) &/in/ /FSet/ (n /ll/ m) \\
&/WHEN/ B /in/ (A /rightarrow/ /FSet/(n))
/AND/ A /in/ /FSet/(m) \\
%FSet = \ k (n : 𝔹 ↑ k) ->
/FSet/(n) &/in/ /FSet/(/succp/(n))
\end{code}
\caption{Type rules for simple finite dependent types,
where~$\kw{FSet}(n)$ is the type of types with at most $2^n$ elements}
\label{simple-types}
\end{figure}
\subsection{Dependent metaprogramming}
\begin{comment}
\begin{code}
-- We assume 8-bit words in this paper.
/WORDSIZE/ = & [ /false/ , /false/ , /false/ , /true/ , /false/ , /false/ , /false/ , /false/ ] \\
/word/ = & /bits/ (/WORDSIZE/)
-- Not sure what the real cardinality of IO should be
/IO/ : ∀ {k} {n : 𝔹 ↑ k} → FSet(n) → FSet(/WORDSIZE/)
/IO/ = HOLE
-- Pointers
/reference/ : ∀ {k} {n : 𝔹 ↑ k} → FSet(n) → FSet(/WORDSIZE/)
/reference/(A) = record { Carrier = Carrier A ; encodable = HOLE }
-- Strings
/str/ : FSet(/WORDSIZE/)
/str/ = HOLE
-- Console defined below
csize = (/WORDSIZE/ /ll/ /WORDSIZE/) /ll/ /WORDSIZE/
/Console/ : forall /ssize/ -> FSet(/ssize/) -> Carrier(/FSet/(csize))
\end{code}
\end{comment}
Metaprogramming includes the ability to interpret object
languages such as Web~IDL. For example:
\begin{verbatim}
interface Console { log(DOMString moduleURL); };
\end{verbatim}
might be interpreted (using types from Figure~\ref{simple-types}):
\begin{code}
/Console/ = &
/fn/ /ssize/ /in/ /word/ /cdot/
/fn/ /DOMString/ /in/ /FSet/(/ssize/) \\&\indent /cdot/
/prodp/ /csize/ /in/ /word/ /cdot/
/prodp/ /Console/ /in/ /FSet/(/csize/) \\&\indent /cdot/
/prodp/ /log/ /in/ /reference/((/Console/ /times/ /DOMString/) /rightarrow/ /IO/(/unit/)) \\&\indent /cdot/
/unitp/
\end{code}
The important point about this interpretation is that it is internal
to the system, and can be typed. If we define:
\begin{code}
/CSize/ = & (/WORDSIZE/ /ll/ /WORDSIZE/) /ll/ /WORDSIZE/ \\
\end{code}
then the typing of $\kw{Console}$ is internal to the language:
\begin{code}
%Console =
/Console/(n)(S) &/in/ /FSet/(/CSize/)
&/WHEN/ S /in/ /FSet/(n)
/AND/ n /in/ /word/
\end{code}
Chlipala~\cite{Chl10} has shown that dependent metaprogramming
can give type-safe bindings for first-order languages like
SQL schemas. Hopefully it scales to higher-order languages like Web~IDL.
\subsection{Dependent dependencies}
\begin{comment}
\begin{code}
size = [ /false/ , /false/ , /false/ , /false/ , /false/ , /false/ , /false/ , /false/ , /false/ , /false/ , /false/ , /false/ , /true/ ]
/A[1,0]/ : FSet(size)
/A[1,1]/ : FSet(size)
/a[1,0,1]/ : Carrier(/A[1,0]/)
/a[1,0,2]/ : Carrier(/A[1,0]/)
/a[1,1,0]/ : Carrier(/A[1,1]/)
/B[1,0]/ : Carrier(/A[1,1]/) → FSet(size)
/b[1,0,1]/ : ∀ a → (Carrier(/B[1,0]/ a))
\end{code}
\end{comment}
Dependencies are usually versioned, for instance Cargo uses semantic versioning~\cite{semver.org}.
Semantic versions are triples $[x, y, z]$, where the interface for a package only depends on
$[x, y]$, and interfaces with the same $x$ are required to be upwardly compatible.
For example an interface at version [1,0] might consist of a sized type $\kw{T}$
together with an element $\kw{z}\in\kw{T}$:
\begin{code}
/A[1,0]/ = &
/prodp/ /size/ /in/ /word/ /cdot/
/prodp/ /T/ /in/ /FSet/(/size/) \\&\indent /cdot/
/prodp/ /z/ /in/ /T/ /cdot/
/unitp/
\end{code}
One implementation sets $\kw{T}$ to be $\kw{unit}$,
but the next sets $\kw{T}$ to be $\kw{bool}$:
\begin{code}
/a[1,0,1]/ = (
/zerop/ ,
/unitp/ ,
/epsilon/ ,
/epsilon/ ) /quad/
/a[1,0,2]/ = (
/onep/ ,
/boolp/ ,
/false/ ,
/epsilon/ )
\end{code}
Bumping the minor version requires an implementation with a compatible interface,
for example:
\begin{code}
/A[1,1]/ = &
/prodp/ /size/ /in/ /word/ /cdot/
/prodp/ /T/ /in/ /FSet/(/size/) \\&\indent /cdot/
/prodp/ /z/ /in/ /T/ /cdot/
/prodp/ /s/ /in/ /reference/(/T/ /rightarrow/ /T/) /cdot/
/unitp/
\end{code}
which can be implemented by setting $\kw{T}$ to be $\kw{word}$:
\begin{code}
/tsucc/ = & (/fn/ x /cdot/ /truncate/(/one/ + x)) \\
/a[1,1,0]/ = &
( /WORDSIZE/
, /word/
, /zerop/
, /tsucc/
, /epsilon/
)
\end{code}
Implementations may be dependent, for example $\kw{B}$ might depend on $\kw{A}[1,y]$ for any $y\ge1$:
\begin{code}
/B[1,0]/(/size/ , /A/ , /z/ , /s/ , /ldots/) =
/prodp/ /ss/ /in/ /reference/(/A/ /rightarrow/ /A/) /cdot/
/unitp/
\end{code}
with matching implementation:
\begin{code}
/b[1,0,1]/(/size/ , /A/ , /z/ , /s/ , /ldots/) =
( (/fn/ x /cdot/ /s/(/s/(x)))
, /epsilon/
)
\end{code}
In summary, an interface $A[x,y]$ is interpreted as family of types
where if $y\le y'$ then $A[x,y] :> A[x,y']$ for an appropriate
definition of \emph{interface evolution}.
Dependent packages are treated in the style
of Kripke semantics, as functions
$\forall A[x,y'] <: A[x,y] \cdot B[m,n]$.
There has been much attention paid to dependent types for module
systems~\cite{McQ86, HMM90, Har93}. In some ways, dependency management is simpler
because dependencies are acyclic,
but it does introduce interface evolution complexity,
for example Rust's~\cite[\#1105]{rfcs}.
\subsection{Finite dependencies}
\begin{comment}
\begin{code}
/binary/ : ∀ {j} (k : 𝔹 ↑ j) → FSet(k)
\end{code}
\end{comment}
One feature that all of these examples have in common is that they do not
require any infinite data. Existing dependent type systems encourage the
use of infinite types such such as lists or trees.
The prototypical infinite types are $\mathbb{N}$ (the type of natural
numbers) and $\kw{Set}$ (the type of types). This is a mismatch with systems
programs, where types are often \emph{sized} (for example in Rust,
types are \texttt{Sized} by default~\cite[\S3.31]{rust-book}).
In particular, systems programs are usually parameterized by
$\kw{WORDSIZE}$, and assume that data fits into memory
(for example that arrays are indexed by a word, not by a natural number).
An overview of approaches to finite sets in dependent type theory is
given in~\cite{FGGvW18}, which defines
a type constructor $\mathcal{K}(A)$ (the
Kuratowski-finite subsets of $A$), and defines the finite
types to be a type $A$ together with an $X \in \mathcal{K}(A)$
such that every $x:A$ has $x \in X$.
Type constructors such as dependent functions and pairs, preserve
finiteness, but the type of finite types is not itself finite.
In Figure~\ref{simple-types}, we posit a type system with
a type-of-types $\kw{FSet}(n)$,
containing types with at most $2^n$ elements.
In this system, all types are finite, in particular $\kw{FSet}(n)
\in \kw{FSet}(\kw{one} + n)$. We expect that this explicit bound on
the size of $\kw{FSet}(n)$ cannot be expressed internally inside an
off-the-shelf dependent type theory, but we conjecture that an extension
like this is consistent.
This system is based on a theory of binary arithmetic, but even that is
definable within the language, in the presence of an appropriate
induction principal for binary arithmetic. For example the type of
bitstrings is definable by induction:
\begin{code}
/binary/ = & /indn/
(/FSet/)
(/unitp/)
(/fn/ n /cdot/ /fn/ A /cdot/ (/bool/ /times/ A)) \\
%WORDSIZE =
/WORDSIZE/ &/in/ /binary/(/WORDSIZE/)
\end{code}
As the type for $\kw{WORDSIZE}$ suggests, this is all spookily
cyclic. In particular, binary numbers are parameterized by their
bitlength, which is itself a binary number. A bootstrapping induction principal is
needed, similar to Agda universe
polymorphism~\cite{agda}. We hypothesize that irrelevant
natural numbers might suffice.
Potential advantages of explicitly finite dependent types
include:
\begin{itemize}
\item A better fit with systems programming languages such as Rust,
where sized types are the default.
\item Domain specific techniques for arithmetic can better support
type inference.
\item Simpler induction schemes, such as over the irrelevant naturals,
similar to the use of sized types in taming coinduction~\cite{AP16},
but without the use of ordinals.
\end{itemize}
Dependent type systems usually come with an identity type $a \equiv_A
b$ where $a : A$ and $b : A$. If identity types are interpreted as
paths as in Homotopy Type Theory~\cite{hott}, then the size of $A
\equiv_{\kw{FSet}(n)} B$ is at most the size of $A \rightarrow B$,
which would suggest considering $(a \equiv_A b) \in \kw{FSet}(n \ll
n)$ when $A \in \kw{FSet}(n)$. This makes the type of identities over
$A$ much larger than the type of $A$, which may give problems with,
for example, codings of indexed types.
\section{Conclusions}
Dependent types are a good fit for some of the more difficult problems
with programming in the large: metaprogramming and dependency management.
However, their focus on infinite types is a mismatch.
Systems are finite, and are better served by systems which
encourage the use of finite types.
\sloppy
\bibliographystyle{plain}
\bibliography{finite-dtypes}
\end{document}