Introduction to Type Theory

tutorialstype theory

(!) Note: This blog post is rather old. Kept here for archival purposes.

Your favorite programming language has the concept of types. If you’re an undergraduate learning Java or C++, you’ve probably encountered numerous errors relating to types. You’re most likely asking yourself, “What are these types good for? When I was learning Python I didn’t need types.” What I have found surprising is that a lot of programmers can’t identify when they’re using types. After their first Computer Science courses, they still don’t know what type theory is. This article serves as an introduction to type theory for people who have a small amount of programming knowledge under their belt.

The topics we will be covering are:

  • Defining Types, Type Systems, & Type Theory
  • Understanding how Type Theory is applicable to Programming Languages

Defining our “Terms”… haha get it?

Within Type Theory, there are two (sometimes equal) entities that we talk about: Terms and Types.

Terms

A simple definition for term would be an object that acts as a component to a formula or function.

For example, in the statement x = 2, we define a term called x which can be used in later statements. Its value is an expression containing the term 2. Here are some examples of assignments with annotations:

x = 2
term = term

y = x * 4
term = term operator term

z = f(y, x)
term = function(term, term)

Are functions and operators considered terms? In some languages, yes! Higher-order functions are functions that may have some parameters that are other functions. Languages that support this, like Haskell and OCaml, are treating these functions as terms.

Here’s an example in Python:

# Define a function that takes a function
# as a parameter and returns a function
def doTwice(f):
    def do(x):
        return f(f(x))
    return do

# Create a lambda that we will pass
timesTwo = lambda x: x * 2

timesFour = twice(timesTwo)

timesFour(5) # = 20

Types

Types can be thought of as categories for terms. The most common examples would be:

  • Numbers: 1, 2, 4, pi, e
  • Strings: "test"
  • Array: int[], String[]
  • Function Type: int foo(int a, int b)

If you’ve been writing Java or other C like languages, you’re probably used to the notation of type term, like in the function type example above (int a, int b). Functional languages opt for the notation term : type where : means “is of type”. We will be using the latter notation because that is what a lot of academia uses.

The “function” type stores the types of its parameters in the type signature. Haskell fans will recognize this. foo : Int -> Int -> Int defines the type of foo as a function that takes in two Ints and returns an Int (the parameter types are every type in the list except the last type).

We can also say that String : Type, which is saying “String is a Type” (or “String is of type Type” but that is of course a little confusing).

Type Constructor

The last examples are interesting because they are types composed of types. This is known as a type constructor. It’s like a function, but instead of taking in values and returning a value, it takes in types and returns a type.

Let’s take a look at List. It would be appropriate to say List : Type -> Type which means List takes in a Type and returns a new Type. In Java we may write List<String>, where String is acting as the parameter for the List constructor. This gives us a new type that is separate from every other type constructed from that type constructor. For example, List<String> != List<Int>.

We are also free to define type constructors that take in multiple types. If you’ve dealt with large generic classes, you’ve probably seen something like Foo<A, B, C, D>. In a more formal notation, we’d write this as Foo : Type -> Type -> Type -> Type -> Type.

Kinds of Type Systems

A type system can be thought of as a set of rules that govern the correctness of programs. More concretely, if the use of types in a piece of code follows the rules of the type system, that code will compile correctly. As a familiar example, if we define variables as one type and set them equal to another type, we will get an error at compile-time.

String s = 2; // ERR
int a = "wow"; // ERR

int b = 2; // OK
String wow = "wow"; // OK

There are many kinds of type system in the wild that are implemented in real programming languages.

Static vs. Dynamic Typing

In Python, we can define a variable without declaring what type of values the variable can hold, such as x = 2. There is no dance to change the value that variable stores:

x = "test"
x = 4
x = b"test"
x = 2

Variables in Python can be of any type, and the type of a variable can change during runtime. We can say the type of x is dynamic, also known as dynamic typing.

In Java, we are not allowed to do the same. Once we declare a variable, that variable has that type forever.

String s = "test";
s = "otherTest"; // OK
s = 2; // ERR

In this code, s is declare as a String. You are not allowed to change the value of the variable to something that isn’t of the same time. I can assign any String to s but as soon as I try a integer, the compiler will yell at us. We call this static typing because the type of a variable is unchanging or “static”.

Nominal vs. Structural Typing

C-like languages allow the user to define structures containing other values, usually called structs or classes. Let’s define two structures with the same values:

struct X {
    int x;
    int y;
};

struct Y {
    int x;
    int y;
};

With nominal typing, you are not allowed to use X where Y is expected and vice versa. The compiler will error even if they contain the same fields.

void useX(X x) {
    // ...
}

int main(void) {
    struct X x = {1, 2};
    struct Y y = {3, 4};

    useX(x); // OK
    useX(y); // ERR

    return 0;
}

Some languages like OCaml allow this behaviour. The name doesn’t have to be the same, only the structure of the objects. This is called structural typing.

# let x =
  object
    val mutable x = 5
    method get_x = x
    method set_x y = x <- y
  end;;
val x : < get_x : int; set_x : int -> unit > = <obj>
 
# let y =
  object
    method get_x = 2
    method set_x y = Printf.printf "%d\n" y
  end;;
val y : < get_x : int; set_x : int -> unit > = <obj>

# let set_to_10 a = a#set_x 10;;
val set_to_10 : < set_x : int -> 'a; .. > -> 'a = <fun>

You can see that the types of these variables are defined by the signatures of the methods inside it. The function defined at the end, set_to_10, takes in an object that contains a method called set_x. The rest of the object’s internals don’t matter.

Where does the Theory come in?

Now that we’ve looked at a few examples of type systems, you’re probably wondering where type theory comes in. Type theory is the theoretical research of new type systems that could have an impact on real world programming languages.

For example, dependent types are special kinds of types that depend on values (we will discuss this in depth later), and languages like Idris and Agda are general purpose languages that use dependent types to provide an extra layer of safety.

I hope this has served as a good introduction to type theory. In later articles, we will be diving into more complex topics.

Further Reading