Daniel Keast

Sum types and Rust

programming, rust

I’ve been listening to the Lambdacast podcast recently. During one episode they discussed algebraic data types and the differences between product and sum types. I’d heard the term “algebraic data types” before, but had no idea what product or sum types were.

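It turns out that product types are what I’m generally used to. Structs, classes, tuples, etc. are product types. They have a number of fields, and every value fills all of them at once: each field is a ‘slot’ that holds data. The name comes from counting the possible values: a struct with a two-valued field and a three-valued field has 2 × 3 = 6 possible values.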
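To make the ‘product’ concrete, here is a small sketch. Pair and all_pairs are made-up names for illustration only. Because both fields are always present, the number of possible values is the product of the possibilities for each field.

```rust
// A product type: every value of `Pair` carries BOTH fields at once.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Pair {
    a: bool,
    b: bool,
}

/// Enumerate every possible `Pair`. The fields are independent slots,
/// so the total is 2 (choices for `a`) * 2 (choices for `b`) = 4.
fn all_pairs() -> Vec<Pair> {
    let mut pairs = Vec::new();
    for a in [false, true] {
        for b in [false, true] {
            pairs.push(Pair { a, b });
        }
    }
    pairs
}
```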

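Sum types also have a number of possible fields (called variants), but a value of the type is exactly one of them at a time, so the number of possible values is the sum of the possibilities for each variant. At the time I was reading this I couldn’t see anywhere that these would be at all useful.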
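The same counting works for a sum type. In this sketch (Choice, all_choices and describe are hypothetical names), a value is either a Flag holding a bool or Nothing, giving 2 + 1 = 3 possible values rather than a product.

```rust
// A sum type: every value of `Choice` is exactly ONE of its variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Choice {
    Flag(bool), // 2 possible values
    Nothing,    // 1 possible value
}

/// Enumerate every possible `Choice`: 2 + 1 = 3 values in total.
fn all_choices() -> Vec<Choice> {
    let mut choices = Vec::new();
    for b in [false, true] {
        choices.push(Choice::Flag(b));
    }
    choices.push(Choice::Nothing);
    choices
}

/// `match` must handle every variant, and inside each arm only that
/// variant's data is available.
fn describe(c: Choice) -> String {
    match c {
        Choice::Flag(b) => format!("flag set to {}", b),
        Choice::Nothing => String::from("nothing"),
    }
}
```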

The Rust programming language supports sum types through its enums. The section of the Rust book describing them is very well written and has a good example of where they are useful: an IP address that is either IPv4 or IPv6. The next section describes the match expression, the pattern matching construct that makes them simple to use.
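Here is a sketch modelled on that IP address example, with a match added to turn an address into text. The addr_to_string function is my own illustrative name. (The standard library has its own std::net::IpAddr enum built the same way, with V4 and V6 variants.)

```rust
// Modelled on the Rust book's example: a V4 address is four octets,
// while a V6 address is stored here simply as its text form.
enum IpAddr {
    V4(u8, u8, u8, u8),
    V6(String),
}

/// Render an address as text. The compiler rejects this `match`
/// if either variant is left unhandled.
fn addr_to_string(addr: &IpAddr) -> String {
    match addr {
        IpAddr::V4(a, b, c, d) => format!("{}.{}.{}.{}", a, b, c, d),
        IpAddr::V6(s) => s.clone(),
    }
}
```

A value can only ever be one of the two shapes, so there is no way to build an address that has both four octets and a V6 string.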

I’ve been writing little tools to create Spectrum TAP files from Z80 ROMs, data files, and loading screens. These files are streams of data blocks, each of which has a preceding header. There are several different types of block, and each type has different fields that should be present in it. Despite this, they’re fundamentally the same type, and should fit through the same functions to convert them into bytes. These are the data structures that I currently have:

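use serde::{Deserialize, Serialize};

// One block on the tape: a 10-byte file name, a header describing
// what kind of block it is, and the raw data bytes.
#[derive(Serialize, Deserialize, Debug)]
pub struct DataBlock {
    name: [u8; 10],
    header: Header,
    data: Vec<u8>,
}

// Each kind of block carries only the fields that make sense for it.
#[derive(Serialize, Deserialize, Debug)]
pub enum Header {
    // BASIC program, with the line it starts running from.
    Basic {
        start_line: u16,
    },
    // Numeric array, named by a single letter.
    Numeric {
        var_name: VarName,
    },
    // Character array, named by a single letter.
    AlphaNumeric {
        var_name: VarName,
    },
    // Raw bytes loaded at a given address.
    Bytes {
        start_addr: u16,
    },
    // Loading screen; needs no extra fields.
    Screen,
}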

// Single-letter variable names; the discriminants run from A = 1 to Z = 26.
#[derive(Serialize, Deserialize, Debug, Copy, Clone)]
pub enum VarName {
    A = 1,
    B,
    C,
    D,
    E,
    F,
    G,
    H,
    I,
    J,
    K,
    L,
    M,
    N,
    O,
    P,
    Q,
    R,
    S,
    T,
    U,
    V,
    W,
    X,
    Y,
    Z,
}

This is both much simpler than the equivalent implementation using classes and much safer, as a match expression must have enough arms to cover every possible variant, and every arm must evaluate to the same type.
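As a sketch of what that looks like for these types, here are trimmed copies of Header and VarName (no serde derives, and only three letters, so it compiles on its own) with two methods that match on the header. The type byte values assume the commonly documented Spectrum tape header layout (0 = program, 1 = number array, 2 = character array, 3 = bytes); the param helper is hypothetical and simplified, not the exact TAP encoding.

```rust
// Trimmed copies of the post's types, without the serde derives.
// VarName is cut down to three letters to keep the sketch short.
#[derive(Debug, Clone, Copy)]
enum VarName {
    A = 1,
    B,
    C,
}

enum Header {
    Basic { start_line: u16 },
    Numeric { var_name: VarName },
    AlphaNumeric { var_name: VarName },
    Bytes { start_addr: u16 },
    Screen,
}

impl Header {
    /// Assumed tape header type byte. Every arm must produce a `u8`,
    /// and deleting any arm is a compile error.
    fn type_byte(&self) -> u8 {
        match self {
            Header::Basic { .. } => 0,
            Header::Numeric { .. } => 1,
            Header::AlphaNumeric { .. } => 2,
            Header::Bytes { .. } | Header::Screen => 3,
        }
    }

    /// Hypothetical helper: pull the variant-specific 16-bit value out of
    /// whichever fields that variant happens to have.
    fn param(&self) -> u16 {
        match self {
            Header::Basic { start_line } => *start_line,
            // Both array variants share a field, so one arm covers them.
            Header::Numeric { var_name } | Header::AlphaNumeric { var_name } => {
                *var_name as u16
            }
            Header::Bytes { start_addr } => *start_addr,
            // Screen data goes to the start of display memory (16384).
            Header::Screen => 16384,
        }
    }
}
```

If a new variant is added to Header later, both methods stop compiling until they handle it, which is exactly the safety described above.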

The de facto data serialization library for Rust is Serde, and while browsing the documentation for its serde_json crate I found the type it uses to represent JSON values:

pub enum Value {
    Null,
    Bool(bool),
    Number(Number),
    String(String),
    Array(Vec<Value>),
    Object(Map<String, Value>),
}

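It’s amazing to be able to represent any arbitrarily complex JSON value with a single small enum. Because Array and Object themselves contain Values, the type is recursive, so documents nested to any depth need no extra types.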
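To see the recursion at work, here is a sketch using a simplified stand-in for that enum rather than serde_json itself: Json and to_json are my own names, Number is replaced by f64, and Map by a BTreeMap (so object keys come out sorted). One recursive match is enough to serialise any nesting.

```rust
use std::collections::BTreeMap;

// Simplified stand-in for serde_json's Value (not the real type).
enum Json {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

/// Serialise to compact JSON text. Array and Object recurse into their
/// children, which is how six variants cover any document.
fn to_json(value: &Json) -> String {
    match value {
        Json::Null => "null".to_string(),
        Json::Bool(b) => b.to_string(),
        Json::Number(n) => n.to_string(),
        Json::String(s) => quote(s),
        Json::Array(items) => {
            let parts: Vec<String> = items.iter().map(to_json).collect();
            format!("[{}]", parts.join(","))
        }
        Json::Object(map) => {
            let parts: Vec<String> = map
                .iter()
                .map(|(k, v)| format!("{}:{}", quote(k), to_json(v)))
                .collect();
            format!("{{{}}}", parts.join(","))
        }
    }
}

/// Minimal string escaping: only backslashes and double quotes.
fn quote(s: &str) -> String {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}
```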