Skip to content

Semigroups: Our First Algebraic Structure

In the previous chapter, we discussed associativity, a key property of some binary operations. Now, let’s explore our first formal algebraic structure built around such operations: the Semigroup.

Understanding Semigroups Through Types and Operations

Section titled “Understanding Semigroups Through Types and Operations”

A semigroup is an algebraic structure (A, •) where:

  • A is a non-empty set (in programming terms, a Type).
  • (the “dot” operator, representing our binary operation) is a function that takes two elements from set A and returns an element also in A. In programming terms, this is a function with a generic type signature like 'a -> 'a -> 'a.
    • Recalling from Unit 1, this type signature (due to right-associativity, often related to currying) means the function takes a first argument of type 'a', and returns a new function of type 'a -> 'a'. This new function then takes the second argument of type 'a' and produces the final result, also of type 'a'.
    • Crucially, the operation is closed over the set/type: combining two 'a's always yields an 'a'.
  • The operation must satisfy the associative law: for all x, y, z in A, (x • y) • z = x • (y • z).

Let’s see this in familiar examples:

For the Type int (the set of integers):

  • Binary Operation: Addition, represented by the function (+).

    • Type: int -> int -> int. (Here, our generic 'a is specialized to int).
    • This function takes two ints and returns an int.
  • Associativity: (1 + 2) + 3 = 1 + (2 + 3) holds true.

  • Therefore, (int, (+)) forms a semigroup.

  • Binary Operation: Multiplication, represented by the function (*).

    • Type: int -> int -> int.
    • This function takes two ints and returns an int.
  • Associativity: (1 * 2) * 3 = 1 * (2 * 3) holds true.

  • Therefore, (int, (*)) forms a semigroup.

(Note: Integer subtraction (-) and division (/) are binary operations on integers, but they are not associative, so (int, (-)) and (int, (/)) do not form semigroups.)

For the Type string (the set of all possible strings):

  • Binary Operation: String concatenation, represented by the function (+) (in F# for strings).
    • Type: string -> string -> string.
    • This function takes two strings and returns a string.
  • Associativity: ("He" + "l") + "lo" = "He" + ("l" + "lo") holds true.
  • Therefore, (string, (+) for concatenation) forms a semigroup.
  • Type (Set): The set of all physical LEGO blocks.
  • Binary Operation (Function): The act of physically connecting two LEGO blocks. This operation takes two LEGO blocks and results in a new, combined LEGO block. Crucially, the result is still a “LEGO block” – the operation is closed within the type.
  • Associativity: As discussed, (A joined with B) joined with C is the same as A joined with (B joined with C).
  • Therefore, (LEGO blocks, join operation) forms a semigroup.
  • Type (Set): The set of all USB devices.
  • Binary Operation (Function): The act of connecting two USB devices via a USB hub. This operation takes two USB devices and results in a new, combined USB device configuration. The result is still a “USB device (set),” meaning the operation is closed within the type.
  • Associativity: Nesting hubs in different orders yields the same final USB device configuration.
  • Therefore, (USB devices, hub connection operation) forms a semigroup.

The real power of recognizing that a certain Type and a binary operation (a function of type 'a -> 'a -> 'a) form a semigroup is that we gain a guarantee:

  1. We have a well-defined set of values (our Type).
  2. We have a binary operation (our Function) that is:
    • Closed: It always takes two values of our type and returns a value of the same type.
    • Associative: It combines values in a predictable way, meaning the order of grouping for multiple operations doesn’t change the final outcome.

This allows us to reliably chain these operations together in our data transformation pipelines, much like we can confidently connect LEGO blocks in any grouping.

The semigroup structure ensures that for a sequence of operations a • b • c • d ..., we can compute it as ((a • b) • c) • d or a • (b • (c • d)) or any other valid grouping, and the result will be identical. This predictability is invaluable for building complex systems, optimizing computations (e.g., parallel processing), and reasoning about our code.