What is a probability?

A probability is a measurement of a possibility (relative to a range of possibilities). Probability theory is a way of formalizing this idea. The most common such formalization–the Kolmogorov axioms–can be thought of as defining: (i) what it means to be a possibility; and (ii) what it means to measure a possibility.1

What it means to be a possibility

The Kolmogorov axioms start by specifying a set \(\Omega\) that contains all and only the things that can possibly happen. This set is known as the sample space. So what it means to be a possibility is a brute fact: it’s all and only the things in \(\Omega\).

That’s very abstract, so let’s consider a few examples relevant to this class:

  1. \(\Omega\) could be the set of all phonemes in a language (or some subset thereof)–e.g. the English vowels \(\Omega = \{\text{e, i, o, u, æ, ɑ, ɔ, ə, ɛ, ɪ, ʊ}\}\).
  2. \(\Omega\) could be the set of all pairs of first and second formants–represented as all pairs of positive real numbers \(\mathbb{R}_+^2\).2
  3. \(\Omega\) could be the set of all strings of phonemes in a language–e.g. if \(\Sigma\) is the set of phonemes, then \(\Omega = \Sigma^* = \bigcup_{i=0}^\infty \Sigma^i\).
  4. \(\Omega\) could be the set of all strings of morphemes in a language–e.g. if \(\Sigma\) is the set of morphemes, then \(\Omega = \Sigma^* = \bigcup_{i=0}^\infty \Sigma^i\).
  5. \(\Omega\) could be the set of all grammatical derivations for a grammar \(G\)–e.g. if \(G = \langle \Sigma, V, R, S \rangle\) (with \(R \subseteq V \times (V \cup \Sigma \cup \{\epsilon\})^+\)) is a context free grammar, then \(\Omega = \bigcup_{s \in L_G} P_G(s)\), where \(L_G\) is the language generated by \(G\) and \(P_G\) is a parser for \(G\).
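
To make the string-based sample spaces in items 3 and 4 a bit more concrete, here is a minimal sketch that enumerates the finite slices \(\Sigma^i\) of \(\Sigma^*\) up to a length bound. The helper name `strings_up_to` and the three-symbol toy alphabet are assumptions made for illustration; \(\Sigma^*\) itself is infinite, so only a bounded part of it can ever be listed.

```python
from itertools import product
from typing import FrozenSet, Iterator, Tuple

def strings_up_to(sigma: FrozenSet[str], max_len: int) -> Iterator[Tuple[str, ...]]:
    """Yield every string over sigma of length 0..max_len (a finite part of Σ*)."""
    for i in range(max_len + 1):
        # Σ^i: all length-i tuples of symbols; Σ^0 contains only the empty string ()
        yield from product(sorted(sigma), repeat=i)

# hypothetical toy alphabet of three phonemes
toy_sigma = frozenset({"æ", "k", "t"})
toy_strings = list(strings_up_to(toy_sigma, 2))
```

With three symbols and a bound of 2, this lists \(3^0 + 3^1 + 3^2\) strings, each represented as a tuple of symbols.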

The axioms then move forward by defining classes of possibilities \(F \subseteq \Omega\), which together form a classification of possibilities \(\mathcal{F} \subseteq 2^\Omega\). These classes of possibilities are known as events and the classification of possibilities is known as the event space. It is events, which can contain just a single possibility, that we measure the probability of.3

Two event spaces for (a subset of) English pronouns

The event space is where interesting linguistic structure enters the picture. Let’s look at a few examples of event spaces that assume that the sample space is the following set of pronouns of English: \(\Omega = \{\text{I}, \text{me}, \text{you}, \text{they}, \text{them}, \text{it}, \text{she}, \text{her}, \text{he}, \text{him}, \text{we}, \text{us}\}\).

emptyset = frozenset()
pronouns = frozenset({
    "I", "me", 
    "you", 
    "they", "them", 
    "it", 
    "she", "her", 
    "he", "him", 
    "we", "us",
})

The person event space

One possible event space distinguishes these pronouns with respect to third v. non-third: \(\mathcal{F}_\text{person} = \{F_\text{[+third]}, F_\text{[-third]}, \Omega, \emptyset\}\), with \(F_\text{[+third]} = \{\text{they}, \text{them}, \text{it}, \text{she}, \text{her}, \text{he}, \text{him}\}\) and \(F_\text{[-third]} = \Omega - F_\text{[+third]}\).

third = frozenset({"they", "them", "it", "she", "her", "he", "him",})
nonthird = pronouns - third

f_person = frozenset({
    frozenset(emptyset), 
    frozenset(third), frozenset(nonthird), 
    frozenset(pronouns)
})

You’ll notice that beyond having just the set of third v. non-third pronouns in the event space, we also have the entire set of pronouns \(\Omega\) itself alongside the empty set \(\emptyset\). The reasons for this are technical: to make certain aspects of the formalization of what it means to measure possibilities work out nicely, we need the event space \(\mathcal{F}\) to form what is known as a \(\sigma\)-algebra on the sample space \(\Omega\). All this means is that:

  1. \(\mathcal{F} \subseteq 2^\Omega\)
  2. \(E \in \mathcal{F}\) iff \(\Omega - E \in \mathcal{F}\) (closure under complement)
  3. \(\bigcup \mathcal{E} \in \mathcal{F}\) for all countable \(\mathcal{E} \subseteq \mathcal{F}\) (closure under countable union)
  4. \(\bigcap \mathcal{E} \in \mathcal{F}\) for all countable \(\mathcal{E} \subseteq \mathcal{F}\) (closure under countable intersection)

You can check that all of these conditions are satisfied for \(\mathcal{F}_\text{person}\) only if \(\Omega\) and \(\emptyset\) are both in \(\mathcal{F}\). When \(\mathcal{F} \subseteq 2^\Omega\) is a \(\sigma\)-algebra, the pair \(\langle \Omega, \mathcal{F} \rangle\) is referred to as a measurable space. When \(\Omega\) is finite–as it is here–we say that \(\langle \Omega, \mathcal{F} \rangle\) is more specifically a finite measurable space.

from typing import Set, FrozenSet, Iterable
from itertools import chain, combinations
from functools import reduce

SampleSpace = FrozenSet[str]
Event = FrozenSet[str]
SigmaAlgebra = FrozenSet[Event]

def powerset(iterable: Iterable) -> Iterable:
    """The power set of a set

    See https://docs.python.org/3/library/itertools.html#itertools-recipes

    Parameters
    ----------
    iterable
        The set to take the power set of
    """
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

class FiniteMeasurableSpace:
    """A finite measurable space
    
    Parameters
    ----------
    atoms
        The atoms of the space
    sigma_algebra
        The σ-algebra of the space    
    """
    def __init__(self, atoms: SampleSpace, sigma_algebra: SigmaAlgebra):
        self._atoms = atoms
        self._sigma_algebra = sigma_algebra

        self._validate()

    def _validate(self):
        for subset in self._sigma_algebra:
            # check powerset condition
            if not subset <= self._atoms:
                raise ValueError(
                    "All events must be a subset of the atoms. "
                    f"{set(subset)} is an event but not a subset."
                )

            # check closure under complement
            if not (self._atoms - subset) in self._sigma_algebra:
                raise ValueError(
                    "The σ-algebra must be closed under complements. "
                    f"{set(self._atoms - subset)} is the complement of {set(subset)}, "
                    "which is an event, but it is not an event."
                )

        for subsets in powerset(self._sigma_algebra):
            subsets = list(subsets)

            # python doesn't like to reduce empty iterables
            if not subsets:
                continue

            # check closure under finite union
            union = frozenset(reduce(frozenset.union, subsets))
            if union not in self._sigma_algebra:
                raise ValueError(
                    "The σ-algebra must be closed under countable union. "
                    f"{union} is a union of events {subsets} but not an event."
                )

            # check closure under finite intersection
            intersection = frozenset(reduce(frozenset.intersection, subsets))
            if intersection not in self._sigma_algebra:
                raise ValueError(
                    "The σ-algebra must be closed under finite intersection. "
                    f"{set(intersection)} is the intersection of events {subsets} but "
                    "not an event."
                )
                
        print("This pair is a finite measurable space.")

    @property
    def atoms(self) -> SampleSpace: 
        return self._atoms

    @property
    def sigma_algebra(self) -> SigmaAlgebra:
        return self._sigma_algebra

The \(\sigma\)-algebra conditions are checked as part of initializing the implementation of FiniteMeasurableSpace, and so we see that \(\langle \Omega, \mathcal{F}_\text{person}\rangle\) is a measurable space.

person_space = FiniteMeasurableSpace(pronouns, f_person)
This pair is a finite measurable space.

The case event space

Another possible event space that is slightly more interesting distinguishes pronouns with respect to case: \(\mathcal{F}_\text{case} = \{F_\text{[+acc]}, F_\text{[-acc]}, F_\text{[+acc]} \cap F_\text{[-acc]}, \Omega - F_\text{[+acc]}, \Omega - F_\text{[-acc]}, \Omega - [F_\text{[+acc]} \cap F_\text{[-acc]}], \Omega, \emptyset\}\), with \(F_\text{[+acc]} = \{\text{me}, \text{you}, \text{them}, \text{her}, \text{him}, \text{it}, \text{us}\}\) and \(F_\text{[-acc]} = \{\text{I}, \text{you}, \text{they}, \text{she}, \text{he}, \text{it}, \text{we}\}\). Beyond the set of pronouns \(\Omega\), the empty set \(\emptyset\), the set of accusative pronouns \(F_\text{[+acc]}\) and the set of non-accusative pronouns \(F_\text{[-acc]}\), we additionally need:

  1. The set of pronouns that can be either accusative or non-accusative \(F_\text{[+acc]} \cap F_\text{[-acc]} = \{\text{you}, \text{it}\}\).
  2. The set of non-accusatives that cannot be accusative \(\Omega - F_\text{[+acc]} = \{\text{I}, \text{they}, \text{he}, \text{she}, \text{we}\}\).
  3. The set of accusatives that cannot be non-accusative \(\Omega - F_\text{[-acc]} = \{\text{me}, \text{them}, \text{her}, \text{us}, \text{him}\}\).
  4. The set of pronouns that cannot be both accusative and non-accusative \(\Omega - [F_\text{[+acc]} \cap F_\text{[-acc]}]\).

The first set is required to be in \(\mathcal{F}_\text{case}\) according to condition 4 of being a \(\sigma\)-algebra.4 The other three sets are required to be in \(\mathcal{F}_\text{case}\) according to condition 2 of being a \(\sigma\)-algebra.5

acc = frozenset({"me", "you", "them", "her", "him", "it", "us"})
nonacc = frozenset({"I", "you", "they", "she", "he", "it", "we"})

f_case = frozenset({
    frozenset(emptyset), 
    frozenset(acc), frozenset(nonacc),
    frozenset(acc & nonacc),
    frozenset(pronouns - acc),
    frozenset(pronouns - nonacc),
    frozenset(pronouns - (acc & nonacc)),
    frozenset(pronouns)
})

case_space = FiniteMeasurableSpace(pronouns, f_case)
This pair is a finite measurable space.

Combining event spaces

Given two measurable spaces with the same sample space, such as \(\langle \Omega, \mathcal{F}_\text{person}\rangle\) and \(\langle \Omega, \mathcal{F}_\text{case}\rangle\), we might want to combine them to create a measurable space \(\langle \Omega, \mathcal{F}_\text{person-case}\rangle\) whose event space contains events such as \(F_\text{[+third,+acc]}\).

Question

Can we define \(\mathcal{F}_\text{person-case} \equiv \mathcal{F}_\text{person} \cup \mathcal{F}_\text{case}\)? If not, why not?

We cannot define \(\mathcal{F}_\text{person-case} \equiv \mathcal{F}_\text{person} \cup \mathcal{F}_\text{case}\). While Condition 1 above would be satisfied (that’s easy), we would be missing quite a few sets that Conditions 2-4 require. For instance, the third person accusative pronouns \(F_\text{[+third,+acc]} \equiv F_\text{[+third]} \cap F_\text{[+acc]}\) would not be an event.

try:
    person_space = FiniteMeasurableSpace(pronouns, f_person.union(f_case))
except ValueError as e:
    print(f"ValueError: {e}")
ValueError: The σ-algebra must be closed under countable union. frozenset({'he', 'me', 'we', 'I', 'they', 'it', 'she', 'you', 'us'}) is a union of events [frozenset({'they', 'he', 'it', 'she', 'you', 'we', 'I'}), frozenset({'me', 'we', 'I', 'you', 'us'})] but not an event.

This point demonstrates an important fact about \(\sigma\)-algebras: if you design a classification based on some (countable) set of features like person and case, the constraint that \(\mathcal{F}\) be a \(\sigma\)-algebra on \(\Omega\) implies that \(\mathcal{F}\) contains events corresponding to all possible conjunctions (e.g. third and accusative) and disjunctions (e.g. third and/or accusative) of those features. So we need to extend \(\mathcal{F}_\text{person} \cup \mathcal{F}_\text{case}\) with additional sets. We call this extension the \(\sigma\)-algebra generated by the family of sets \(\mathcal{F}_\text{person} \cup \mathcal{F}_\text{case}\), denoted \(\sigma\left(\mathcal{F}_\text{person} \cup \mathcal{F}_\text{case}\right)\).

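def generate_sigma_algebra(family: SigmaAlgebra) -> SigmaAlgebra:
    """Generate a σ-algebra from a family of sets

    The family is assumed to be closed under complement, as a union of
    σ-algebras on the same sample space is. Closing such a family under
    union and intersection keeps it closed under complement (by De Morgan's
    laws), so complements are not added explicitly here.
    
    Parameters
    ----------
    family
        The family of sets from which to generate the σ-algebra
    """

    sigma_algebra = set(family)
    old_sigma_algebra = set(family)
    
    complete = False

    # repeat until a full pass adds no new sets
    while not complete:
        for subsets in powerset(old_sigma_algebra):
            subsets = list(subsets)

            # python doesn't like to reduce empty iterables
            if not subsets:
                continue

            # add the union and the intersection of this nonempty subfamily
            union = reduce(frozenset.union, subsets)
            sigma_algebra.add(union)

            intersection = reduce(frozenset.intersection, subsets)
            sigma_algebra.add(intersection)

        complete = sigma_algebra == old_sigma_algebra
        old_sigma_algebra = set(sigma_algebra)

    return frozenset(sigma_algebra)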

One challenge is that generating this \(\sigma\)-algebra for even relatively small families of sets can take a non-trivial amount of time. So for the remainder of this review, I’m going to cheat a bit and artificially distinguish the pronouns whose accusative and non-accusative variants are the same.

pronouns = frozenset({
    "I", "me", 
    "you_nonacc", "you_acc", 
    "they", "them", 
    "it_nonacc", "it_acc", 
    "she", "her", 
    "he", "him", 
    "we", "us",
})

This move allows us to define the event space more simply.

acc = frozenset({"me", "you_acc", "them", "her", "him", "it_acc", "us"})
nonacc = frozenset({"I", "you_nonacc", "they", "she", "he", "it_nonacc", "we"})

f_case = frozenset({
    frozenset(emptyset), 
    frozenset(acc), frozenset(nonacc),
    frozenset(pronouns)
})

case_space = FiniteMeasurableSpace(pronouns, f_case)
This pair is a finite measurable space.

To ensure that the person and case spaces have the same sample space, we will similarly need to redefine the person space.

third = frozenset({"they", "them", "it_acc", "it_nonacc", "she", "her", "he", "him"})
nonthird = pronouns - third

f_person = frozenset({
    frozenset(emptyset), 
    frozenset(third), frozenset(nonthird), 
    frozenset(pronouns)
})

person_space = FiniteMeasurableSpace(pronouns, f_person)
This pair is a finite measurable space.

Finally, we can generate the \(\sigma\)-algebra for our person-case space and check that it’s valid.

f_person_case = generate_sigma_algebra(f_person | f_case)

person_case_space = FiniteMeasurableSpace(pronouns, f_person_case)
This pair is a finite measurable space.
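
As an aside, here is a sketch of an alternative way to generate the \(\sigma\)-algebra that avoids looping over subfamilies. It is not the method used in these notes. It relies on the fact that, when \(\Omega\) is finite, the generated \(\sigma\)-algebra consists of exactly the unions of its atoms, where two possibilities belong to the same atom when the same generating sets contain them. The function name `generate_from_atoms` and the `*12` variable names are assumptions introduced here; the `*12` names hold the original twelve-pronoun sets so that they do not overwrite the redefined ones above.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Event = FrozenSet[str]

def generate_from_atoms(omega: Event, family: Iterable[Event]) -> FrozenSet[Event]:
    """Sketch: σ-algebra generated by `family` on a finite sample space `omega`."""
    generators = list(family)

    # two points fall in the same atom iff exactly the same generators contain them
    blocks: Dict[Tuple[bool, ...], Set[str]] = {}
    for point in omega:
        signature = tuple(point in g for g in generators)
        blocks.setdefault(signature, set()).add(point)
    atoms = [frozenset(block) for block in blocks.values()]

    # every event of the generated σ-algebra is a union of atoms
    events = set()
    for r in range(len(atoms) + 1):
        for combo in combinations(atoms, r):
            events.add(frozenset().union(*combo))
    return frozenset(events)

# the original twelve pronouns, with "you" and "it" shared across cases
omega12 = frozenset({"I", "me", "you", "they", "them", "it",
                     "she", "her", "he", "him", "we", "us"})
third12 = frozenset({"they", "them", "it", "she", "her", "he", "him"})
acc12 = frozenset({"me", "you", "them", "her", "him", "it", "us"})
nonacc12 = frozenset({"I", "you", "they", "she", "he", "it", "we"})

f_generated12 = generate_from_atoms(omega12, [third12, acc12, nonacc12])
```

The number of events is still \(2^k\) for \(k\) atoms, so this only helps when the generating sets carve the sample space into a modest number of atoms, as they do here.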

Considerations around defining event spaces

This way of setting up event spaces is useful when we have strong a priori assumptions we want to inject into our probability models. We’ll see cases of this assumption injection as we move through the course. In many cases, however, we want an event space that makes fewer assumptions. So when the sample space is finite–as it is here–we’ll often just default to \(\mathcal{F} \equiv 2^\Omega\), which is the “finest” event space on \(\Omega\) we can muster–i.e. it is a superset of all other possible event spaces. This sort of event space, which is often referred to as the discrete event space on \(\Omega\), will tend to ignore potentially useful prior knowledge we have about the sample space–e.g. morphosyntactic features that pronouns have–though it is possible to represent that knowledge “in the measurement”, as we’ll see.
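
As a rough illustration of the discrete event space, the sketch below builds \(2^\Omega\) for a deliberately tiny sample space. The function name `discrete_event_space` and the four-pronoun `small_omega` are assumptions made for this example.

```python
from itertools import chain, combinations
from typing import FrozenSet

def discrete_event_space(omega: FrozenSet[str]) -> FrozenSet[FrozenSet[str]]:
    """Return 2^Ω as a frozenset of frozensets (feasible only for small Ω)."""
    points = list(omega)
    subsets = chain.from_iterable(
        combinations(points, r) for r in range(len(points) + 1)
    )
    return frozenset(frozenset(s) for s in subsets)

# hypothetical four-pronoun sample space, kept small on purpose
small_omega = frozenset({"I", "me", "she", "her"})
f_discrete = discrete_event_space(small_omega)
```

The discrete event space has \(2^{|\Omega|}\) events, so it grows quickly: four possibilities already give 16 events, and the fourteen pronouns used above would give 16,384.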

When the sample space is infinite, things get a bit trickier: the powerset is uncountable for even a countably infinite sample space–something that we need to consider in the context of working with strings and derivations.6 This property can be a problem for reasons I’ll gesture at when we discuss continuous probability distributions. So in general, we won’t work with event spaces that are power sets of their corresponding sample space in this context. We’ll instead work with what are called Borel \(\sigma\)-algebras. It’s not important to understand the intricacies of what a Borel \(\sigma\)-algebra is; I’ll try to give you an intuition below.

What it means to measure a possibility

I said that a probability is a measurement of a possibility. We’ve now formalized what a possibility is in this context. Now let’s turn to the measurement part.

The Kolmogorov axioms build the notion of a probability measure from the more general concept of a measure. All a probability measure \(\mathbb{P}\) is going to do is to map from some event in the event space (e.g. third-person pronoun, accusative pronoun, etc.) to a non-negative real value–with larger values corresponding to higher probabilities. So it is a function \(\mathbb{P}: \mathcal{F} \rightarrow \mathbb{R}_+\). This condition is the first of the Kolmogorov axioms.

  1. \(\mathbb{P}: \mathcal{F} \rightarrow \mathbb{R}_+\)

You might be used to thinking of probabilities as lying in the interval \([0, 1]\). This property is a consequence of the two other axioms:

  2. The probability of the entire sample space \(\mathbb{P}(\Omega) = 1\) (the assumption of unit measure)
  3. Given a countable collection of events \(E_1, E_2, \ldots \in \mathcal{F}\) that is pairwise disjoint–i.e. \(E_i \cap E_j = \emptyset\) for all \(i \neq j\)–\(\mathbb{P}\left(\bigcup_i E_i\right) = \sum_i \mathbb{P}(E_i)\) (the assumption of \(\sigma\)-additivity)

from typing import Dict

class ProbabilityMeasure:
    """A probability measure with finite support

    Parameters
    ----------
    domain
        The domain of the probability measure
    measure
        The graph of the measure
    """

    def __init__(self, domain: FiniteMeasurableSpace, measure: Dict[Event, float]):
        self._domain = domain
        self._measure = measure

        self._validate()

    def __call__(self, event: Event) -> float:
        return self._measure[event]

    def _validate(self):
        # check that the measure covers the domain
        for event in self._domain.sigma_algebra:
            if event not in self._measure:
                raise ValueError(
                    "Probability measure must be defined for all events."
                )

        # check the assumption of unit measure
        if self._measure[frozenset(self._domain.atoms)] != 1:
            raise ValueError(
                "The probability of the sample space must be 1."
            )

        # check assumption of 𝜎-additivity
        for events in powerset(self._domain.sigma_algebra):
            events = list(events)

            if not events:
                continue

            # additivity only constrains pairwise disjoint collections of events
            if not any(e1.intersection(e2) for e1, e2 in combinations(events, 2)):
                prob_union = self._measure[reduce(frozenset.union, events)]
                prob_sum = sum(self._measure[e] for e in events)

                # compare only the values computed for this disjoint collection
                if round(prob_union, 4) != round(prob_sum, 4):
                    raise ValueError(
                        "The measure does not satisfy 𝜎-additivity."
                    )
                
        print("This probability measure is valid for the given measurable space.")

One example of a probability measure for our measurable space \(\langle \Omega, \mathcal{F}_\text{person-case}\rangle\) is the uniform measure: \(\mathbb{P}(E) = \frac{|E|}{|\Omega|}\).

measure_person_case = ProbabilityMeasure(
    person_case_space,
    {e: len(e)/len(person_case_space.atoms) 
     for e in person_case_space.sigma_algebra} 
)
This probability measure is valid for the given measurable space.

These axioms imply that the range of \(\mathbb{P}\) is \([0, 1]\), even if its codomain is \(\mathbb{R}_+\); otherwise, it would have to be the case that \(\mathbb{P}(E) > 1\) for some \(E \subset \Omega\). (\(E\) would have to be a strict subset of \(\Omega\), since \(\Omega \supseteq E\) for all \(E \in \mathcal{F}\) and \(\mathbb{P}(\Omega) = 1\) by definition.) But \(\mathbb{P}(E) > 1\) cannot hold, since \(\mathbb{P}(\Omega - E)\)–which must be defined, given that \(\mathcal{F}\) is closed under complementation–is nonnegative; and thus \(\mathbb{P}(E) + \mathbb{P}(\Omega - E) > \mathbb{P}(\Omega) = 1\) contradicts the third axiom \(\mathbb{P}(E) + \mathbb{P}(\Omega - E) = \mathbb{P}(E \cup [\Omega - E]) = \mathbb{P}(\Omega) = 1\).

(One reason the codomain of \(\mathbb{P}\) is often specified as the more general \(\mathbb{R}_+\)–rather than \([0, 1]\)–is to make salient the fact that probabilities are analogous to other kinds of measurements, like weight, height, temperature, etc.)

These axioms also imply that \(\mathbb{P}(\emptyset) = 0\), since \(\mathbb{P}(\Omega) = \mathbb{P}(\Omega \cup \emptyset) = \mathbb{P}(\Omega) + \mathbb{P}(\emptyset) = 1\), and so \(\mathbb{P}(\emptyset) = 1 - \mathbb{P}(\Omega) = 0\).
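
The consequences just derived can be spot-checked numerically. The sketch below uses a uniform measure on the discrete event space of a small, hypothetical sample space (`toy_omega`, `toy_events`, and `uniform` are names introduced here), with exact fractions so that no rounding is involved. It illustrates the claims on one example and is not a proof.

```python
from fractions import Fraction
from itertools import combinations

# hypothetical four-pronoun sample space and its discrete event space
toy_omega = frozenset({"I", "me", "she", "her"})
toy_events = [
    frozenset(c)
    for r in range(len(toy_omega) + 1)
    for c in combinations(sorted(toy_omega), r)
]

def uniform(event: frozenset) -> Fraction:
    """Uniform measure: P(E) = |E| / |Ω|, as an exact fraction."""
    return Fraction(len(event), len(toy_omega))

# every probability lies in [0, 1]
in_unit_interval = all(0 <= uniform(e) <= 1 for e in toy_events)

# P(E) + P(Ω - E) = P(Ω) = 1 for every event E
complement_rule = all(uniform(e) + uniform(toy_omega - e) == 1 for e in toy_events)

# P(∅) = 0
empty_is_zero = uniform(frozenset()) == 0
```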

Summing up

We will formalize a probability space as a triple \(\langle \Omega, \mathcal{F}, \mathbb{P} \rangle\) with:

  1. A set \(\Omega\) (the sample space)
  2. A \(\sigma\)-algebra \(\mathcal{F}\) (the event space), where:
    1. \(\mathcal{F} \subseteq 2^\Omega\)
    2. \(E \in \mathcal{F}\) iff \(\Omega - E \in \mathcal{F}\) (closure under complement)
    3. \(\bigcup \mathcal{E} \in \mathcal{F}\) for all countable \(\mathcal{E} \subseteq \mathcal{F}\) (closure under countable union)
    4. \(\bigcap \mathcal{E} \in \mathcal{F}\) for all countable \(\mathcal{E} \subseteq \mathcal{F}\) (closure under countable intersection)
  3. A probability measure \(\mathbb{P}\), where:
    1. \(\mathbb{P}: \mathcal{F} \rightarrow \mathbb{R}_+\)
    2. The probability of the entire sample space \(\mathbb{P}(\Omega) = 1\) (the assumption of unit measure)
    3. Given a countable collection of events \(E_1, E_2, \ldots \in \mathcal{F}\) that is pairwise disjoint–i.e. \(E_i \cap E_j = \emptyset\) for all \(i \neq j\)–\(\mathbb{P}\left(\bigcup_i E_i\right) = \sum_i \mathbb{P}(E_i)\) (the assumption of \(\sigma\)-additivity)

It is this core that we build on in developing probabilistic models. To develop these models, it is useful to develop a few additional definitions and theorems.

Mutual exclusivity

Two events \(A \in \mathcal{F}\) and \(B \in \mathcal{F}\) are mutually exclusive if they are disjoint: \(A \cap B = \emptyset\). This implies that \(\mathbb{P}(A \cap B) = \mathbb{P}(\emptyset) = 0\) for all mutually exclusive events \(A\) and \(B\).

class ProbabilityMeasure(ProbabilityMeasure):

    def are_mutually_exclusive(self, *events: Iterable[Event]):
        self._validate_events(events)
        return not any(e1.intersection(e2) for e1, e2 in combinations(events, 2))

    def _validate_events(self, events: Iterable[Event]):
        for i, event in enumerate(events):
            if event not in self._domain.sigma_algebra:
                raise ValueError(f"event{i} is not in the event space.")

In our running example, the set of third-person pronouns \(F_\text{[+third]}\) and the set of non-third person pronouns \(F_\text{[-third]}\) are mutually exclusive events because \(F_\text{[+third]} \cap F_\text{[-third]} = \emptyset\).

measure_person_case = ProbabilityMeasure(
    person_case_space,
    {e: len(e)/len(person_case_space.atoms) 
     for e in person_case_space.sigma_algebra} 
)

measure_person_case.are_mutually_exclusive(third, nonthird)
True

Joint probability

The joint probability \(\mathbb{P}(A, B)\) of two events \(A \in \mathcal{F}\) and \(B \in \mathcal{F}\) is defined as the probability of the intersection of those two events \(\mathbb{P}(A, B) = \mathbb{P}(A \cap B)\), which must be defined given that \(\mathcal{F}\) is closed under countable intersection.

from typing import List

class ProbabilityMeasure(ProbabilityMeasure):

    def __call__(self, *events: Iterable[Event]) -> float:
        self._validate_events(events)

        intersection = reduce(frozenset.intersection, events)

        return self._measure[intersection]

In our running example, the probability of a third-person accusative pronoun is the joint probability \(\mathbb{P}\left(F_\text{[+third]}, F_\text{[+acc]}\right)\).

measure_person_case = ProbabilityMeasure(
    person_case_space,
    {e: len(e)/len(person_case_space.atoms) 
     for e in person_case_space.sigma_algebra} 
)

measure_person_case(frozenset(third), frozenset(acc))
0.2857142857142857

Conditional probability

The probability of an event \(A \in \mathcal{F}\) conditioned on (or given) an event \(B \in \mathcal{F}\) is defined as \(\mathbb{P}(A \mid B) = \frac{\mathbb{P}(A, B)}{\mathbb{P}(B)}\). Note that \(\mathbb{P}(A \mid B)\) is undefined if \(\mathbb{P}(B) = 0\).

class ProbabilityMeasure(ProbabilityMeasure):

    def __or__(self, conditions: Iterable[Event]) -> ProbabilityMeasure:
        condition = reduce(frozenset.intersection, conditions)

        self._validate_condition(condition)

        measure = {
            event: self(event, condition)/self(condition) 
            for event in self._domain.sigma_algebra
        }

        return ProbabilityMeasure(self._domain, measure)

    def _validate_condition(self, condition: Event):
        if condition not in self._domain.sigma_algebra:
            raise ValueError("The conditions must be in the event space.")

        if self._measure[condition] == 0:
            raise ZeroDivisionError("Conditions cannot have probability 0.")

In our running example, the probability that a pronoun is third-person given that it is accusative is the conditional probability \(\mathbb{P}\left(F_\text{[+third]} \mid F_\text{[+acc]}\right) = \frac{\mathbb{P}\left(F_\text{[+third]}, F_\text{[+acc]}\right)}{\mathbb{P}\left(F_\text{[+acc]}\right)}\).

person_case_measure = {
    event: len(event)/len(person_case_space.atoms) 
    for event in person_case_space.sigma_algebra
}

measure_person_case = ProbabilityMeasure(
    person_case_space,
    person_case_measure 
)

measure_given_acc = measure_person_case | [acc]

measure_given_acc(third)
0.5714285714285714

From this definition, it immediately follows that \(\mathbb{P}(A, B) = \mathbb{P}(A \mid B)\mathbb{P}(B) = \mathbb{P}(B \mid A)\mathbb{P}(A)\), which in turn implies Bayes’ theorem.

\[\mathbb{P}(A \mid B) = \frac{\mathbb{P}(A, B)}{\mathbb{P}(B)} = \frac{\mathbb{P}(B \mid A)\mathbb{P}(A)}{\mathbb{P}(B)}\]

Bayes’ theorem will be very important in this course.
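
As a quick check of Bayes’ theorem on the running example, the sketch below computes \(\mathbb{P}\left(F_\text{[+acc]} \mid F_\text{[+third]}\right)\) in two ways: directly from the definition of conditional probability, and via Bayes’ theorem. It assumes the uniform measure and the fourteen-pronoun sample space from above; the helper names `prob` and `cond` are introduced here for illustration and are separate from the `ProbabilityMeasure` class.

```python
from fractions import Fraction

pronouns = frozenset({
    "I", "me", "you_nonacc", "you_acc", "they", "them", "it_nonacc",
    "it_acc", "she", "her", "he", "him", "we", "us",
})
third = frozenset({"they", "them", "it_acc", "it_nonacc", "she", "her", "he", "him"})
acc = frozenset({"me", "you_acc", "them", "her", "him", "it_acc", "us"})

def prob(event: frozenset) -> Fraction:
    """Uniform measure on the pronouns, as an exact fraction."""
    return Fraction(len(event), len(pronouns))

def cond(a: frozenset, b: frozenset) -> Fraction:
    """P(a | b) = P(a, b) / P(b); assumes P(b) > 0."""
    return prob(a & b) / prob(b)

# directly from the definition of conditional probability
direct = cond(acc, third)

# via Bayes' theorem: P(acc | third) = P(third | acc) P(acc) / P(third)
via_bayes = cond(third, acc) * prob(acc) / prob(third)
```

Both routes give \(\frac{1}{2}\): four of the eight third-person pronouns are accusative.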

Another important consequence of the definition of conditional probability is the chain rule:

\[\begin{align*}\mathbb{P}(E_1, E_2, E_3, \ldots, E_N) &= \mathbb{P}(E_1)\mathbb{P}(E_2 \mid E_1)\mathbb{P}(E_3 \mid E_1, E_2)\ldots\mathbb{P}(E_N \mid E_1, E_2, \ldots, E_{N-1})\\ &= \mathbb{P}(E_1)\prod_{i=2}^N \mathbb{P}(E_i\mid E_1, \ldots, E_{i-1})\end{align*}\]

The chain rule will also be very important in this course.
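
The chain rule can be checked in the same way. The sketch below again assumes the uniform measure on the fourteen pronouns. The third event `gendered`, along with the helpers `prob` and `chain_rule`, are hypothetical names introduced for this example.

```python
from fractions import Fraction
from typing import Sequence

pronouns = frozenset({
    "I", "me", "you_nonacc", "you_acc", "they", "them", "it_nonacc",
    "it_acc", "she", "her", "he", "him", "we", "us",
})
third = frozenset({"they", "them", "it_acc", "it_nonacc", "she", "her", "he", "him"})
acc = frozenset({"me", "you_acc", "them", "her", "him", "it_acc", "us"})
gendered = frozenset({"she", "her", "he", "him"})  # hypothetical third event

def prob(event: frozenset) -> Fraction:
    """Uniform measure on the pronouns, as an exact fraction."""
    return Fraction(len(event), len(pronouns))

def chain_rule(events: Sequence[frozenset]) -> Fraction:
    """P(E_1) * P(E_2 | E_1) * ... * P(E_N | E_1, ..., E_{N-1})."""
    total = Fraction(1)
    given = pronouns  # conditioning on Ω leaves probabilities unchanged
    for event in events:
        # P(event | given); assumes P(given) > 0 at every step
        total *= prob(event & given) / prob(given)
        given = given & event
    return total

# the joint probability, computed directly and via the chain rule
joint = prob(third & acc & gendered)
chained = chain_rule([third, acc, gendered])
```

Here the factors are \(\frac{8}{14}\), \(\frac{4}{8}\), and \(\frac{2}{4}\), whose product matches the joint probability \(\frac{2}{14}\).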

Independence

An event \(A \in \mathcal{F}\) is independent of an event \(B \in \mathcal{F}\) (under \(\mathbb{P}\)) if \(\mathbb{P}(A \mid B) = \mathbb{P}(A)\). A theorem that immediately follows from this definition is that \(A\) and \(B\) are independent under \(\mathbb{P}\) if and only if \(\mathbb{P}(A, B) = \mathbb{P}(A \mid B)\mathbb{P}(B) = \mathbb{P}(A)\mathbb{P}(B)\).

from math import isclose

class ProbabilityMeasure(ProbabilityMeasure):

    def are_independent(self, *events):
        self._validate_events(events)

        joint = self(*events)
        product = reduce(lambda x, y: x * y, [self(e) for e in events])

        # compare with a tolerance, since the probabilities are floats
        return isclose(joint, product, abs_tol=1e-12)

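In our running example of an event space structured by person and case, assuming all pronouns are equiprobable, the events \(F_\text{[+third]}\) and \(F_\text{[+acc]}\) are independent: \(\mathbb{P}\left(F_\text{[+third]}, F_\text{[+acc]}\right) = \frac{4}{14}\), which equals \(\mathbb{P}\left(F_\text{[+third]}\right)\mathbb{P}\left(F_\text{[+acc]}\right) = \frac{8}{14} \cdot \frac{7}{14}\). Not every pair of events in this space is independent, though: \(F_\text{[+third]}\) and \(F_\text{[+third,+acc]}\) are not, since their joint probability is \(\frac{4}{14}\) but the product of their probabilities is \(\frac{8}{14} \cdot \frac{4}{14}\). In the discrete event space, many more pairs of events will be independent.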

measure_person_case = ProbabilityMeasure(
    person_case_space,
    {e: len(e)/len(person_case_space.atoms) 
     for e in person_case_space.sigma_algebra} 
)

measure_person_case.are_independent(frozenset(third), frozenset(acc))
True

Note that independence is not the same as mutual exclusivity; indeed, mutually exclusive events are not independent, since \(\mathbb{P}(A \mid B) = \frac{\mathbb{P}(A, B)}{\mathbb{P}(B)} = \frac{0}{\mathbb{P}(B)} = 0\) (or is undefined if \(\mathbb{P}(B) = 0\)) regardless of \(\mathbb{P}(A)\). Therefore, either \(\mathbb{P}(A \mid B)\) does not equal \(\mathbb{P}(A)\), or \(\mathbb{P}(A) = 0\), in which case \(\mathbb{P}(B \mid A)\) is undefined even when \(\mathbb{P}(A \mid B)\) is defined.
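
The contrast between the two notions can be seen on the running example. The sketch below assumes the uniform measure on the fourteen pronouns and uses exact fractions; the helper `prob` and the result names are introduced here for illustration.

```python
from fractions import Fraction

pronouns = frozenset({
    "I", "me", "you_nonacc", "you_acc", "they", "them", "it_nonacc",
    "it_acc", "she", "her", "he", "him", "we", "us",
})
third = frozenset({"they", "them", "it_acc", "it_nonacc", "she", "her", "he", "him"})
nonthird = pronouns - third
acc = frozenset({"me", "you_acc", "them", "her", "him", "it_acc", "us"})

def prob(event: frozenset) -> Fraction:
    """Uniform measure on the pronouns, as an exact fraction."""
    return Fraction(len(event), len(pronouns))

# third and non-third: mutually exclusive, and therefore not independent
exclusive_person = not (third & nonthird)
independent_person = prob(third & nonthird) == prob(third) * prob(nonthird)

# third and accusative: overlapping, and independent under this measure
exclusive_third_acc = not (third & acc)
independent_third_acc = prob(third & acc) == prob(third) * prob(acc)
```

The first pair has joint probability 0 but a nonzero product of probabilities, while the second pair overlaps and satisfies the product condition.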

Footnotes

  1. What it means for a quantity to be a probability is a surprisingly contentious topic. It’s an interesting topic–and I encourage you to read about the various possibilities–but for the purposes of this course, we will tend to think of probabilities as a quantification of a degree of belief. This interpretation is sometimes referred to as the subjective or Bayesian interpretation.↩︎

  2. If you’ve taken a phonetics course, you know that this definition overgenerates possibilities, since the values that the first and second formants can take on are constrained by the structure of the human vocal tract.↩︎

  3. Don’t ask me why, but \(\mathcal{F}\) is standard notation for the event space. Why we don’t use \(\mathcal{E}\) is beyond me. It might be some convention from measure theory I’m not aware of; or it might have to do with not confusing the event space with the expectation \(\mathbb{E}\), which we’ll review below.↩︎

  4. The analogous set \(F_\text{[+third]} \cap F_\text{[-third]}\) for \(\mathcal{F}_\text{person}\) is already accounted for, since \(F_\text{[+third]}\) and \(F_\text{[-third]}\) are disjoint and thus \(F_\text{[+third]} \cap F_\text{[-third]} = \emptyset\), which is in \(\mathcal{F}_\text{person}\).↩︎

  5. Condition 3 of being a \(\sigma\)-algebra requires \(F_\text{[+acc]} \cup F_\text{[-acc]} \in \mathcal{F}_\text{case}\) (among other unions), but we do not need to explicitly say this, since \(F_\text{[+acc]} \cup F_\text{[-acc]} = \Omega\), which is already specified to be in \(\mathcal{F}_\text{case}\).↩︎

  6. Remember that \(2^{\Sigma^*}\) is the set of all languages on \(\Sigma\); and the set of all languages, even when \(\Sigma\) is finite, is uncountable.↩︎