emptyset = frozenset()
pronouns = frozenset({
    "I", "me",
    "you",
    "they", "them",
    "it",
    "she", "her",
    "he", "him",
    "we", "us",
})
What is a probability?
A probability is a measurement of a possibility (relative to a range of possibilities). Probability theory is a way of formalizing this idea. The most common such formalization–the Kolmogorov axioms–can be thought of as defining: (i) what it means to be a possibility; and (ii) what it means to measure a possibility.1
What it means to be a possibility
The Kolmogorov axioms start by specifying a set \(\Omega\) that contains all and only the things that can possibly happen. This set is known as the sample space. So what it means to be a possibility is a brute fact: it’s all and only the things in \(\Omega\).
That’s very abstract, so let’s consider a few examples relevant to this class:
- \(\Omega\) could be the set of all phonemes in a language (or some subset thereof)–e.g. the English vowels \(\Omega = \{\text{e, i, o, u, æ, ɑ, ɔ, ə, ɛ, ɪ, ʊ}\}\).
- \(\Omega\) could be the set of all pairs of first and second formants–represented as all pairs of positive real numbers \(\mathbb{R}_+^2\).2
- \(\Omega\) could be the set of all strings of phonemes in a language–e.g. if \(\Sigma\) is the set of phonemes, then \(\Omega = \Sigma^* = \bigcup_{i=0}^\infty \Sigma^i\).
- \(\Omega\) could be the set of all strings of morphemes in a language–e.g. if \(\Sigma\) is the set of morphemes, then \(\Omega = \Sigma^* = \bigcup_{i=0}^\infty \Sigma^i\).
- \(\Omega\) could be the set of all grammatical derivations for a grammar \(G\)–e.g. if \(G = \langle \Sigma, V, R, S \rangle\) (with \(R \subseteq V \times (V \cup \Sigma \cup \{\epsilon\})^+\)) is a context free grammar, then \(\Omega = \bigcup_{s \in L_G} P_G(s)\), where \(L_G\) is the language generated by \(G\) and \(P_G\) is a parser for \(G\).
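To make the string-based sample spaces above a little more concrete, here is a minimal sketch that enumerates the finite prefix \(\bigcup_{i=0}^n \Sigma^i\) of \(\Sigma^*\). The helper name `strings_up_to` and the two-symbol inventory are assumptions made purely for illustration; \(\Sigma^*\) itself is infinite, so only a bounded portion can ever be listed.

```python
from itertools import product
from typing import Iterator, Sequence, Tuple


def strings_up_to(
    sigma: Sequence[str], max_len: int
) -> Iterator[Tuple[str, ...]]:
    """Enumerate the strings in Sigma^0, Sigma^1, ..., Sigma^max_len."""
    for i in range(max_len + 1):
        # product(sigma, repeat=i) enumerates Sigma^i; for i = 0 it yields
        # a single empty tuple, which plays the role of the empty string
        yield from product(sigma, repeat=i)


# a toy two-symbol inventory (hypothetical, for illustration only)
sigma = ("a", "b")
prefix = list(strings_up_to(sigma, 2))
print(len(prefix))  # 1 + 2 + 4 = 7 strings of length at most 2
```

Strings are represented as tuples of symbols rather than Python `str` objects so that the same sketch would work when the symbols are multi-character morphemes.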
The axioms then move forward by defining classes of possibilities \(F \subseteq \Omega\), which together form a classification of possibilities \(\mathcal{F} \subseteq 2^\Omega\). These classes of possibilities are known as events and the classification of possibilities is known as the event space. It is events, which can contain just a single possibility, that we measure the probability of.3
Two event spaces for (a subset of) English pronouns
The event space is where interesting linguistic structure enters the picture. Let’s look at a few examples of event spaces that assume that the sample space is the following set of pronouns of English: \(\Omega = \{\text{I}, \text{me}, \text{you}, \text{they}, \text{them}, \text{it}, \text{she}, \text{her}, \text{he}, \text{him}, \text{we}, \text{us}\}\).
The person event space
One possible event space distinguishes these pronouns with respect to third v. non-third: \(\mathcal{F}_\text{person} = \{F_\text{[+third]}, F_\text{[-third]}, \Omega, \emptyset\}\), with \(F_\text{[+third]} = \{\text{they}, \text{them}, \text{it}, \text{she}, \text{her}, \text{he}, \text{him}\}\) and \(F_\text{[-third]} = \Omega - F_\text{[+third]}\).
third = frozenset({"they", "them", "it", "she", "her", "he", "him"})
nonthird = pronouns - third

f_person = frozenset({
    frozenset(emptyset),
    frozenset(third), frozenset(nonthird),
    frozenset(pronouns)
})
You’ll notice that beyond having just the set of third v. non-third pronouns in the event space, we also have the entire set of pronouns \(\Omega\) itself alongside the empty set \(\emptyset\). The reasons for this are technical: to make certain aspects of the formalization of what it means to measure possibilities work out nicely, we need the event space \(\mathcal{F}\) to form what is known as a \(\sigma\)-algebra on the sample space \(\Omega\). All this means is that:
- \(\mathcal{F} \subseteq 2^\Omega\)
- \(E \in \mathcal{F}\) iff \(\Omega - E \in \mathcal{F}\) (closure under complement)
- \(\bigcup \mathcal{E} \in \mathcal{F}\) for all countable \(\mathcal{E} \subseteq \mathcal{F}\) (closure under countable union)
- \(\bigcap \mathcal{E} \in \mathcal{F}\) for all countable \(\mathcal{E} \subseteq \mathcal{F}\) (closure under countable intersection)
You can check that all of these conditions are satisfied for \(\mathcal{F}_\text{person}\) only if \(\Omega\) and \(\emptyset\) are both in \(\mathcal{F}_\text{person}\). When \(\mathcal{F} \subseteq 2^\Omega\) is a \(\sigma\)-algebra, the pair \(\langle \Omega, \mathcal{F} \rangle\) is referred to as a measurable space. When \(\Omega\) is finite–as it is here–we say that \(\langle \Omega, \mathcal{F} \rangle\) is more specifically a finite measurable space.
from typing import Set, FrozenSet, Iterable
from itertools import chain, combinations
from functools import reduce

SampleSpace = FrozenSet[str]
Event = FrozenSet[str]
SigmaAlgebra = FrozenSet[Event]

def powerset(iterable: Iterable) -> Iterable:
    """The power set of a set

    See https://docs.python.org/3/library/itertools.html#itertools-recipes

    Parameters
    ----------
    iterable
        The set to take the power set of
    """
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
class FiniteMeasurableSpace:
    """A finite measurable space

    Parameters
    ----------
    atoms
        The atoms of the space
    sigma_algebra
        The σ-algebra of the space
    """

    def __init__(self, atoms: SampleSpace, sigma_algebra: SigmaAlgebra):
        self._atoms = atoms
        self._sigma_algebra = sigma_algebra

        self._validate()

    def _validate(self):
        for subset in self._sigma_algebra:
            # check powerset condition
            if not subset <= self._atoms:
                raise ValueError(
                    "All events must be a subset of the atoms. "
                    f"{set(subset)} is an event but not a subset."
                )

            # check closure under complement
            if not (self._atoms - subset) in self._sigma_algebra:
                raise ValueError(
                    "The σ-algebra must be closed under complements. "
                    f"{set(self._atoms - subset)} is the complement of {set(subset)}, "
                    "which is an event, but it is not an event."
                )

        for subsets in powerset(self._sigma_algebra):
            subsets = list(subsets)

            # python doesn't like to reduce empty iterables
            if not subsets:
                continue

            # check closure under finite union
            union = frozenset(reduce(frozenset.union, subsets))
            if union not in self._sigma_algebra:
                raise ValueError(
                    "The σ-algebra must be closed under countable union. "
                    f"{union} is a union of events {subsets} but not an event."
                )

            # check closure under finite intersection
            intersection = frozenset(reduce(frozenset.intersection, subsets))
            if intersection not in self._sigma_algebra:
                raise ValueError(
                    "The σ-algebra must be closed under finite intersection. "
                    f"{set(intersection)} is the intersection of events {subsets} but "
                    "not an event."
                )

        print("This pair is a finite measurable space.")

    @property
    def atoms(self) -> SampleSpace:
        return self._atoms

    @property
    def sigma_algebra(self) -> SigmaAlgebra:
        return self._sigma_algebra
The \(\sigma\)-algebra conditions are checked as part of initializing the implementation of FiniteMeasurableSpace, and so we see that \(\langle \Omega, \mathcal{F}_\text{person}\rangle\) is a measurable space.
person_space = FiniteMeasurableSpace(pronouns, f_person)
This pair is a finite measurable space.
The case event space
Another possible event space that is slightly more interesting distinguishes pronouns with respect to case: \(\mathcal{F}_\text{case} = \{F_\text{[+acc]}, F_\text{[-acc]}, F_\text{[+acc]} \cap F_\text{[-acc]}, \Omega - F_\text{[+acc]}, \Omega - F_\text{[-acc]}, \Omega - [F_\text{[+acc]} \cap F_\text{[-acc]}], \Omega, \emptyset\}\), with \(F_\text{[+acc]} = \{\text{me}, \text{you}, \text{them}, \text{her}, \text{him}, \text{it}, \text{us}\}\) and \(F_\text{[-acc]} = \{\text{I}, \text{you}, \text{they}, \text{she}, \text{he}, \text{it}, \text{we}\}\). Beyond the set of pronouns \(\Omega\), the empty set \(\emptyset\), the set of accusative pronouns \(F_\text{[+acc]}\) and the set of non-accusative pronouns \(F_\text{[-acc]}\), we additionally need:
- The set of pronouns that can be either accusative or non-accusative \(F_\text{[+acc]} \cap F_\text{[-acc]} = \{\text{you}, \text{it}\}\).
- The set of non-accusatives that cannot be accusative \(\Omega - F_\text{[+acc]} = \{\text{I}, \text{they}, \text{he}, \text{she}, \text{we}\}\)
- The set of accusatives that cannot be non-accusative \(\Omega - F_\text{[-acc]} = \{{\text{me}, \text{them}, \text{her}, \text{us}, \text{him}}\}\)
- The set of pronouns that cannot be both accusative and non-accusative \(\Omega - [F_\text{[+acc]} \cap F_\text{[-acc]}]\).
The first set is required to be in \(\mathcal{F}_\text{case}\) according to condition 4 of being a \(\sigma\)-algebra.4 The other three sets are required to be in \(\mathcal{F}_\text{case}\) according to condition 2 of being a \(\sigma\)-algebra.5
acc = frozenset({"me", "you", "them", "her", "him", "it", "us"})
nonacc = frozenset({"I", "you", "they", "she", "he", "it", "we"})

f_case = frozenset({
    frozenset(emptyset),
    frozenset(acc), frozenset(nonacc),
    frozenset(acc & nonacc),
    frozenset(pronouns - acc),
    frozenset(pronouns - nonacc),
    frozenset(pronouns - (acc & nonacc)),
    frozenset(pronouns)
})

case_space = FiniteMeasurableSpace(pronouns, f_case)
This pair is a finite measurable space.
Combining event spaces
Given two event spaces on the same sample space, such as \(\mathcal{F}_\text{person}\) and \(\mathcal{F}_\text{case}\), we might want to combine them to create an event space \(\mathcal{F}_\text{person-case}\) that contains events such as \(F_\text{[+third,+acc]}\).
Can we define \(\mathcal{F}_\text{person-case} \equiv \mathcal{F}_\text{person} \cup \mathcal{F}_\text{case}\)? If not, why not?
We cannot define \(\mathcal{F}_\text{person-case} \equiv \mathcal{F}_\text{person} \cup \mathcal{F}_\text{case}\). While Condition 1 above would be satisfied (that’s easy), we would be missing quite a few sets that Conditions 2-4 require. For instance, the third person accusative pronouns \(F_\text{[+third,+acc]} \equiv F_\text{[+third]} \cap F_\text{[+acc]}\) would not be an event.
try:
    person_space = FiniteMeasurableSpace(pronouns, f_person.union(f_case))
except ValueError as e:
    print(f"ValueError: {e}")
ValueError: The σ-algebra must be closed under countable union. frozenset({'he', 'me', 'we', 'I', 'they', 'it', 'she', 'you', 'us'}) is a union of events [frozenset({'they', 'he', 'it', 'she', 'you', 'we', 'I'}), frozenset({'me', 'we', 'I', 'you', 'us'})] but not an event.
This point demonstrates an important fact about \(\sigma\)-algebras: if you design a classification based on some (countable) set of features like person and case, the constraint that \(\mathcal{F}\) be a \(\sigma\)-algebra on \(\Omega\) implies that \(\mathcal{F}\) contains events corresponding to all possible conjunctions (e.g. third and accusative) and disjunctions (e.g. third and/or accusative) of those features. So we need to extend \(\mathcal{F}_\text{person} \cup \mathcal{F}_\text{case}\) with additional sets. We call this extension the \(\sigma\)-algebra generated by the family of sets \(\mathcal{F}_\text{person} \cup \mathcal{F}_\text{case}\), denoted \(\sigma\left(\mathcal{F}_\text{person} \cup \mathcal{F}_\text{case}\right)\).
def generate_sigma_algebra(family: SigmaAlgebra) -> SigmaAlgebra:
    """Generate a σ-algebra from a family of sets

    Parameters
    ----------
    family
        The family of sets from which to generate the σ-algebra
    """
    # note: complements are never added explicitly, so the input family
    # is assumed to already be closed under complement (as the union of
    # two σ-algebras on the same sample space is)
    sigma_algebra = set(family)
    old_sigma_algebra = set(family)

    complete = False

    while not complete:
        for subsets in powerset(old_sigma_algebra):
            subsets = list(subsets)

            if not subsets:
                continue

            union = reduce(frozenset.union, subsets)
            sigma_algebra.add(union)

            intersection = reduce(frozenset.intersection, subsets)
            sigma_algebra.add(intersection)

        # stop once a full pass adds no new sets
        complete = sigma_algebra == old_sigma_algebra
        old_sigma_algebra = set(sigma_algebra)

    return frozenset(sigma_algebra)
One challenge is that generating this \(\sigma\)-algebra for even relatively small families of sets can take a non-trivial amount of time. So for the remainder of this review, I’m going to cheat a bit and artificially distinguish the pronouns whose accusative and non-accusative variants are the same.
pronouns = frozenset({
    "I", "me",
    "you_nonacc", "you_acc",
    "they", "them",
    "it_nonacc", "it_acc",
    "she", "her",
    "he", "him",
    "we", "us",
})
This move allows us to define the event space more simply.
acc = frozenset({"me", "you_acc", "them", "her", "him", "it_acc", "us"})
nonacc = frozenset({"I", "you_nonacc", "they", "she", "he", "it_nonacc", "we"})

f_case = frozenset({
    frozenset(emptyset),
    frozenset(acc), frozenset(nonacc),
    frozenset(pronouns)
})

case_space = FiniteMeasurableSpace(pronouns, f_case)
This pair is a finite measurable space.
To ensure that the person and case spaces have the same sample space, we will similarly need to redefine the person space.
third = frozenset({"they", "them", "it_acc", "it_nonacc", "she", "her", "he", "him"})
nonthird = pronouns - third

f_person = frozenset({
    frozenset(emptyset),
    frozenset(third), frozenset(nonthird),
    frozenset(pronouns)
})

person_space = FiniteMeasurableSpace(pronouns, f_person)
This pair is a finite measurable space.
Finally, we can generate the \(\sigma\)-algebra for our person-case space and check that it’s valid.
f_person_case = generate_sigma_algebra(f_person | f_case)

person_case_space = FiniteMeasurableSpace(pronouns, f_person_case)
This pair is a finite measurable space.
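Since generating a \(\sigma\)-algebra by repeatedly closing under unions and intersections can be slow, it may help to see one alternative for finite sample spaces. The sketch below is not the method used in these notes: it groups possibilities by which generating sets they belong to, treats each group as an atom, and then takes every union of atoms. The function name `generate_from_atoms` is hypothetical, and the approach assumes both \(\Omega\) and the generating family are finite.

```python
from itertools import chain, combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def generate_from_atoms(
    omega: FrozenSet[str], family: Iterable[FrozenSet[str]]
) -> FrozenSet[FrozenSet[str]]:
    """Sketch: the σ-algebra on a finite omega generated by a finite family."""
    family = list(family)

    # two possibilities land in the same atom exactly when they belong
    # to the same sets of the generating family
    groups: Dict[Tuple[bool, ...], Set[str]] = {}
    for x in omega:
        signature = tuple(x in f for f in family)
        groups.setdefault(signature, set()).add(x)
    atoms = [frozenset(g) for g in groups.values()]

    # every event is the union of some (possibly empty) collection of atoms
    collections = chain.from_iterable(
        combinations(atoms, r) for r in range(len(atoms) + 1)
    )
    return frozenset(frozenset().union(*c) for c in collections)


omega = frozenset({
    "I", "me", "you_nonacc", "you_acc", "they", "them",
    "it_nonacc", "it_acc", "she", "her", "he", "him", "we", "us",
})
third = frozenset({"they", "them", "it_acc", "it_nonacc", "she", "her", "he", "him"})
acc = frozenset({"me", "you_acc", "them", "her", "him", "it_acc", "us"})

events = generate_from_atoms(omega, [third, acc])
print(len(events))  # 4 atoms give 2**4 = 16 events
```

The cost here is driven by the number of atoms rather than by the size of the power set of the family, which is why it tends to scale better when the family contains many overlapping sets.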
Considerations around defining event spaces
This way of setting up sample spaces is useful when we have strong a priori assumptions we want to inject into our probability models. We’ll see cases of this assumption injection as we move through the course. In many cases, however, we want an event space that makes fewer assumptions. So when the sample space is finite–as it is here–we’ll often just default to \(\mathcal{F} \equiv 2^\Omega\), which is the “finest” event space on \(\Omega\) we can muster–i.e. it is a superset of all other possible event spaces. This sort of event space, which is often referred to as the discrete event space on \(\Omega\), will tend to ignore potentially useful prior knowledge we have about the sample space–e.g. morphosyntactic features that pronouns have–though it is possible to represent that knowledge “in the measurement”, as we’ll see.
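As a rough illustration of the discrete event space, the following sketch builds \(2^\Omega\) for a small sample space and checks that a coarser event space is contained in it. The four-pronoun sample space and the name `discrete_event_space` are assumptions chosen to keep the example tiny.

```python
from itertools import chain, combinations
from typing import FrozenSet


def discrete_event_space(omega: FrozenSet[str]) -> FrozenSet[FrozenSet[str]]:
    """Return the power set of omega as a frozenset of frozensets."""
    elems = list(omega)
    subsets = chain.from_iterable(
        combinations(elems, r) for r in range(len(elems) + 1)
    )
    return frozenset(frozenset(s) for s in subsets)


# a deliberately small sample space (hypothetical, for illustration)
omega = frozenset({"I", "me", "you", "it"})
f_discrete = discrete_event_space(omega)

# a coarser event space that only separates first person from the rest
f_coarse = frozenset({
    frozenset(), frozenset({"I", "me"}), frozenset({"you", "it"}), omega
})

print(len(f_discrete))         # 2**4 = 16 events
print(f_coarse <= f_discrete)  # the coarse space is a subset of the discrete one
```

Because the number of events doubles with every added possibility, explicitly materializing \(2^\Omega\) like this is only practical for very small sample spaces.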
When the sample space is infinite, things get a bit trickier: the powerset is uncountable for even a countably infinite sample space–something that we need to consider in the context of working with strings and derivations.6 This property can be a problem for reasons I’ll gesture at when we discuss continuous probability distributions. So in general, we won’t work with event spaces that are power sets of their corresponding sample space in this context. We’ll instead work with what are called Borel \(\sigma\)-algebras. It’s not important to understand the intricacies of what a Borel \(\sigma\)-algebra is; I’ll try to give you an intuition below.
What it means to measure a possibility
I said that a probability is a measurement of a possibility. We’ve now formalized what a possibility is in this context. Now let’s turn to the measurement part.
The Kolmogorov axioms build the notion of a probability measure from the more general concept of a measure. All a probability measure \(\mathbb{P}\) is going to do is to map from some event in the event space (e.g. third-person pronoun, accusative pronoun, etc.) to a non-negative real value–with larger values corresponding to higher probabilities. So it is a function \(\mathbb{P}: \mathcal{F} \rightarrow \mathbb{R}_+\). This condition is the first of the Kolmogorov axioms.
- \(\mathbb{P}: \mathcal{F} \rightarrow \mathbb{R}_+\)
You might be used to thinking of probabilities as lying in the interval \([0, 1]\). This property is a consequence of the two other axioms:
- The probability of the entire sample space \(\mathbb{P}(\Omega) = 1\) (the assumption of unit measure)
- Given a countable collection of events \(E_1, E_2, \ldots \in \mathcal{F}\) that is pairwise disjoint–i.e. \(E_i \cap E_j = \emptyset\) for all \(i \neq j\)–\(\mathbb{P}\left(\bigcup_i E_i\right) = \sum_i \mathbb{P}(E_i)\) (the assumption of \(\sigma\)-additivity)
from typing import Dict

class ProbabilityMeasure:
    """A probability measure with finite support

    Parameters
    ----------
    domain
        The domain of the probability measure
    measure
        The graph of the measure
    """

    def __init__(self, domain: FiniteMeasurableSpace, measure: Dict[Event, float]):
        self._domain = domain
        self._measure = measure

        self._validate()

    def __call__(self, event: Event) -> float:
        return self._measure[event]

    def _validate(self):
        # check that the measure covers the domain
        for event in self._domain.sigma_algebra:
            if event not in self._measure:
                raise ValueError(
                    "Probability measure must be defined for all events."
                )

        # check the assumption of unit measure
        if self._measure[frozenset(self._domain.atoms)] != 1:
            raise ValueError(
                "The probability of the sample space must be 1."
            )

        # check assumption of 𝜎-additivity
        for events in powerset(self._domain.sigma_algebra):
            events = list(events)

            if not events:
                continue

            if not any(e1.intersection(e2) for e1, e2 in combinations(events, 2)):
                prob_union = self._measure[reduce(frozenset.union, events)]
                prob_sum = sum(self._measure[e] for e in events)

                if round(prob_union, 4) != round(prob_sum, 4):
                    raise ValueError(
                        "The measure does not satisfy 𝜎-additivity."
                    )

        print("This probability measure is valid for the given measurable space.")
One example of a probability measure for our measurable space \(\langle \Omega, \mathcal{F}_\text{person-case}\rangle\) is the uniform measure: \(\mathbb{P}(E) = \frac{|E|}{|\Omega|}\).
measure_person_case = ProbabilityMeasure(
    person_case_space,
    {e: len(e)/len(person_case_space.atoms)
     for e in person_case_space.sigma_algebra}
)
This probability measure is valid for the given measurable space.
These axioms imply that the range of \(\mathbb{P}\) is \([0, 1]\), even if its codomain is \(\mathbb{R}_+\); otherwise, it would have to be the case that \(\mathbb{P}(E) > 1\) for some \(E \subset \Omega\). (\(E\) would have to be a strict subset of \(\Omega\), since \(\Omega \supseteq E\) for all \(E \in \mathcal{F}\) and \(\mathbb{P}(\Omega) = 1\) by definition.) But \(\mathbb{P}(E) > 1\) cannot hold, since \(\mathbb{P}(\Omega - E)\)–which must be defined, given that \(\mathcal{F}\) is closed under complementation–is nonnegative; and thus \(\mathbb{P}(E) + \mathbb{P}(\Omega - E) > \mathbb{P}(\Omega) = 1\) contradicts the third axiom \(\mathbb{P}(E) + \mathbb{P}(\Omega - E) = \mathbb{P}(E \cup [\Omega - E]) = \mathbb{P}(\Omega) = 1\).
(One reason the codomain of \(\mathbb{P}\) is often specified as the more general \(\mathbb{R}_+\)–rather than \([0, 1]\)–is to make salient the fact that probabilities are analogous to other kinds of measurements, like weight, height, temperature, etc.)
These axioms also imply that \(\mathbb{P}(\emptyset) = 0\), since \(\mathbb{P}(\Omega) = \mathbb{P}(\Omega \cup \emptyset) = \mathbb{P}(\Omega) + \mathbb{P}(\emptyset) = 1\), and so \(\mathbb{P}(\emptyset) = 1 - \mathbb{P}(\Omega) = 0\).
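The consequences just derived can be checked numerically on a small example. The sketch below assumes a uniform measure on a hypothetical four-pronoun sample space and uses exact fractions so that the equalities are not blurred by floating-point rounding; the helper `prob` is introduced only for this illustration.

```python
from fractions import Fraction

# a small sample space (hypothetical, for illustration)
omega = frozenset({"I", "me", "you", "they"})


def prob(event: frozenset) -> Fraction:
    """Uniform measure |E| / |Omega|, kept exact with Fraction."""
    return Fraction(len(event), len(omega))


event = frozenset({"I", "me", "you"})
complement = omega - event

# additivity for the disjoint pair E and Omega - E
total = prob(event) + prob(complement)

# P(empty set) recovered as 1 - P(Omega), as in the argument above
p_empty = 1 - prob(omega)

print(total, p_empty, prob(event))
```

Here `total` equals \(\mathbb{P}(\Omega)\), `p_empty` agrees with the measure assigned directly to the empty set, and `prob(event)` lies between 0 and 1, matching the three claims in the text.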
Summing up
We will formalize a probability space as a triple \(\langle \Omega, \mathcal{F}, \mathbb{P} \rangle\) with:
- A set \(\Omega\) (the sample space)
- A \(\sigma\)-algebra \(\mathcal{F}\) (the event space), where:
  - \(\mathcal{F} \subseteq 2^\Omega\)
  - \(E \in \mathcal{F}\) iff \(\Omega - E \in \mathcal{F}\) (closure under complement)
  - \(\bigcup \mathcal{E} \in \mathcal{F}\) for all countable \(\mathcal{E} \subseteq \mathcal{F}\) (closure under countable union)
  - \(\bigcap \mathcal{E} \in \mathcal{F}\) for all countable \(\mathcal{E} \subseteq \mathcal{F}\) (closure under countable intersection)
- A probability measure \(\mathbb{P}\), where:
  - \(\mathbb{P}: \mathcal{F} \rightarrow \mathbb{R}_+\)
  - The probability of the entire sample space \(\mathbb{P}(\Omega) = 1\) (the assumption of unit measure)
  - Given a countable collection of events \(E_1, E_2, \ldots \in \mathcal{F}\) that is pairwise disjoint–i.e. \(E_i \cap E_j = \emptyset\) for all \(i \neq j\)–\(\mathbb{P}\left(\bigcup_i E_i\right) = \sum_i \mathbb{P}(E_i)\) (the assumption of \(\sigma\)-additivity)
It is this core that we build on in developing probabilistic models. To develop these models, it is useful to develop a few additional definitions and theorems.
Mutual exclusivity
Two events \(A \in \mathcal{F}\) and \(B \in \mathcal{F}\) are mutually exclusive if they are disjoint: \(A \cap B = \emptyset\). This implies that \(\mathbb{P}(A \cap B) = \mathbb{P}(\emptyset) = 0\) for all mutually exclusive events \(A\) and \(B\).
class ProbabilityMeasure(ProbabilityMeasure):

    def are_mutually_exclusive(self, *events: Event) -> bool:
        self._validate_events(events)

        # mutually exclusive means every pair of events is disjoint
        return not any(e1.intersection(e2) for e1, e2 in combinations(events, 2))

    def _validate_events(self, events: Iterable[Event]):
        for i, event in enumerate(events):
            if event not in self._domain.sigma_algebra:
                raise ValueError(f"event{i} is not in the event space.")
In our running example, the set of third-person pronouns \(F_\text{[+third]}\) and the set of non-third person pronouns \(F_\text{[-third]}\) are mutually exclusive events because \(F_\text{[+third]} \cap F_\text{[-third]} = \emptyset\).
measure_person_case = ProbabilityMeasure(
    person_case_space,
    {e: len(e)/len(person_case_space.atoms)
     for e in person_case_space.sigma_algebra}
)

measure_person_case.are_mutually_exclusive(third, nonthird)
True
Joint probability
The joint probability \(\mathbb{P}(A, B)\) of two events \(A \in \mathcal{F}\) and \(B \in \mathcal{F}\) is defined as the probability of the intersection of those two events \(\mathbb{P}(A, B) = \mathbb{P}(A \cap B)\), which must be defined given that \(\mathcal{F}\) is closed under countable intersection.
from typing import List

class ProbabilityMeasure(ProbabilityMeasure):

    def __call__(self, *events: Event) -> float:
        self._validate_events(events)

        # the joint probability is the probability of the intersection
        intersection = reduce(frozenset.intersection, events)

        return self._measure[intersection]
In our running example, the probability of a third-person accusative pronoun is the joint probability \(\mathbb{P}\left(F_\text{[+third]}, F_\text{[+acc]}\right)\).
measure_person_case = ProbabilityMeasure(
    person_case_space,
    {e: len(e)/len(person_case_space.atoms)
     for e in person_case_space.sigma_algebra}
)

measure_person_case(frozenset(third), frozenset(acc))
0.2857142857142857
Conditional probability
The probability of an event \(A \in \mathcal{F}\) conditioned on (or given) an event \(B \in \mathcal{F}\) is defined as \(\mathbb{P}(A \mid B) = \frac{\mathbb{P}(A, B)}{\mathbb{P}(B)}\). Note that \(\mathbb{P}(A \mid B)\) is undefined if \(\mathbb{P}(B) = 0\).
class ProbabilityMeasure(ProbabilityMeasure):

    def __or__(self, conditions: Iterable[Event]) -> ProbabilityMeasure:
        condition = reduce(frozenset.intersection, conditions)

        self._validate_condition(condition)

        measure = {
            event: self(event, condition)/self(condition)
            for event in self._domain.sigma_algebra
        }

        return ProbabilityMeasure(self._domain, measure)

    def _validate_condition(self, condition: Event):
        if condition not in self._domain.sigma_algebra:
            raise ValueError("The conditions must be in the event space.")

        if self._measure[condition] == 0:
            raise ZeroDivisionError("Conditions cannot have probability 0.")
In our running example, the probability that a pronoun is third-person given that it is accusative is the conditional probability \(\mathbb{P}\left(F_\text{[+third]} \mid F_\text{[+acc]}\right) = \frac{\mathbb{P}\left(F_\text{[+third]}, F_\text{[+acc]}\right)}{\mathbb{P}\left(F_\text{[+acc]}\right)}\).
person_case_measure = {
    event: len(event)/len(person_case_space.atoms)
    for event in person_case_space.sigma_algebra
}

measure_person_case = ProbabilityMeasure(
    person_case_space,
    person_case_measure
)

measure_given_back = measure_person_case | [acc]

measure_given_back(third)
0.5714285714285714
From this definition, it immediately follows that \(\mathbb{P}(A, B) = \mathbb{P}(A \mid B)\mathbb{P}(B) = \mathbb{P}(B \mid A)\mathbb{P}(A)\), which in turn implies Bayes’ theorem.
\[\mathbb{P}(A \mid B) = \frac{\mathbb{P}(A, B)}{\mathbb{P}(B)} = \frac{\mathbb{P}(B \mid A)\mathbb{P}(A)}{\mathbb{P}(B)}\]
Bayes’ theorem will be very important in this course.
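To see Bayes' theorem at work on the running example, the sketch below computes both sides of the identity under the uniform measure, using exact fractions. The helpers `prob` and `cond` are stand-ins written for this illustration and are independent of the `ProbabilityMeasure` class above.

```python
from fractions import Fraction

omega = frozenset({
    "I", "me", "you_nonacc", "you_acc", "they", "them",
    "it_nonacc", "it_acc", "she", "her", "he", "him", "we", "us",
})
third = frozenset({"they", "them", "it_acc", "it_nonacc", "she", "her", "he", "him"})
acc = frozenset({"me", "you_acc", "them", "her", "him", "it_acc", "us"})


def prob(event: frozenset) -> Fraction:
    """Uniform measure |E| / |Omega|."""
    return Fraction(len(event), len(omega))


def cond(a: frozenset, b: frozenset) -> Fraction:
    """P(a | b) = P(a, b) / P(b); raises ZeroDivisionError if P(b) = 0."""
    return prob(a & b) / prob(b)


# left-hand side: P(third | acc) computed directly from the definition
lhs = cond(third, acc)

# right-hand side: P(acc | third) P(third) / P(acc)
rhs = cond(acc, third) * prob(third) / prob(acc)

print(lhs, rhs)  # 4/7 4/7
```

Both sides come out to \(\frac{4}{7}\), which agrees (up to floating-point representation) with the conditional probability computed earlier.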
Another important consequence of the definition of conditional probability is the chain rule:
\[\begin{align*}\mathbb{P}(E_1, E_2, E_3, \ldots, E_N) &= \mathbb{P}(E_1)\mathbb{P}(E_2 \mid E_1)\mathbb{P}(E_3 \mid E_1, E_2)\ldots\mathbb{P}(E_N \mid E_1, E_2, \ldots, E_{N-1})\\ &= \mathbb{P}(E_1)\prod_{i=2}^N \mathbb{P}(E_i\mid E_1, \ldots, E_{i-1})\end{align*}\]
The chain rule will also be very important in this course.
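The chain rule can be checked in the same spirit. In this sketch, the feminine event `fem` and the helper `chain_rule` are assumptions added for illustration; the measure is again uniform over the fourteen artificially distinguished pronouns.

```python
from fractions import Fraction
from typing import Sequence

omega = frozenset({
    "I", "me", "you_nonacc", "you_acc", "they", "them",
    "it_nonacc", "it_acc", "she", "her", "he", "him", "we", "us",
})
third = frozenset({"they", "them", "it_acc", "it_nonacc", "she", "her", "he", "him"})
acc = frozenset({"me", "you_acc", "them", "her", "him", "it_acc", "us"})
fem = frozenset({"she", "her"})  # hypothetical third event for the example


def prob(event: frozenset) -> Fraction:
    """Uniform measure |E| / |Omega|."""
    return Fraction(len(event), len(omega))


def chain_rule(events: Sequence[frozenset]) -> Fraction:
    """P(E_1) P(E_2 | E_1) ... P(E_N | E_1, ..., E_{N-1})."""
    total = Fraction(1)
    context = omega  # conditioning on omega leaves probabilities unchanged
    for event in events:
        # P(event | context); raises ZeroDivisionError if P(context) = 0
        total *= prob(event & context) / prob(context)
        context = context & event
    return total


joint = prob(third & acc & fem)
factored = chain_rule([third, acc, fem])

print(joint, factored)  # 1/14 1/14
```

The factored form multiplies \(\frac{8}{14} \cdot \frac{4}{8} \cdot \frac{1}{4}\), which collapses to the joint probability of the single pronoun "her".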
Independence
An event \(A \in \mathcal{F}\) is independent of an event \(B \in \mathcal{F}\) (under \(\mathbb{P}\)) if \(\mathbb{P}(A \mid B) = \mathbb{P}(A)\). A theorem that immediately follows from this definition is that \(A\) and \(B\) are independent under \(\mathbb{P}\) if and only if \(\mathbb{P}(A, B) = \mathbb{P}(A \mid B)\mathbb{P}(B) = \mathbb{P}(A)\mathbb{P}(B)\).
class ProbabilityMeasure(ProbabilityMeasure):

    def are_independent(self, *events):
        self._validate_events(events)

        joint = self(*events)
        product = reduce(lambda x, y: x * y, [self(e) for e in events])

        return joint == product
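In our running example of an event space structured by person and case, assuming all pronouns are equiprobable, the events \(F_\text{[+third]}\) and \(F_\text{[+acc]}\) are independent: \(\mathbb{P}\left(F_\text{[+third]}, F_\text{[+acc]}\right) = \frac{4}{14}\) and \(\mathbb{P}\left(F_\text{[+third]}\right)\mathbb{P}\left(F_\text{[+acc]}\right) = \frac{8}{14} \cdot \frac{7}{14} = \frac{4}{14}\). Not every pair of events is independent, though: \(F_\text{[+third]}\) and \(F_\text{[-third]}\), for instance, are not. In the discrete event space, many events will be independent.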
measure_person_case = ProbabilityMeasure(
    person_case_space,
    {e: len(e)/len(person_case_space.atoms)
     for e in person_case_space.sigma_algebra}
)

measure_person_case.are_independent(frozenset(third), frozenset(acc))
True
Note that independence is not the same as mutual exclusivity; indeed, mutually exclusive events are not independent, since \(\mathbb{P}(A \mid B) = \frac{\mathbb{P}(A, B)}{\mathbb{P}(B)} = \frac{0}{\mathbb{P}(B)} = 0\) (or is undefined if \(\mathbb{P}(B) = 0\)) regardless of \(\mathbb{P}(A)\). Therefore either \(\mathbb{P}(A \mid B)\) does not equal \(\mathbb{P}(A)\), or \(\mathbb{P}(A) = 0\), in which case \(\mathbb{P}(B \mid A)\) is undefined even when \(\mathbb{P}(A \mid B)\) is defined.
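The contrast between independence and mutual exclusivity can be made concrete with a small check. This sketch assumes the uniform measure over the fourteen distinguished pronouns and tests independence via the product form \(\mathbb{P}(A, B) = \mathbb{P}(A)\mathbb{P}(B)\); the helper names are hypothetical and exact fractions are used to avoid floating-point comparisons.

```python
from fractions import Fraction

omega = frozenset({
    "I", "me", "you_nonacc", "you_acc", "they", "them",
    "it_nonacc", "it_acc", "she", "her", "he", "him", "we", "us",
})
third = frozenset({"they", "them", "it_acc", "it_nonacc", "she", "her", "he", "him"})
nonthird = omega - third
acc = frozenset({"me", "you_acc", "them", "her", "him", "it_acc", "us"})


def prob(event: frozenset) -> Fraction:
    """Uniform measure |E| / |Omega|."""
    return Fraction(len(event), len(omega))


def mutually_exclusive(a: frozenset, b: frozenset) -> bool:
    """True when the two events share no possibility."""
    return not (a & b)


def independent(a: frozenset, b: frozenset) -> bool:
    """Product form of independence: P(a, b) == P(a) P(b)."""
    return prob(a & b) == prob(a) * prob(b)


# overlapping events whose joint probability factorizes
print(mutually_exclusive(third, acc), independent(third, acc))

# disjoint events with positive probability: exclusive, hence dependent
print(mutually_exclusive(third, nonthird), independent(third, nonthird))
```

The first pair overlaps yet factorizes, while the second pair is disjoint with both probabilities positive, so its joint probability of 0 cannot equal the product of the marginals.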
Footnotes
What it means for a quantity to be a probability is a surprisingly contentious topic. It’s an interesting topic–and I encourage you to read about the various possibilities–but for the purposes of this course, we will tend to think of probabilities as a quantification of a degree of belief. This interpretation is sometimes referred to as the subjective or Bayesian interpretation.↩︎
If you’ve taken a phonetics course, you know that this definition overgenerates possibilities, since the values that the first and second formants can take on are constrained by the structure of the human vocal tract.↩︎
Don’t ask me why, but \(\mathcal{F}\) is standard notation for the event space. Why we don’t use \(\mathcal{E}\) is beyond me. It might be some convention from measure theory I’m not aware of; or it might have to do with not confusing the event space with the expectation \(\mathbb{E}\), which we’ll review below.↩︎
The analogous set \(F_\text{[+third]} \cap F_\text{[-third]}\) for \(\mathcal{F}_\text{person}\) is already accounted for, since \(F_\text{[+third]}\) and \(F_\text{[-third]}\) are disjoint and thus \(F_\text{[+third]} \cap F_\text{[-third]} = \emptyset\), which is in \(\mathcal{F}_\text{person}\).↩︎
Condition 3 of being a \(\sigma\)-algebra requires \(F_\text{[+acc]} \cup F_\text{[-acc]} \in \mathcal{F}_\text{case}\) (among other unions), but we do not need to explicitly say this, since \(F_\text{[+acc]} \cup F_\text{[-acc]} = \Omega\), which is already specified to be in \(\mathcal{F}_\text{case}\).↩︎
Remember that \(2^{\Sigma^*}\) is the set of all languages on \(\Sigma\); and the set of all languages, even when \(\Sigma\) is finite, is uncountable.↩︎