Quantifiers

Note that ., the character ranges, and the escape characters each match only a single character, so to match more than one character we need to repeat whichever of them we are interested in matching.

Load IPA representation of CMU Pronouncing Dictionary

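with open("cmudict-ipa") as f:
    # Split each line at the comma into a [word, transcription] pair
    pairs: list[list[str]] = [
        l.strip().split(",") for l in f
    ]
    # Map each word to the list of whitespace-separated IPA segments
    entries: dict[str, list[str]] = {
        w: ipa.split() for w, ipa in pairs
    }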

import re

regex_dot_stɹ_dot_kʃən = '.bstɹ.kʃən'
regex_dot_dot_tɹ_dot_kʃən = '..stɹ.kʃən'

print(regex_dot_stɹ_dot_kʃən, "matches:")
print()

for w, ipa in entries.items():
    if re.fullmatch(regex_dot_stɹ_dot_kʃən, "".join(ipa)):
        print(regex_dot_stɹ_dot_kʃən, "matches", "".join(ipa), f"({w})")

print()

print(regex_dot_dot_tɹ_dot_kʃən, "matches:")
print()

for w, ipa in entries.items():
    if re.fullmatch(regex_dot_dot_tɹ_dot_kʃən, "".join(ipa)):
        print("".join(ipa), f"({w})")

.bstɹ.kʃən matches:

.bstɹ.kʃən matches æbstɹækʃən (abstraction)
.bstɹ.kʃən matches əbstɹəkʃən (obstruction)

..stɹ.kʃən matches:

æbstɹækʃən (abstraction)
dɪstɹəkʃən (destruction)
dɪstɹækʃən (distraction)
ɛkstɹækʃən (extraction)
ɪnstɹəkʃən (instruction)
əbstɹəkʃən (obstruction)
ɹistɹɪkʃən (restriction)

We can avoid writing out multiple copies of the same pattern by using a quantifier. There are a few different quantifiers. For instance, if you have an exact number of repetitions in mind, you can put it in curly braces:

regex_dot2_tɹ_dot_kʃən = '.{2}stɹ.kʃən'

for w, ipa in entries.items():
    if re.fullmatch(regex_dot2_tɹ_dot_kʃən, "".join(ipa)):
        print("".join(ipa), f"({w})")

æbstɹækʃən (abstraction)
dɪstɹəkʃən (destruction)
dɪstɹækʃən (distraction)
ɛkstɹækʃən (extraction)
ɪnstɹəkʃən (instruction)
əbstɹəkʃən (obstruction)
ɹistɹɪkʃən (restriction)
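
To check that .{2} is just shorthand for writing . twice, here is a minimal sketch that runs both spellings over a few strings. The first three transcriptions are copied from the output above; the last two are made-up strings with a shorter and a longer prefix. The names sample, written_out, quantified, and agreement are only for illustration.

```python
import re

# First three: transcriptions from the output above.
# Last two: made-up strings with zero and three characters before "stɹ".
sample = ["æbstɹækʃən", "dɪstɹəkʃən", "ɹistɹɪkʃən", "stɹəkʃən", "kənstɹəkʃən"]

written_out = "..stɹ.kʃən"    # the dot repeated by hand
quantified = ".{2}stɹ.kʃən"   # the same thing with an exact-count quantifier

# Map each string to the pair (matches written_out, matches quantified).
agreement = {
    s: (
        re.fullmatch(written_out, s) is not None,
        re.fullmatch(quantified, s) is not None,
    )
    for s in sample
}

for s, (a, b) in agreement.items():
    print(s, a, b)
```

Both patterns should give the same verdict on every string, and they should reject the two made-up strings because fullmatch requires exactly two characters before stɹ.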

Or, if you have a range of repetition counts in mind, you can give a lower and an upper bound:

regex_dot2_tɹæk_dot13 = '.{2}stɹək.{1,3}'

for w, ipa in entries.items():
    if re.fullmatch(regex_dot2_tɹæk_dot13, "".join(ipa)):
        print("".join(ipa), f"({w})")

dɪstɹəkt (destruct)
dɪstɹəktɪd (destructed)
dɪstɹəktɪŋ (destructing)
dɪstɹəkʃən (destruction)
dɪstɹəktɪv (destructive)
dɪstɹəkts (destructs)
ɛkstɹəkeɪt (extricate)
ɪnstɹəkt (instruct)
ɪnstɹəktəd (instructed)
ɪnstɹəktɪd (instructed(1))
ɪnstɹəktɪŋ (instructing)
ɪnstɹəkʃən (instruction)
ɪnstɹəktɪv (instructive)
ɪnstɹəktɝ (instructor)
ɪnstɹəktɝz (instructors)
ɪnstɹəkts (instructs)
əbstɹəkt (obstruct)
əbstɹəktɪd (obstructed)
əbstɹəktɪŋ (obstructing)
əbstɹəkʃən (obstruction)
əbstɹəktɪv (obstructive)
əbstɹəkts (obstructs)
ɹistɹəktʃɝ (restructure)
ənstɹəkʃɝd (unstructured)
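
The bounds in {1,3} are inclusive at both ends. A small sketch, assuming a few hand-picked strings, counts the symbols after stɹək and checks which counts the pattern accepts. The bare stem ɪnstɹək is made up; the other strings appear in the outputs in this section. The names candidates and accepted are illustrative.

```python
import re

pattern = ".{2}stɹək.{1,3}"

# "ɪnstɹək" is a made-up stem with nothing after "stɹək"; the rest are
# transcriptions that appear in this section's outputs.
candidates = ["ɪnstɹək", "ɪnstɹəkt", "ɪnstɹəkts", "ɪnstɹəktɪv", "dɪstɹəktɪvnɪs"]

accepted = {}
for s in candidates:
    # Number of symbols following the fixed "stɹək" part.
    n_trailing = len(s.split("stɹək", 1)[1])
    accepted[n_trailing] = re.fullmatch(pattern, s) is not None
    print(s, n_trailing, accepted[n_trailing])
```

Strings with one, two, or three trailing symbols are accepted; the stem with none and the string with six are not.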

You can also leave off one bound. Omitting the lower bound means zero, and omitting the upper bound means no limit:

regex_dot2_tɹæk_dot03 = '.{2}stɹək.{,3}'
regex_dot2_tɹæk_dot1inf = '.{2}stɹək.{1,}'

print(regex_dot2_tɹæk_dot03, "matches:")
print()

n_matches = 0

for w, ipa in entries.items():
    if re.fullmatch(regex_dot2_tɹæk_dot03, "".join(ipa)):
        if n_matches < 10:
            n_matches += 1
            print("".join(ipa), f"({w})")
        else:
            print("...")
            break

print()
print(regex_dot2_tɹæk_dot1inf, "matches:")
print()

n_matches = 0

for w, ipa in entries.items():
    if re.fullmatch(regex_dot2_tɹæk_dot1inf, "".join(ipa)):
        if n_matches < 10:
            n_matches += 1
            print("".join(ipa), f"({w})")
        else:
            print("...")
            break

.{2}stɹək.{,3} matches:

dɪstɹəkt (destruct)
dɪstɹəktɪd (destructed)
dɪstɹəktɪŋ (destructing)
dɪstɹəkʃən (destruction)
dɪstɹəktɪv (destructive)
dɪstɹəkts (destructs)
ɛkstɹəkeɪt (extricate)
ɪnstɹəkt (instruct)
ɪnstɹəktəd (instructed)
ɪnstɹəktɪd (instructed(1))
...

.{2}stɹək.{1,} matches:

dɪstɹəkt (destruct)
dɪstɹəktəbəl (destructable)
dɪstɹəktɪd (destructed)
dɪstɹəktɪŋ (destructing)
dɪstɹəkʃən (destruction)
dɪstɹəktɪv (destructive)
dɪstɹəktɪvnɪs (destructiveness)
dɪstɹəkts (destructs)
ɛkstɹəkɝɪkjəlɝ (extracurricular)
ɛkstɹəkeɪt (extricate)
...

Note that leaving off both bounds, {,}, is equivalent to * (zero or more repetitions), at least in Python's re module. There is also a special quantifier symbol for {1,}: +

So if you want at least one character to come after stɹək, but don’t care how many more follow, you can use +.

regex_dot2_tɹæk_dotplus = '.{2}stɹək.+'

n_matches = 0

for w, ipa in entries.items():
    if re.fullmatch(regex_dot2_tɹæk_dotplus, "".join(ipa)):
        if n_matches < 10:
            n_matches += 1
            print("".join(ipa), f"({w})")
        else:
            print("...")
            break

dɪstɹəkt (destruct)
dɪstɹəktəbəl (destructable)
dɪstɹəktɪd (destructed)
dɪstɹəktɪŋ (destructing)
dɪstɹəkʃən (destruction)
dɪstɹəktɪv (destructive)
dɪstɹəktɪvnɪs (destructiveness)
dɪstɹəkts (destructs)
ɛkstɹəkɝɪkjəlɝ (extracurricular)
ɛkstɹəkeɪt (extricate)
...
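
The equivalences between the brace forms and the symbols * and + can be checked directly. This is a minimal sketch over a few hand-picked strings (ɪnstɹək is a made-up bare stem; the others appear in the outputs above), relying on the behavior of Python's re module. The names candidates, pairs, and verdicts are illustrative.

```python
import re

# "ɪnstɹək" is a made-up stem; the others appear in the outputs above.
candidates = ["ɪnstɹək", "ɪnstɹəkt", "ɪnstɹəktɪv", "dɪstɹəktɪvnɪs"]

# Each pair is (brace form, symbol form).
pairs = [
    (".{2}stɹək.{,}", ".{2}stɹək.*"),
    (".{2}stɹək.{1,}", ".{2}stɹək.+"),
]

verdicts = {}
for braces, symbol in pairs:
    for s in candidates:
        by_braces = re.fullmatch(braces, s) is not None
        by_symbol = re.fullmatch(symbol, s) is not None
        verdicts[(symbol, s)] = (by_braces, by_symbol)
        print(braces, symbol, s, by_braces, by_symbol)
```

Within each pair the two spellings agree on every string. The * and + versions differ only on the bare stem: * accepts zero trailing symbols, while + requires at least one.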

Note that none of these quantifiers increases the expressive power of regular expressions. We can always write an equivalent vanilla regular expression (in the sense of the formal definition we gave above); it would just be tedious in many cases.
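
To illustrate, the sketch below rewrites a bounded quantifier and + using only concatenation, union (|), and *, assuming those are the operations the formal definition provides, as is standard. The helpers expand_bounded and expand_plus are hypothetical names introduced here, and (?:...) is used purely as grouping parentheses.

```python
import re

def expand_bounded(atom: str, lo: int, hi: int) -> str:
    """Rewrite atom{lo,hi} as a union of explicit repetitions."""
    group = f"(?:{atom})"
    alternatives = [group * n for n in range(lo, hi + 1)]
    return "(?:" + "|".join(alternatives) + ")"

def expand_plus(atom: str) -> str:
    """Rewrite atom+ as one copy of atom followed by atom*."""
    group = f"(?:{atom})"
    return group + group + "*"

tests = ["", "a", "aa", "aaa", "aaaa"]

# Compare each quantified pattern with its expanded form on every test string.
bounded_agrees = all(
    (re.fullmatch("a{1,3}", s) is not None)
    == (re.fullmatch(expand_bounded("a", 1, 3), s) is not None)
    for s in tests
)
plus_agrees = all(
    (re.fullmatch("a+", s) is not None)
    == (re.fullmatch(expand_plus("a"), s) is not None)
    for s in tests
)

print(expand_bounded("a", 1, 3))
print(expand_plus("a"))
print(bounded_agrees, plus_agrees)
```

The expansion of a{1,3} already needs three alternatives, and it grows with the upper bound, which is where the tedium comes from.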

Set complement

For any of these cases where we escape a lowercase alphabetic character to get a character set, the set complement can generally be obtained with the uppercase version, e.g. \w goes to \W.

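# Raw string, so that \W reaches the regex engine without triggering
# Python's invalid-escape warning
regex_notw_bstɹækt = r'\Wbstɹəkt'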

(re.fullmatch(regex_notw_bstɹækt, "".join(entries["obstruct"])),
 re.fullmatch(regex_notw_bstɹækt, '\n'+"".join(entries["obstruct"])[1:]))

(None, <re.Match object; span=(0, 8), match='\nbstɹəkt'>)
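
Because \W is the complement of \w, any single character should match exactly one of the two. A minimal sketch, using an arbitrary handful of characters chosen for illustration (the names chars and membership are not from the text above):

```python
import re

# A few single characters: letters, a digit, underscore, whitespace, punctuation.
chars = ["ə", "b", "7", "_", "\n", " ", "-"]

membership = {}
for c in chars:
    in_w = re.fullmatch(r"\w", c) is not None
    in_not_w = re.fullmatch(r"\W", c) is not None
    membership[c] = (in_w, in_not_w)
    print(repr(c), in_w, in_not_w)
```

For every character exactly one of the two flags is True, which is why replacing the initial ə with a newline turns a non-match into a match in the example above.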

Sometimes you want the complement of a set that doesn’t have an associated escaped alphabetic character. For that, you can use the same square bracket set notation but put a ^ immediately after the opening bracket.

regex_notæ_bstɹ_notæ_kt = '[^æ][^b]stɹ[^æ]kt'

for w, ipa in entries.items():
    if re.fullmatch(regex_notæ_bstɹ_notæ_kt, "".join(ipa)):
        print("".join(ipa), f"({w})")

dɪstɹəkt (destruct)
dɪstɹɪkt (district)
ɪnstɹəkt (instruct)
mɑstɹɪkt (maastricht)
ɹistɹɪkt (restrict)

The placement of this ^ is really important: it has the negation interpretation only when it comes directly after [. Anywhere else inside the brackets it is just a literal ^.
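
A small sketch of how the position of ^ changes the meaning of a bracketed set. The names negated, literal, symbols, and table are illustrative.

```python
import re

negated = "[^æ]"   # ^ directly after [: any single character except æ
literal = "[æ^]"   # ^ later in the set: either æ or a literal ^

symbols = ["æ", "ə", "^"]

# Map each symbol to (matches negated, matches literal).
table = {
    s: (
        re.fullmatch(negated, s) is not None,
        re.fullmatch(literal, s) is not None,
    )
    for s in symbols
}

for s, row in table.items():
    print(s, row)
```

The two sets behave oppositely on æ and ə, and both accept the caret itself: the first because ^ is not æ, the second because ^ is listed as a member.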