Installation

The source files for this site are available on GitHub at aaronstevenwhite/intro-to-cl. Clone the repo to get started.

git clone https://github.com/aaronstevenwhite/intro-to-cl.git
cd intro-to-cl

All further commands on this page assume you are inside this directory.

Quarto and extensions

The site is built using Quarto. You will need to install Quarto and then install the extensions the site depends on:

quarto add quarto-ext/include-code-files
quarto add shafayetShafee/line-highlight
quarto add pandoc-ext/diagram

The include-code-files extension pulls code from external files into a page, line-highlight highlights selected lines of that code, and diagram renders diagrams.
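As a quick sanity check after running the commands above, something like the following could confirm that Quarto is on your PATH and that the extensions were added to the project. This is only a sketch: `need` is a hypothetical helper name, not part of Quarto or of this repo, and the Quarto commands are skipped if Quarto is not installed.

```shell
# Hypothetical helper: report whether a command is available on PATH.
need() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Only query Quarto if it is actually installed.
if need quarto; then
  quarto --version          # print the installed Quarto version
  quarto list extensions    # list the extensions added to this project
fi
```

If the three extensions do not show up in the listing, rerunning the corresponding quarto add command from inside the repo directory is a reasonable first step.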

Docker

All pages with executed code blocks are generated from Jupyter notebooks run inside a Docker container. The Dockerfile in the repo specifies the environment:

FROM quay.io/jupyter/minimal-notebook:python-3.13

RUN pip install --no-cache-dir \
    numpy \
    scipy \
    pandas \
    matplotlib \
    scikit-learn \
    nltk \
    pyparsing \
    hmmlearn \
    sklearn-crfsuite \
    rdflib \
    stanza

# pynini and arcweight require conda-forge
RUN conda install -c conda-forge pynini && conda clean -afy

Assuming you have Docker installed, build the image and start a container with:

docker build --platform linux/amd64 -t intro-to-cl .
docker run -it --rm -p 8888:8888 -v "${PWD}":/home/jovyan/work intro-to-cl

The container will print a URL with an access token. Copy and paste it into your browser to open JupyterLab. The URL will look something like:

http://127.0.0.1:8888/lab?token=8fc165776e7e99c98ec19883f750071a187e85a0a9253b81
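To confirm that the built image contains the packages listed in the Dockerfile, an import check along these lines may help. It is a sketch under a few assumptions: the image is tagged intro-to-cl as in the build command, `import_check` is a hypothetical helper, and the module names are the usual import names for those packages (scikit-learn imports as `sklearn`, sklearn-crfsuite as `sklearn_crfsuite`). The check only runs if Docker can find the image.

```shell
# Import names for the packages in the Dockerfile (assumed; the pip name and
# the import name differ for scikit-learn and sklearn-crfsuite).
MODULES="numpy scipy pandas matplotlib sklearn nltk pyparsing hmmlearn sklearn_crfsuite rdflib stanza pynini"

# Hypothetical helper: build a Python one-liner that imports every module
# in a space-separated list and prints "ok" if all imports succeed.
import_check() {
  printf 'import %s; print("ok")\n' "$(echo "$1" | sed 's/ /, /g')"
}

# Run the check in a throwaway container, but only if the image exists.
if docker image inspect intro-to-cl >/dev/null 2>&1; then
  docker run --rm intro-to-cl python -c "$(import_check "$MODULES")"
fi
```

An ImportError here would point at the package that failed to install, which is usually easier to fix in the Dockerfile than from inside a running notebook.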

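You can change the host port by modifying the first number in -p. For example, -p 10000:8888 forwards the container’s port 8888 to your machine’s port 10000, so you would open http://127.0.0.1:10000/lab?token=... instead. The URL the container prints will still show port 8888, so replace the port by hand before pasting it into your browser.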
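The port mapping can be sketched with a shell variable so that the run command and the address you open stay in sync. `HOST_PORT`, `run_command`, and `lab_url` are hypothetical names chosen for illustration; the block only prints the command rather than starting a container.

```shell
# Host port to expose JupyterLab on; 8888 unless HOST_PORT is already set.
HOST_PORT="${HOST_PORT:-8888}"

# Hypothetical helper: print the docker run command for a given host port.
# The container side of the mapping stays 8888.
run_command() {
  printf 'docker run -it --rm -p %s:8888 -v "${PWD}":/home/jovyan/work intro-to-cl\n' "$1"
}

# Hypothetical helper: the address to open for a given host port and token.
lab_url() {
  printf 'http://127.0.0.1:%s/lab?token=%s\n' "$1" "$2"
}

# Show the command that would be run for the chosen port.
run_command "$HOST_PORT"
```

For example, running the block with HOST_PORT=10000 set in the environment prints the command with -p 10000:8888, and lab_url 10000 followed by the token from the container output gives the matching address.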