diff --git a/sessions/statistical-bigram-model/statistical-bigram-model.ipynb b/sessions/statistical-bigram-model/statistical-bigram-model.ipynb
index 5180129720ac794f88949a7214c414c081b784e8..9081c11875e83ca6f1793dd18803cac868098e19 100644
--- a/sessions/statistical-bigram-model/statistical-bigram-model.ipynb
+++ b/sessions/statistical-bigram-model/statistical-bigram-model.ipynb
@@ -41,8 +41,8 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "with open('names.txt', encoding='utf-8') as fp:    \n",
-    "    names = [line.rstrip() for line in fp]"
+    "with open(\"names.txt\", encoding=\"utf-8\") as f:\n",
+    "    names = [line.rstrip() for line in f]"
    ]
   },
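+  {
+   "cell_type": "markdown",
+   "id": "7f2a9c41",
+   "metadata": {},
+   "source": [
+    "As a quick sanity check, we can print the number of names and peek at the first few entries:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7f2a9c42",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Peek at the dataset: total count and the first five names\n",
+    "print(len(names))\n",
+    "print(names[:5])"
+   ]
+  },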
   {
@@ -86,9 +86,9 @@
    "id": "134a9c3b",
    "metadata": {},
    "source": [
-    "> **🤔 Problem 1: What’s in the data?**\n",
-    ">\n",
-    "> It is important to engage with the data you are working with. One way to do so is to ask questions. Is your own name included in the dataset? Is your name frequent or rare? Can you tell from a name whether the person with that name is male or female? Can you tell whether they are immigrants? What would be ethical and unethical uses of systems that *can* tell this?"
+    "#### 🎈 Task 1: What’s in the data?\n",
+    "\n",
+    "It is important to engage with the data you are working with. One way to do so is to ask questions. Is your own name included in the dataset? Is your name frequent or rare? Can you tell from a name whether the person with that name is male or female? Can you tell whether they are immigrants? What would be ethical and unethical uses of systems that *can* tell this?"
    ]
   },
   {
@@ -114,7 +114,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "char2idx = {'$': 0}\n",
+    "char2idx = {\"$\": 0}\n",
     "\n",
     "for name in names:\n",
     "    for char in name:\n",
@@ -137,7 +137,7 @@
    "source": [
     "> **🤯 What does this code do?**\n",
     ">\n",
-    "> Throughout the course, we often present code without detailed explanations – we assume you can follow along even so. If you need help understanding code, you can often get useful explanations from AI assistant services. For example, the following explanation of the above code was generated by ChatGPT 3.5 (and lightly edited) as a reponse to the prompt *Explain the following code*.\n",
+    "> Throughout the course, we often present code without detailed explanations – we assume you can follow along even so. If you need help understanding code, you can often get useful explanations from AI assistant services such as [ChatGPT](https://chatgpt.com/) and [Copilot](https://copilot.microsoft.com/). For example, the following explanation of the above code was generated by ChatGPT 3.5 (and lightly edited) as a reponse to the prompt *Explain the following code*.\n",
     ">\n",
     "> “The code begins by initializing a dictionary called `char2idx` with a special character `$` mapped to the index 0. It then iterates through a collection of names, and further through each character in the name. Inside the nested loop, there is a conditional statement that checks if the character is already present in the dictionary. If it is not, the code assigns a unique index to that character. The index is determined by the current length of the dictionary, effectively assigning consecutive indices starting from 1 (since `$` already occupies index 0).”"
    ]
@@ -147,9 +147,9 @@
    "id": "54bfa279",
    "metadata": {},
    "source": [
-    "> **🤔 Problem 2: A look inside the vocabulary**\n",
-    ">\n",
-    "> Write code that prints the vocabulary for the names dataset. Would you have expected this vocabulary? Would you expect the same vocabulary for a list of English names? What would you expect regarding the frequency distribution of the characters?"
+    "#### 🎈 Task 2: A look inside the vocabulary\n",
+    "\n",
+    "Write code that prints the vocabulary for the names dataset. Would you have expected this vocabulary? Would you expect the same vocabulary for a list of English names? What would you expect regarding the frequency distribution of the characters?"
    ]
   },
   {
@@ -165,7 +165,7 @@
    "id": "3969b408",
    "metadata": {},
    "source": [
-    "As already mentioned, our model is based on pairs of contiguous characters. Such pairs are called **bigrams**. For example, the bigrams in the name *anna* are *an*, *nn*, and *na*. In addition to the standard characters, we also include the start and end marker `$`.\n",
+    "As already mentioned, our model is based on pairs of consecutive characters. Such pairs are called **bigrams**. For example, the bigrams in the name *anna* are *an*, *nn*, and *na*. In addition to the standard characters, we also include the start and end marker `$`.\n",
     "\n",
     "The code in the following cell generates all bigrams from a given iterable of names:"
    ]
@@ -179,7 +179,7 @@
    "source": [
     "def bigrams(names):\n",
     "    for name in names:\n",
-    "        for x, y in zip('$' + name, name + '$'):\n",
+    "        for x, y in zip(\"$\" + name, name + \"$\"):\n",
     "            yield x, y"
    ]
   },
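+  {
+   "cell_type": "markdown",
+   "id": "7f2a9c43",
+   "metadata": {},
+   "source": [
+    "As a quick sanity check, we can list the bigrams of the name *anna* and compare them with the example given above:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7f2a9c44",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Expected: ('$', 'a'), ('a', 'n'), ('n', 'n'), ('n', 'a'), ('a', '$')\n",
+    "print(list(bigrams([\"anna\"])))"
+   ]
+  },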
@@ -255,7 +255,7 @@
    "source": [
     "Note that we represent matrices as *tensors* from the [PyTorch library](https://pytorch.org/). You will learn more about this library later in the course.\n",
     "\n",
-    "Now that we have the bigram counts, we are ready to define our bigram model. This model is essentially a conditional probability distribution over all possible next characters, given the immediately preceding character. Formally, using the notation introduced above, the model consists of probabilities of the form $P(c_j|c_i)$. To compute them, we divide each bigram count $M_{ij}$ by the sum along the row $M_{i:}$. We can accomplish this as follows:"
+    "Now that we have the bigram counts, we are ready to define our bigram model. This model is essentially a conditional probability distribution over all possible next characters, given the immediately preceding character. Formally, using the notation introduced above, the model consists of probabilities of the form $P(c_j\\,|\\,c_i)$, which quantify the probability of character $c_j$ given that the preceding character is $c_i$. To compute these probabilities, we divide each bigram count $M_{ij}$ by the sum along the row $M_{i:}$. We can accomplish this as follows:"
    ]
   },
   {
@@ -273,9 +273,9 @@
    "id": "bbe749cf",
    "metadata": {},
    "source": [
-    "> **🤔 Problem 3: Inspecting the model**\n",
-    ">\n",
-    "> In a name, which letter is most likely to come after the letter `a`? Which letter is most likely to start or end a name? Can you give an example of a name that is impossible according to our bigram model?"
+    "#### 🎈 Task 3: Inspecting the model\n",
+    "\n",
+    "In a name, which letter is most likely to come after the letter `a`? Which letter is most likely to start or end a name? Can you give an example of a name that is impossible according to our bigram model?"
    ]
   },
   {
@@ -306,9 +306,8 @@
     "\n",
     "# Generate 5 samples\n",
     "for _ in range(5):\n",
-    "\n",
     "    # We begin with the start-of-sequence marker\n",
-    "    generated = '$'\n",
+    "    generated = \"$\"\n",
     "\n",
     "    while True:\n",
     "        # Look up the integer index of the previous character\n",
@@ -324,7 +323,7 @@
     "        next_char = idx2char[next_idx]\n",
     "\n",
     "        # Break if the model generates the end-of-sequence marker\n",
-    "        if next_char == '$':\n",
+    "        if next_char == \"$\":\n",
     "            break\n",
     "\n",
     "        # Add the next character to the output\n",
@@ -347,9 +346,9 @@
    "id": "e0423a52",
    "metadata": {},
    "source": [
-    "> **🤔 Problem 4: Probability of a name**\n",
-    ">\n",
-    "> What is the probability for it to generate your name? What is the probability for our model to generate the single-letter “name” `s`?"
+    "#### 🎈 Task 4: Probability of a name\n",
+    "\n",
+    "What is the probability of our model generating your name? What is the probability of it generating the single-letter “name” `s`?"
    ]
   },
   {
@@ -365,7 +364,7 @@
    "id": "d49f4996",
    "metadata": {},
    "source": [
-    "Language models are commonly evaluated by computing their **perplexity**, which can be thought of as a measure of how “surprised” the model is when being exposed to a text. The larger the perplexity on a given text, the less likely it is that the model would have generated that text.\n",
+    "Language models are commonly evaluated by computing their **perplexity**, which can be thought of as a measure of how “perplexed” the model is when being exposed to a text. The larger the perplexity on a given text, the less likely the model would have generated that text.\n",
     "\n",
     "To compute the perplexity of our bigram model on a reference text, we first compute the average negative log likelihood that the model assigns to a gold-standard next character after having seen the previous character. To get the perplexity, we then exponentiate the result."
    ]
@@ -384,13 +383,13 @@
     "for prev_char, next_char in bigrams(names):\n",
     "    prev_i = char2idx[prev_char]\n",
     "    prev_j = char2idx[next_char]\n",
-    "    nlls.append(- math.log(model[prev_i][prev_j]))\n",
+    "    nlls.append(-math.log(model[prev_i][prev_j]))\n",
     "\n",
-    "# Compute the average\n",
-    "avg_nll = sum(nlls) / len(nlls)\n",
+    "# Compute the perplexity\n",
+    "ppl = math.exp(sum(nlls) / len(nlls))\n",
     "\n",
-    "# Print as perplexity\n",
-    "print(f'{math.exp(avg_nll):.1f}')"
+    "# Print the perplexity\n",
+    "print(f\"{ppl:.1f}\")"
    ]
   },
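+  {
+   "cell_type": "markdown",
+   "id": "7f2a9c45",
+   "metadata": {},
+   "source": [
+    "In formula form: if the reference text contains $N$ bigrams and $p_k$ denotes the probability our model assigns to the $k$-th gold-standard next character given the preceding character, then the perplexity computed above is\n",
+    "\n",
+    "$$\\mathrm{PPL} = \\exp\\left(-\\frac{1}{N} \\sum_{k=1}^{N} \\log p_k\\right)$$"
+   ]
+  },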
   {
@@ -398,9 +397,9 @@
    "id": "d3eb3029",
    "metadata": {},
    "source": [
-    "> **🤔 Problem 5: Upper bound on perplexity**\n",
-    ">\n",
-    "> The perplexity of our bigram model can be read as the average number of characters the model must choose from when trying to “guess” the held-out text. The lowest possible perplexity value is 1, which corresponds to the case where each next character is completely certain and no guessing is necessary. What is a reasonable *upper bound* on the perplexity of our bigram model?"
+    "#### 🎈 Task 5: Upper bound on perplexity\n",
+    "\n",
+    "The perplexity of our bigram model can be interpreted as the average number of characters the model must choose from when trying to “guess” the held-out text. The lowest possible perplexity value is 1, which corresponds to the case where each next character is completely certain and no guessing is necessary. What is a reasonable *upper bound* on the perplexity of our bigram model?"
    ]
   },
   {
@@ -414,7 +413,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
+   "display_name": "Python 3",
    "language": "python",
    "name": "python3"
   },