You are viewing a plain text version of this content. The canonical link for it is here.

Posted to github@beam.apache.org by "liferoad (via GitHub)" <gi...@apache.org> on 2023/04/08 02:10:26 UTC

[GitHub] [beam] liferoad opened a new pull request, #26185: add one windowing example

liferoad opened a new pull request, #26185:
URL: https://github.com/apache/beam/pull/26185

   Add one end-to-end example about how to use Beam windowing for AQI analysis
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/get-started-contributing/#make-the-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] liferoad commented on a diff in pull request #26185: add one windowing example

Posted by "liferoad (via GitHub)" <gi...@apache.org>.

liferoad commented on code in PR #26185:
URL: https://github.com/apache/beam/pull/26185#discussion_r1181804168


##########
examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb:
##########
@@ -0,0 +1,1836 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "view-in-github"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "L7ZbRufePd2g"
+      },
+      "outputs": [],
+      "source": [
+        "#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License."
+      ]
+    },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "83TJhNxLD7-W"
+      },
+      "source": [
+        " # **Introduction to Windowing for Batch Processing in Apache Beam**\n",
+        "\n",
+        "In this notebook, we will learn the fundamentals of **batch processing** as we walk through a few introductory examples in Beam. The pipelines in these examples process real-world data for air quality levels in India between 2017 and 2022.\n",
+        "\n",
+        "After this tutorial you should have a basic understanding of the following:\n",
+        "\n",
+        "*   What is **batch vs. stream** data processing?\n",
+        "*   How can I use Beam to run a **simple batch analysis job**?\n",
+        "*   How can I use Beam's **windowing features** to process only certain intervals of data at a time?"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Dj3ftRRqfumW"
+      },
+      "source": [
+        "To begin, run the following cell to set up Apache Beam."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 1,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "zmJ0pCmSvD0-",
+        "outputId": "9041f637-12a0-4f78-f60b-ebd3c3a1c186"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m14.5/14.5 MB\u001b[0m \u001b[31m53.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m144.1/144.1 kB\u001b[0m \u001b[31m1.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m89.7/89.7 kB\u001b[0m \u001b[31m7.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m515.5/515.5 kB\u001b[0m \u001b[31m12.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.6/2.6 MB\u001b[0m \u001b[31m22.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m152.0/152.0 kB\u001b[0m \u001b[31m10.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.7/2.7 MB\u001b[0m \u001b[31m15.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for crcmod (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for dill (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for docopt (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
+          ]
+        }
+      ],
+      "source": [
+        "# Install apache-beam.\n",
+        "!pip install --quiet apache-beam"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 2,
+      "metadata": {
+        "id": "7sBoLahzPlJ1"
+      },
+      "outputs": [],
+      "source": [
+        "# Set the logging level to reduce verbose information\n",
+        "import logging\n",
+        "\n",
+        "logging.root.setLevel(logging.ERROR)"
+      ]
+    },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "BB6FAwYj1dAi"
+      },
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "<hr style=\"border: 1px solid #fdb515;\" />\n",
+        "\n",
+        "## Batch vs. Stream Data Processing\n",
+        "\n",
+        "What's the difference?\n",
+        "\n",
+        "**Batch processing** is when data processing and analysis happens on a set of data that have already been stored over a period of time. \n",
+        "In other words, the input is a finite, bounded data set. Examples include payroll and billing systems, which have to be processed weekly or monthly.\n",
+        "\n",
+        "**Stream processing** happens *as* data flows through a system. This results in analysis and reporting of events \n",
+        "within a short period of time or near real-time on an infinite, unbounded data set. \n",
+        "Examples include fraud detection or intrusion detection, which requires the continuous processing of transaction data.\n",
+        "\n",
+        "> This tutorial will focus on **batch processing** examples. \n",
+        "To learn more about stream processing in Beam, check out [this](https://beam.apache.org/documentation/sdks/python-streaming/)."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "W_63UtsoBRql"
+      },
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "\n",
+        "## Load the Data\n",
+        "\n",
+        "Let's import the example data we will be using throughout this tutorial. The [dataset](https://www.kaggle.com/datasets/fedesoriano/air-quality-data-in-india) consists of **hourly air quality data (PM 2.5) in India from November 2017 to June 2022**.\n",
+        "\n",
+        "> The World Health Organization (WHO) reports 7 million premature deaths linked to air pollution each year. In India alone, more than 90% of the country's population live in areas where air quality is below the WHO's standards.\n",
+        "\n",
+        "**What does the data look like?**\n",
+        "\n",
+        "The data set has 36,192 rows and 6 columns in total recording the following attributes:\n",
+        "\n",
+        "1.   `Timestamp`: Timestamp in the format YYYY-MM-DD HH:MM:SS\n",
+        "2.   `Year`: Year of the measure\n",
+        "3.   `Month`: Month of the measure\n",
+        "4.   `Day`: Day of the measure\n",
+        "5.   `Hour`: Hour of the measure\n",
+        "6.   `PM2.5`: Fine particulate matter air pollutant level in air\n",
+        "\n",
+        "**For our purposes, we will focus on only the first and last column of the** `air_quality` **DataFrame**:\n",
+        "\n",
+        "1.   `Timestamp`: Timestamp in the format YYYY-MM-DD HH:MM:SS\n",
+        "2.   `PM 2.5`: Fine particulate matter air pollutant level in air\n",
+        "\n",
+        "Run the following cell to load the data into our file directory."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 3,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "GTteBUZ-7e2s",
+        "outputId": "3af9cdb0-c248-4c6d-96f6-c3739fb66014"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "Copying gs://batch-processing-example/air-quality-india.csv...\n",
+            "/ [1 files][  1.4 MiB/  1.4 MiB]                                                \n",
+            "Operation completed over 1 objects/1.4 MiB.                                      \n"
+          ]
+        }
+      ],
+      "source": [
+        "# Copy the dataset file into the local file system from Google Cloud Storage.\n",
+        "!mkdir -p data\n",
+        "!gsutil cp gs://batch-processing-example/air-quality-india.csv data/"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "1NcmPl7C43lY"
+      },
+      "source": [
+        "#### Data Preparation\n",
+        "\n",
+        "Before we load the data into a Beam pipeline, let's use Pandas to select two columns."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 4,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 206
+        },
+        "id": "dq-k7hwRf4MA",
+        "outputId": "7d70a959-5278-453e-9315-f5ed06821744"
+      },
+      "outputs": [
+        {
+          "data": {
+            "text/html": [
+              "\n",
+              "  <div id=\"df-ca65f108-edf8-4e2b-8152-6df025dc7ff8\">\n",
+              "    <div class=\"colab-df-container\">\n",
+              "      <div>\n",
+              "<style scoped>\n",
+              "    .dataframe tbody tr th:only-of-type {\n",
+              "        vertical-align: middle;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe tbody tr th {\n",
+              "        vertical-align: top;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe thead th {\n",
+              "        text-align: right;\n",
+              "    }\n",
+              "</style>\n",
+              "<table border=\"1\" class=\"dataframe\">\n",
+              "  <thead>\n",
+              "    <tr style=\"text-align: right;\">\n",
+              "      <th></th>\n",
+              "      <th>Timestamp</th>\n",
+              "      <th>Year</th>\n",
+              "      <th>Month</th>\n",
+              "      <th>Day</th>\n",
+              "      <th>Hour</th>\n",
+              "      <th>PM2.5</th>\n",
+              "    </tr>\n",
+              "  </thead>\n",
+              "  <tbody>\n",
+              "    <tr>\n",
+              "      <th>0</th>\n",
+              "      <td>2017-11-07 12:00:00</td>\n",
+              "      <td>2017</td>\n",
+              "      <td>11</td>\n",
+              "      <td>7</td>\n",
+              "      <td>12</td>\n",
+              "      <td>64.51</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>1</th>\n",
+              "      <td>2017-11-07 13:00:00</td>\n",
+              "      <td>2017</td>\n",
+              "      <td>11</td>\n",
+              "      <td>7</td>\n",
+              "      <td>13</td>\n",
+              "      <td>69.95</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2</th>\n",
+              "      <td>2017-11-07 14:00:00</td>\n",
+              "      <td>2017</td>\n",
+              "      <td>11</td>\n",
+              "      <td>7</td>\n",
+              "      <td>14</td>\n",
+              "      <td>92.79</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>3</th>\n",
+              "      <td>2017-11-07 15:00:00</td>\n",
+              "      <td>2017</td>\n",
+              "      <td>11</td>\n",
+              "      <td>7</td>\n",
+              "      <td>15</td>\n",
+              "      <td>109.66</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>4</th>\n",
+              "      <td>2017-11-07 16:00:00</td>\n",
+              "      <td>2017</td>\n",
+              "      <td>11</td>\n",
+              "      <td>7</td>\n",
+              "      <td>16</td>\n",
+              "      <td>116.50</td>\n",
+              "    </tr>\n",
+              "  </tbody>\n",
+              "</table>\n",
+              "</div>\n",
+              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-ca65f108-edf8-4e2b-8152-6df025dc7ff8')\"\n",
+              "              title=\"Convert this dataframe to an interactive table.\"\n",
+              "              style=\"display:none;\">\n",
+              "        \n",
+              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
+              "       width=\"24px\">\n",
+              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
+              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
+              "  </svg>\n",
+              "      </button>\n",
+              "      \n",
+              "  \n",
+              "    <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-ca65f108-edf8-4e2b-8152-6df025dc7ff8')\"\n",
+              "            title=\"Generate charts.\"\n",
+              "            style=\"display:none;\">\n",
+              "      \n",
+              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
+              "       width=\"24px\">\n",
+              "      <g>\n",
+              "          <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
+              "      </g>\n",
+              "  </svg>\n",
+              "    </button>\n",
+              "   \n",
+              "  <style>\n",
+              "    .colab-df-quickchart {\n",
+              "      background-color: #E8F0FE;\n",
+              "      border: none;\n",
+              "      border-radius: 50%;\n",
+              "      cursor: pointer;\n",
+              "      display: none;\n",
+              "      fill: #1967D2;\n",
+              "      height: 32px;\n",
+              "      padding: 0 0 0 0;\n",
+              "      width: 32px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-quickchart:hover {\n",
+              "      background-color: #E2EBFA;\n",
+              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
+              "      fill: #174EA6;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-quickchart {\n",
+              "      background-color: #3B4455;\n",
+              "      fill: #D2E3FC;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-quickchart:hover {\n",
+              "      background-color: #434B5C;\n",
+              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
+              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
+              "      fill: #FFFFFF;\n",
+              "    }\n",
+              "  </style>\n",
+              "\n",
+              "    <script>\n",
+              "      const quickchartButtonEl =\n",
+              "        document.querySelector('#df-ca65f108-edf8-4e2b-8152-6df025dc7ff8 button.colab-df-quickchart');\n",
+              "      quickchartButtonEl.style.display =\n",
+              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
+              "\n",
+              "      async function quickchart(key) {\n",
+              "        const containerElement = document.querySelector('#df-ca65f108-edf8-4e2b-8152-6df025dc7ff8');\n",
+              "        const charts = await google.colab.kernel.invokeFunction(\n",
+              "            'generateCharts', [key], {})\n",
+              "      }\n",
+              "    </script>    \n",
+              "  <style>\n",
+              "    .colab-df-container {\n",
+              "      display:flex;\n",
+              "      flex-wrap:wrap;\n",
+              "      gap: 12px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-convert {\n",
+              "      background-color: #E8F0FE;\n",
+              "      border: none;\n",
+              "      border-radius: 50%;\n",
+              "      cursor: pointer;\n",
+              "      display: none;\n",
+              "      fill: #1967D2;\n",
+              "      height: 32px;\n",
+              "      padding: 0 0 0 0;\n",
+              "      width: 32px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-convert:hover {\n",
+              "      background-color: #E2EBFA;\n",
+              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
+              "      fill: #174EA6;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-convert {\n",
+              "      background-color: #3B4455;\n",
+              "      fill: #D2E3FC;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-convert:hover {\n",
+              "      background-color: #434B5C;\n",
+              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
+              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
+              "      fill: #FFFFFF;\n",
+              "    }\n",
+              "  </style>\n",
+              "\n",
+              "      <script>\n",
+              "        const buttonEl =\n",
+              "          document.querySelector('#df-ca65f108-edf8-4e2b-8152-6df025dc7ff8 button.colab-df-convert');\n",
+              "        buttonEl.style.display =\n",
+              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
+              "\n",
+              "        async function convertToInteractive(key) {\n",
+              "          const element = document.querySelector('#df-ca65f108-edf8-4e2b-8152-6df025dc7ff8');\n",
+              "          const dataTable =\n",
+              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
+              "                                                     [key], {});\n",
+              "          if (!dataTable) return;\n",
+              "\n",
+              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
+              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
+              "            + ' to learn more about interactive tables.';\n",
+              "          element.innerHTML = '';\n",
+              "          dataTable['output_type'] = 'display_data';\n",
+              "          await google.colab.output.renderOutput(dataTable, element);\n",
+              "          const docLink = document.createElement('div');\n",
+              "          docLink.innerHTML = docLinkHtml;\n",
+              "          element.appendChild(docLink);\n",
+              "        }\n",
+              "      </script>\n",
+              "    </div>\n",
+              "  </div>\n",
+              "  "
+            ],
+            "text/plain": [
+              "             Timestamp  Year  Month  Day  Hour   PM2.5\n",
+              "0  2017-11-07 12:00:00  2017     11    7    12   64.51\n",
+              "1  2017-11-07 13:00:00  2017     11    7    13   69.95\n",
+              "2  2017-11-07 14:00:00  2017     11    7    14   92.79\n",
+              "3  2017-11-07 15:00:00  2017     11    7    15  109.66\n",
+              "4  2017-11-07 16:00:00  2017     11    7    16  116.50"
+            ]
+          },
+          "execution_count": 4,
+          "metadata": {},
+          "output_type": "execute_result"
+        }
+      ],
+      "source": [
+        "# Load the data into a Python Pandas DataFrame.\n",
+        "import pandas as pd\n",
+        "\n",
+        "air_quality = pd.read_csv(\"data/air-quality-india.csv\")\n",
+        "air_quality.head()"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 5,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 237
+        },
+        "id": "WNXrvP-wDIkA",
+        "outputId": "3e932987-41b3-4aaf-b49f-3707a9728322"
+      },
+      "outputs": [
+        {
+          "data": {
+            "text/html": [
+              "\n",
+              "  <div id=\"df-db4299d8-d441-4fc1-af7d-8d1c199492ed\">\n",
+              "    <div class=\"colab-df-container\">\n",
+              "      <div>\n",
+              "<style scoped>\n",
+              "    .dataframe tbody tr th:only-of-type {\n",
+              "        vertical-align: middle;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe tbody tr th {\n",
+              "        vertical-align: top;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe thead th {\n",
+              "        text-align: right;\n",
+              "    }\n",
+              "</style>\n",
+              "<table border=\"1\" class=\"dataframe\">\n",
+              "  <thead>\n",
+              "    <tr style=\"text-align: right;\">\n",
+              "      <th></th>\n",
+              "      <th>PM2.5</th>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>Timestamp</th>\n",
+              "      <th></th>\n",
+              "    </tr>\n",
+              "  </thead>\n",
+              "  <tbody>\n",
+              "    <tr>\n",
+              "      <th>2017-11-07 12:00:00</th>\n",
+              "      <td>64.51</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2017-11-07 13:00:00</th>\n",
+              "      <td>69.95</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2017-11-07 14:00:00</th>\n",
+              "      <td>92.79</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2017-11-07 15:00:00</th>\n",
+              "      <td>109.66</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2017-11-07 16:00:00</th>\n",
+              "      <td>116.50</td>\n",
+              "    </tr>\n",
+              "  </tbody>\n",
+              "</table>\n",
+              "</div>\n",
+              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-db4299d8-d441-4fc1-af7d-8d1c199492ed')\"\n",
+              "              title=\"Convert this dataframe to an interactive table.\"\n",
+              "              style=\"display:none;\">\n",
+              "        \n",
+              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
+              "       width=\"24px\">\n",
+              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
+              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
+              "  </svg>\n",
+              "      </button>\n",
+              "      \n",
+              "  \n",
+              "    <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-db4299d8-d441-4fc1-af7d-8d1c199492ed')\"\n",
+              "            title=\"Generate charts.\"\n",
+              "            style=\"display:none;\">\n",
+              "      \n",
+              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
+              "       width=\"24px\">\n",
+              "      <g>\n",
+              "          <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
+              "      </g>\n",
+              "  </svg>\n",
+              "    </button>\n",
+              "   \n",
+              "  <style>\n",
+              "    .colab-df-quickchart {\n",
+              "      background-color: #E8F0FE;\n",
+              "      border: none;\n",
+              "      border-radius: 50%;\n",
+              "      cursor: pointer;\n",
+              "      display: none;\n",
+              "      fill: #1967D2;\n",
+              "      height: 32px;\n",
+              "      padding: 0 0 0 0;\n",
+              "      width: 32px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-quickchart:hover {\n",
+              "      background-color: #E2EBFA;\n",
+              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
+              "      fill: #174EA6;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-quickchart {\n",
+              "      background-color: #3B4455;\n",
+              "      fill: #D2E3FC;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-quickchart:hover {\n",
+              "      background-color: #434B5C;\n",
+              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
+              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
+              "      fill: #FFFFFF;\n",
+              "    }\n",
+              "  </style>\n",
+              "\n",
+              "    <script>\n",
+              "      const quickchartButtonEl =\n",
+              "        document.querySelector('#df-db4299d8-d441-4fc1-af7d-8d1c199492ed button.colab-df-quickchart');\n",
+              "      quickchartButtonEl.style.display =\n",
+              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
+              "\n",
+              "      async function quickchart(key) {\n",
+              "        const containerElement = document.querySelector('#df-db4299d8-d441-4fc1-af7d-8d1c199492ed');\n",
+              "        const charts = await google.colab.kernel.invokeFunction(\n",
+              "            'generateCharts', [key], {})\n",
+              "      }\n",
+              "    </script>    \n",
+              "  <style>\n",
+              "    .colab-df-container {\n",
+              "      display:flex;\n",
+              "      flex-wrap:wrap;\n",
+              "      gap: 12px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-convert {\n",
+              "      background-color: #E8F0FE;\n",
+              "      border: none;\n",
+              "      border-radius: 50%;\n",
+              "      cursor: pointer;\n",
+              "      display: none;\n",
+              "      fill: #1967D2;\n",
+              "      height: 32px;\n",
+              "      padding: 0 0 0 0;\n",
+              "      width: 32px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-convert:hover {\n",
+              "      background-color: #E2EBFA;\n",
+              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
+              "      fill: #174EA6;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-convert {\n",
+              "      background-color: #3B4455;\n",
+              "      fill: #D2E3FC;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-convert:hover {\n",
+              "      background-color: #434B5C;\n",
+              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
+              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
+              "      fill: #FFFFFF;\n",
+              "    }\n",
+              "  </style>\n",
+              "\n",
+              "      <script>\n",
+              "        const buttonEl =\n",
+              "          document.querySelector('#df-db4299d8-d441-4fc1-af7d-8d1c199492ed button.colab-df-convert');\n",
+              "        buttonEl.style.display =\n",
+              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
+              "\n",
+              "        async function convertToInteractive(key) {\n",
+              "          const element = document.querySelector('#df-db4299d8-d441-4fc1-af7d-8d1c199492ed');\n",
+              "          const dataTable =\n",
+              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
+              "                                                     [key], {});\n",
+              "          if (!dataTable) return;\n",
+              "\n",
+              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
+              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
+              "            + ' to learn more about interactive tables.';\n",
+              "          element.innerHTML = '';\n",
+              "          dataTable['output_type'] = 'display_data';\n",
+              "          await google.colab.output.renderOutput(dataTable, element);\n",
+              "          const docLink = document.createElement('div');\n",
+              "          docLink.innerHTML = docLinkHtml;\n",
+              "          element.appendChild(docLink);\n",
+              "        }\n",
+              "      </script>\n",
+              "    </div>\n",
+              "  </div>\n",
+              "  "
+            ],
+            "text/plain": [
+              "                      PM2.5\n",
+              "Timestamp                  \n",
+              "2017-11-07 12:00:00   64.51\n",
+              "2017-11-07 13:00:00   69.95\n",
+              "2017-11-07 14:00:00   92.79\n",
+              "2017-11-07 15:00:00  109.66\n",
+              "2017-11-07 16:00:00  116.50"
+            ]
+          },
+          "execution_count": 5,
+          "metadata": {},
+          "output_type": "execute_result"
+        }
+      ],
+      "source": [
+        "import csv\n",
+        "\n",
+        "#Select only the two features of the DataFrame we're interested in.\n",
+        "airq = air_quality.loc[:, [\"Timestamp\", \"PM2.5\"]].set_index(\"Timestamp\")\n",
+        "saved_new = pd.DataFrame(airq)\n",
+        "saved_new.to_csv(\"data/air_quality.csv\")\n",
+        "airq.head()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "VRFkb_sLDUCD"
+      },
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "\n",
+        "# 1. Average Air Quality Index (AQI)\n",
+        "\n",
+        "Before we explore more advanced batch processing with different types of windowing, we will start with a simple batch analysis example.\n",
+        "\n",
+        "Our **objective** is to analyze the *entire* dataset to find the **average PM2.5 air quality index** in India across the entire 11/2017-6/2022 period.\n",
+        "\n",
+        "> This examples uses the `GlobalWindow`, which is a single window that covers the entire PCollection. All pipelines use the [`GlobalWindow`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.window.html#apache_beam.transforms.window.GlobalWindow) by default. In many cases, especially for batch pipelines, this is what we want since we want to analyze all the data that we have."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 6,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 34
+        },
+        "id": "v06NFe9sDYXc",
+        "outputId": "f65eae63-0424-4ac0-8609-78e98ac21bd0"
+      },
+      "outputs": [
+        {
+          "data": {
+            "application/javascript": "\n        if (typeof window.interactive_beam_jquery == 'undefined') {\n          var jqueryScript = document.createElement('script');\n          jqueryScript.src = 'https://code.jquery.com/jquery-3.4.1.slim.min.js';\n          jqueryScript.type = 'text/javascript';\n          jqueryScript.onload = function() {\n            var datatableScript = document.createElement('script');\n            datatableScript.src = 'https://cdn.datatables.net/1.10.20/js/jquery.dataTables.min.js';\n            datatableScript.type = 'text/javascript';\n            datatableScript.onload = function() {\n              window.interactive_beam_jquery = jQuery.noConflict(true);\n              window.interactive_beam_jquery(document).ready(function($){\n                \n              });\n            }\n            document.head.appendChild(datatableScript);\n          };\n          document.head.appendChild(jqueryScript);\n        } else {\n          window.interactive
 _beam_jquery(document).ready(function($){\n            \n          });\n        }"
+          },
+          "metadata": {},
+          "output_type": "display_data"
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "49.308428658266905\n"
+          ]
+        }
+      ],
+      "source": [
+        "import apache_beam as beam\n",
+        "\n",
+        "def parse_file(element):\n",
+        "  file = csv.reader([element], quotechar='\"', delimiter=',',\n",
+        "                    quoting=csv.QUOTE_ALL, skipinitialspace=True)\n",
+        "  for line in file:\n",
+        "    return line\n",
+        "\n",
+        "with beam.Pipeline() as pipeline:\n",
+        " (\n",
+        "      pipeline\n",
+        "      | 'Read input file' >> beam.io.ReadFromText(\"data/air_quality.csv\",\n",
+        "                                                  skip_header_lines=1)\n",
+        "      | 'Parse file' >> beam.Map(parse_file)\n",
+        "      | 'Get PM' >> beam.Map(lambda x: float(x[1])) # only process PM2.5\n",
+        "      | 'Get mean value' >> beam.combiners.Mean.Globally()\n",
+        "      | beam.Map(print))"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 7,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "GmHEE1G5Y1z-",
+        "outputId": "248ee3d7-43af-4b53-9832-8da0eb7ac974"
+      },
+      "outputs": [
+        {
+          "data": {
+            "text/plain": [
+              "49.30842865826703"
+            ]
+          },
+          "execution_count": 7,
+          "metadata": {},
+          "output_type": "execute_result"
+        }
+      ],
+      "source": [
+        "# To verify, the above mean value matches what Pandas produces\n",
+        "airq[\"PM2.5\"].mean()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "b3gGxC6w6qXx"
+      },
+      "source": [
+        "**Congratulations!** You just created a simple aggregation processing pipeline in batch using `GlobalWindow`."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "vRameihqDJ8l"
+      },
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "\n",
+        "# 2. Advanced Processing in Batch with Windowing\n",
+        "\n",
+        "Sometimes, we want to [aggregate](https://beam.apache.org/documentation/transforms/python/overview/#aggregation) data, like `GroupByKey` or `Combine`, only at certain intervals, like hourly or daily, instead of processing the entire `PCollection` of data only once.\n",
+        "\n",
+        "In this case, our **objective** is to determine the **fluctuations of air quality *every 30 days*.\n",
+        "\n",
+        "**_Windows_** in Beam allow us to process only certain data intervals at a time.\n",
+        "In this notebook, we will go through different ways of windowing our pipeline.\n",
+        "\n",
+        "We have already been introduced to the default GlobalWindow (see above) that covers the entire PCollection. Now we will dive into **fixed time windows, sliding time windows, and session windows**.\n",
+        "\n",
+        "> [Another windowing tutorial](https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/tour-of-beam/windowing.ipynb) with a toy dataset is recommended to go through."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "gj0_S5Ka3-zb"
+      },
+      "source": [
+        "### First, we need to convert timestamps to Unix time\n",
+        "\n",
+        "Apache Beam requires us to provide the timestamp as Unix time. Let us write the simple time conversion function:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 8,
+      "metadata": {
+        "id": "nKBYsxFg4SIa"
+      },
+      "outputs": [],
+      "source": [
+        "import time\n",
+        "\n",
+        "# This function is modifiable and can convert integers to time formats like unix\n",
+        "# Without this function and .strptime, you may run into comparison issues!\n",
+        "def to_unix_time(time_str: str, time_format='%Y-%m-%d %H:%M:%S') -> int:\n",
+        "  \"\"\"Converts a time string into Unix time.\"\"\"\n",
+        "  time_tuple = time.strptime(time_str, time_format)\n",
+        "  return int(time.mktime(time_tuple))"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 9,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "_mPge0KdRx20",
+        "outputId": "43475bbe-548a-4817-ed0b-534cebbe70ce"
+      },
+      "outputs": [
+        {
+          "data": {
+            "text/plain": [
+              "1634220000"
+            ]
+          },
+          "execution_count": 9,
+          "metadata": {},
+          "output_type": "execute_result"
+        }
+      ],
+      "source": [
+        "to_unix_time('2021-10-14 14:00:00')"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "lL0_QONF1aMH"
+      },
+      "source": [
+        "### Second, let us define some helper functions to develop our pipeline\n",
+        "\n",
+        "In this code, we have a transform (`PrintWindowInfo`) to help us analyze an element alongside its window information, and we have another transform\n",
+        "(`PrintWindowInfo`) to help us analyze how many elements landed into each window.\n",
+        "We use a custom [`DoFn`](https://beam.apache.org/documentation/transforms/python/elementwise/pardo)\n",
+        "to access that information."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 10,
+      "metadata": {
+        "id": "KtPL-echb2xv"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Helper functions to develop our pipeline\n",
+        "\n",
+        "def human_readable_window(window) -> str:\n",
+        "  \"\"\"Formats a window object into a human readable string.\"\"\"\n",
+        "  if isinstance(window, beam.window.GlobalWindow):\n",
+        "    return str(window)\n",
+        "  return f'{window.start.to_utc_datetime()} - {window.end.to_utc_datetime()}'\n",
+        "\n",
+        "class PrintElementInfo(beam.DoFn):\n",
+        "  \"\"\"Prints an element with its Window information for debugging.\"\"\"\n",
+        "  def process(self, element, timestamp=beam.DoFn.TimestampParam, window=beam.DoFn.WindowParam):\n",
+        "    print(f'[{human_readable_window(window)}] {timestamp.to_utc_datetime()} -- {element}')\n",
+        "    yield element\n",
+        "\n",
+        "@beam.ptransform_fn\n",
+        "def PrintWindowInfo(pcollection):\n",

Review Comment:
   I added `LogElements` as the note. The issue for `LogElements` is the printing format is fixed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] svetakvsundhar commented on a diff in pull request #26185: add one windowing example

Posted by "svetakvsundhar (via GitHub)" <gi...@apache.org>.

svetakvsundhar commented on code in PR #26185:
URL: https://github.com/apache/beam/pull/26185#discussion_r1174446324


##########
examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb:
##########
@@ -0,0 +1,1880 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "view-in-github"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License."
+      ],
+      "metadata": {
+        "cellView": "form",
+        "id": "L7ZbRufePd2g"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        " # **Introduction to Windowing in Apache Beam**\n",
+        "\n",
+        "In this notebook, we will learn the fundamentals of **batch and stream data processing** as we walk through a few introductory examples in Beam. The pipelines in these examples process real-world data for air quality levels in India between 2017 and 2022.\n",

Review Comment:
   This is a bit of a general statement. Can we either remove this or specify that we will learn the fundamentals of windowing?



##########
examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb:
##########
@@ -0,0 +1,1880 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "view-in-github"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License."
+      ],
+      "metadata": {
+        "cellView": "form",
+        "id": "L7ZbRufePd2g"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        " # **Introduction to Windowing in Apache Beam**\n",
+        "\n",
+        "In this notebook, we will learn the fundamentals of **batch and stream data processing** as we walk through a few introductory examples in Beam. The pipelines in these examples process real-world data for air quality levels in India between 2017 and 2022.\n",

Review Comment:
   Given that this is only focusing on Batch examples, we should specify that here, and not bring in stream processing in this statement.



##########
examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb:
##########
@@ -0,0 +1,1880 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "view-in-github"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License."
+      ],
+      "metadata": {
+        "cellView": "form",
+        "id": "L7ZbRufePd2g"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        " # **Introduction to Windowing in Apache Beam**\n",
+        "\n",
+        "In this notebook, we will learn the fundamentals of **batch and stream data processing** as we walk through a few introductory examples in Beam. The pipelines in these examples process real-world data for air quality levels in India between 2017 and 2022.\n",
+        "\n",
+        "After this tutorial you should have a basic understanding of the following:\n",
+        "\n",
+        "*   What is **batch vs. stream** data processing?\n",
+        "*   How can I use Beam to run a **simple batch analysis job**?\n",
+        "*   How can I use Beam's **windowing features** to process only certain intervals of data at a time?"
+      ],
+      "metadata": {
+        "id": "83TJhNxLD7-W"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "To begin, run the following cell to set up Apache Beam."
+      ],
+      "metadata": {
+        "id": "Dj3ftRRqfumW"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Install apache-beam.\n",
+        "!pip install --quiet apache-beam"
+      ],
+      "metadata": {
+        "id": "zmJ0pCmSvD0-",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "9041f637-12a0-4f78-f60b-ebd3c3a1c186"
+      },
+      "execution_count": 1,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m14.5/14.5 MB\u001b[0m \u001b[31m53.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m144.1/144.1 kB\u001b[0m \u001b[31m1.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m89.7/89.7 kB\u001b[0m \u001b[31m7.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m515.5/515.5 kB\u001b[0m \u001b[31m12.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.6/2.6 MB\u001b[0m \u001b[31m22.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m152.0/152.0 kB\u001b[0m \u001b[31m10.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.7/2.7 MB\u001b[0m \u001b[31m15.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for crcmod (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for dill (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for docopt (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Set the logging level to reduce verbose information\n",
+        "import logging\n",
+        "\n",
+        "logging.root.setLevel(logging.ERROR)"
+      ],
+      "metadata": {
+        "id": "7sBoLahzPlJ1"
+      },
+      "execution_count": 2,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "<hr style=\"border: 1px solid #fdb515;\" />\n",
+        "\n",
+        "## Batch vs. Stream Data Processing\n",
+        "\n",
+        "What's the difference?\n",
+        "\n",
+        "**Batch processing** is when data processing and analysis happens on a set of data that have already been stored over a period of time. In other words, the input is a finite, bounded data set. An example is payroll and billing systems that have to be processed weekly or monthly.\n",
+        "\n",
+        "**Stream processing** happens *as* data flows through a system. This results in analysis and reporting of events within a short period of time or near real-time on an infinite, unbounded data set. An example would be fraud detection or intrusion detection.\n",
+        "\n",
+        "> This tutorial will focus on **batch processing** examples. To learn more about stream processing in Beam, check out [this](https://beam.apache.org/documentation/sdks/python-streaming/)."
+      ],
+      "metadata": {
+        "id": "BB6FAwYj1dAi"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "\n",
+        "## Load the Data\n",
+        "\n",
+        "Let's import the example data we will be using throughout this tutorial. The [dataset](https://www.kaggle.com/datasets/fedesoriano/air-quality-data-in-india) consists of **hourly air quality data (PM 2.5) in India from November 2017 to June 2022**.\n",
+        "\n",
+        "> The World Health Organization (WHO) reports 7 million premature deaths linked to air pollution each year. In India alone, more than 90% of the country's population live in areas where air quality is below the WHO's standards.\n",
+        "\n",
+        "**What does the data look like?**\n",
+        "\n",
+        "The data set has 36,192 rows and 6 columns in total recording the following attributes:\n",
+        "\n",
+        "1.   `Timestamp`: Timestamp in the format YYYY-MM-DD HH:MM:SS\n",
+        "2.   `Year`: Year of the measure\n",
+        "3.   `Month`: Month of the measure\n",
+        "4.   `Day`: Day of the measure\n",
+        "5.   `Hour`: Hour of the measure\n",
+        "6.   `PM2.5`: Fine particulate matter air pollutant level in air\n",
+        "\n",
+        "**For our purposes, we will focus on only the first and last column of the** `air_quality` **DataFrame**:\n",
+        "\n",
+        "1.   `Timestamp`: Timestamp in the format YYYY-MM-DD HH:MM:SS\n",
+        "2.   `PM 2.5`: Fine particulate matter air pollutant level in air\n",
+        "\n",
+        "Run the following cell to load the data into our file directory."
+      ],
+      "metadata": {
+        "id": "W_63UtsoBRql"
+      }
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 3,
+      "metadata": {
+        "id": "GTteBUZ-7e2s",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "3af9cdb0-c248-4c6d-96f6-c3739fb66014"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Copying gs://batch-processing-example/air-quality-india.csv...\n",
+            "/ [1 files][  1.4 MiB/  1.4 MiB]                                                \n",
+            "Operation completed over 1 objects/1.4 MiB.                                      \n"
+          ]
+        }
+      ],
+      "source": [
+        "# Copy the dataset file into the local file system from Google Cloud Storage.\n",
+        "!mkdir -p data\n",
+        "!gsutil cp gs://batch-processing-example/air-quality-india.csv data/"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "#### Data Preparation\n",
+        "\n",
+        "Before we load the data into a Beam pipeline, let's use Pandas to select two columns."
+      ],
+      "metadata": {
+        "id": "1NcmPl7C43lY"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Load the data into a Python Pandas DataFrame.\n",
+        "import pandas as pd\n",
+        "\n",
+        "air_quality = pd.read_csv(\"data/air-quality-india.csv\")\n",
+        "air_quality.head()"
+      ],
+      "metadata": {
+        "id": "dq-k7hwRf4MA",
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 206
+        },
+        "outputId": "7d70a959-5278-453e-9315-f5ed06821744"
+      },
+      "execution_count": 4,
+      "outputs": [
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "             Timestamp  Year  Month  Day  Hour   PM2.5\n",
+              "0  2017-11-07 12:00:00  2017     11    7    12   64.51\n",
+              "1  2017-11-07 13:00:00  2017     11    7    13   69.95\n",
+              "2  2017-11-07 14:00:00  2017     11    7    14   92.79\n",
+              "3  2017-11-07 15:00:00  2017     11    7    15  109.66\n",
+              "4  2017-11-07 16:00:00  2017     11    7    16  116.50"
+            ],
+            "text/html": [
+              "\n",
+              "  <div id=\"df-ca65f108-edf8-4e2b-8152-6df025dc7ff8\">\n",
+              "    <div class=\"colab-df-container\">\n",
+              "      <div>\n",
+              "<style scoped>\n",
+              "    .dataframe tbody tr th:only-of-type {\n",
+              "        vertical-align: middle;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe tbody tr th {\n",
+              "        vertical-align: top;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe thead th {\n",
+              "        text-align: right;\n",
+              "    }\n",
+              "</style>\n",
+              "<table border=\"1\" class=\"dataframe\">\n",
+              "  <thead>\n",
+              "    <tr style=\"text-align: right;\">\n",
+              "      <th></th>\n",
+              "      <th>Timestamp</th>\n",
+              "      <th>Year</th>\n",
+              "      <th>Month</th>\n",
+              "      <th>Day</th>\n",
+              "      <th>Hour</th>\n",
+              "      <th>PM2.5</th>\n",
+              "    </tr>\n",
+              "  </thead>\n",
+              "  <tbody>\n",
+              "    <tr>\n",
+              "      <th>0</th>\n",
+              "      <td>2017-11-07 12:00:00</td>\n",
+              "      <td>2017</td>\n",
+              "      <td>11</td>\n",
+              "      <td>7</td>\n",
+              "      <td>12</td>\n",
+              "      <td>64.51</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>1</th>\n",
+              "      <td>2017-11-07 13:00:00</td>\n",
+              "      <td>2017</td>\n",
+              "      <td>11</td>\n",
+              "      <td>7</td>\n",
+              "      <td>13</td>\n",
+              "      <td>69.95</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2</th>\n",
+              "      <td>2017-11-07 14:00:00</td>\n",
+              "      <td>2017</td>\n",
+              "      <td>11</td>\n",
+              "      <td>7</td>\n",
+              "      <td>14</td>\n",
+              "      <td>92.79</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>3</th>\n",
+              "      <td>2017-11-07 15:00:00</td>\n",
+              "      <td>2017</td>\n",
+              "      <td>11</td>\n",
+              "      <td>7</td>\n",
+              "      <td>15</td>\n",
+              "      <td>109.66</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>4</th>\n",
+              "      <td>2017-11-07 16:00:00</td>\n",
+              "      <td>2017</td>\n",
+              "      <td>11</td>\n",
+              "      <td>7</td>\n",
+              "      <td>16</td>\n",
+              "      <td>116.50</td>\n",
+              "    </tr>\n",
+              "  </tbody>\n",
+              "</table>\n",
+              "</div>\n",
+              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-ca65f108-edf8-4e2b-8152-6df025dc7ff8')\"\n",
+              "              title=\"Convert this dataframe to an interactive table.\"\n",
+              "              style=\"display:none;\">\n",
+              "        \n",
+              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
+              "       width=\"24px\">\n",
+              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
+              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
+              "  </svg>\n",
+              "      </button>\n",
+              "      \n",
+              "  \n",
+              "    <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-ca65f108-edf8-4e2b-8152-6df025dc7ff8')\"\n",
+              "            title=\"Generate charts.\"\n",
+              "            style=\"display:none;\">\n",
+              "      \n",
+              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
+              "       width=\"24px\">\n",
+              "      <g>\n",
+              "          <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
+              "      </g>\n",
+              "  </svg>\n",
+              "    </button>\n",
+              "   \n",
+              "  <style>\n",
+              "    .colab-df-quickchart {\n",
+              "      background-color: #E8F0FE;\n",
+              "      border: none;\n",
+              "      border-radius: 50%;\n",
+              "      cursor: pointer;\n",
+              "      display: none;\n",
+              "      fill: #1967D2;\n",
+              "      height: 32px;\n",
+              "      padding: 0 0 0 0;\n",
+              "      width: 32px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-quickchart:hover {\n",
+              "      background-color: #E2EBFA;\n",
+              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
+              "      fill: #174EA6;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-quickchart {\n",
+              "      background-color: #3B4455;\n",
+              "      fill: #D2E3FC;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-quickchart:hover {\n",
+              "      background-color: #434B5C;\n",
+              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
+              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
+              "      fill: #FFFFFF;\n",
+              "    }\n",
+              "  </style>\n",
+              "\n",
+              "    <script>\n",
+              "      const quickchartButtonEl =\n",
+              "        document.querySelector('#df-ca65f108-edf8-4e2b-8152-6df025dc7ff8 button.colab-df-quickchart');\n",
+              "      quickchartButtonEl.style.display =\n",
+              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
+              "\n",
+              "      async function quickchart(key) {\n",
+              "        const containerElement = document.querySelector('#df-ca65f108-edf8-4e2b-8152-6df025dc7ff8');\n",
+              "        const charts = await google.colab.kernel.invokeFunction(\n",
+              "            'generateCharts', [key], {})\n",
+              "      }\n",
+              "    </script>    \n",
+              "  <style>\n",
+              "    .colab-df-container {\n",
+              "      display:flex;\n",
+              "      flex-wrap:wrap;\n",
+              "      gap: 12px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-convert {\n",
+              "      background-color: #E8F0FE;\n",
+              "      border: none;\n",
+              "      border-radius: 50%;\n",
+              "      cursor: pointer;\n",
+              "      display: none;\n",
+              "      fill: #1967D2;\n",
+              "      height: 32px;\n",
+              "      padding: 0 0 0 0;\n",
+              "      width: 32px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-convert:hover {\n",
+              "      background-color: #E2EBFA;\n",
+              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
+              "      fill: #174EA6;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-convert {\n",
+              "      background-color: #3B4455;\n",
+              "      fill: #D2E3FC;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-convert:hover {\n",
+              "      background-color: #434B5C;\n",
+              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
+              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
+              "      fill: #FFFFFF;\n",
+              "    }\n",
+              "  </style>\n",
+              "\n",
+              "      <script>\n",
+              "        const buttonEl =\n",
+              "          document.querySelector('#df-ca65f108-edf8-4e2b-8152-6df025dc7ff8 button.colab-df-convert');\n",
+              "        buttonEl.style.display =\n",
+              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
+              "\n",
+              "        async function convertToInteractive(key) {\n",
+              "          const element = document.querySelector('#df-ca65f108-edf8-4e2b-8152-6df025dc7ff8');\n",
+              "          const dataTable =\n",
+              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
+              "                                                     [key], {});\n",
+              "          if (!dataTable) return;\n",
+              "\n",
+              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
+              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
+              "            + ' to learn more about interactive tables.';\n",
+              "          element.innerHTML = '';\n",
+              "          dataTable['output_type'] = 'display_data';\n",
+              "          await google.colab.output.renderOutput(dataTable, element);\n",
+              "          const docLink = document.createElement('div');\n",
+              "          docLink.innerHTML = docLinkHtml;\n",
+              "          element.appendChild(docLink);\n",
+              "        }\n",
+              "      </script>\n",
+              "    </div>\n",
+              "  </div>\n",
+              "  "
+            ]
+          },
+          "metadata": {},
+          "execution_count": 4
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import csv\n",
+        "\n",
+        "#Select only the two features of the DataFrame we're interested in.\n",
+        "airq = air_quality.loc[:, [\"Timestamp\", \"PM2.5\"]].set_index(\"Timestamp\")\n",
+        "saved_new = pd.DataFrame(airq)\n",
+        "saved_new.to_csv(\"data/air_quality.csv\")\n",
+        "airq.head()"
+      ],
+      "metadata": {
+        "id": "WNXrvP-wDIkA",
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 237
+        },
+        "outputId": "3e932987-41b3-4aaf-b49f-3707a9728322"
+      },
+      "execution_count": 5,
+      "outputs": [
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "                      PM2.5\n",
+              "Timestamp                  \n",
+              "2017-11-07 12:00:00   64.51\n",
+              "2017-11-07 13:00:00   69.95\n",
+              "2017-11-07 14:00:00   92.79\n",
+              "2017-11-07 15:00:00  109.66\n",
+              "2017-11-07 16:00:00  116.50"
+            ],
+            "text/html": [
+              "\n",
+              "  <div id=\"df-db4299d8-d441-4fc1-af7d-8d1c199492ed\">\n",
+              "    <div class=\"colab-df-container\">\n",
+              "      <div>\n",
+              "<style scoped>\n",
+              "    .dataframe tbody tr th:only-of-type {\n",
+              "        vertical-align: middle;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe tbody tr th {\n",
+              "        vertical-align: top;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe thead th {\n",
+              "        text-align: right;\n",
+              "    }\n",
+              "</style>\n",
+              "<table border=\"1\" class=\"dataframe\">\n",
+              "  <thead>\n",
+              "    <tr style=\"text-align: right;\">\n",
+              "      <th></th>\n",
+              "      <th>PM2.5</th>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>Timestamp</th>\n",
+              "      <th></th>\n",
+              "    </tr>\n",
+              "  </thead>\n",
+              "  <tbody>\n",
+              "    <tr>\n",
+              "      <th>2017-11-07 12:00:00</th>\n",
+              "      <td>64.51</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2017-11-07 13:00:00</th>\n",
+              "      <td>69.95</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2017-11-07 14:00:00</th>\n",
+              "      <td>92.79</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2017-11-07 15:00:00</th>\n",
+              "      <td>109.66</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2017-11-07 16:00:00</th>\n",
+              "      <td>116.50</td>\n",
+              "    </tr>\n",
+              "  </tbody>\n",
+              "</table>\n",
+              "</div>\n",
+              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-db4299d8-d441-4fc1-af7d-8d1c199492ed')\"\n",
+              "              title=\"Convert this dataframe to an interactive table.\"\n",
+              "              style=\"display:none;\">\n",
+              "        \n",
+              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
+              "       width=\"24px\">\n",
+              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
+              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
+              "  </svg>\n",
+              "      </button>\n",
+              "      \n",
+              "  \n",
+              "    <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-db4299d8-d441-4fc1-af7d-8d1c199492ed')\"\n",
+              "            title=\"Generate charts.\"\n",
+              "            style=\"display:none;\">\n",
+              "      \n",
+              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
+              "       width=\"24px\">\n",
+              "      <g>\n",
+              "          <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
+              "      </g>\n",
+              "  </svg>\n",
+              "    </button>\n",
+              "   \n",
+              "  <style>\n",
+              "    .colab-df-quickchart {\n",
+              "      background-color: #E8F0FE;\n",
+              "      border: none;\n",
+              "      border-radius: 50%;\n",
+              "      cursor: pointer;\n",
+              "      display: none;\n",
+              "      fill: #1967D2;\n",
+              "      height: 32px;\n",
+              "      padding: 0 0 0 0;\n",
+              "      width: 32px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-quickchart:hover {\n",
+              "      background-color: #E2EBFA;\n",
+              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
+              "      fill: #174EA6;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-quickchart {\n",
+              "      background-color: #3B4455;\n",
+              "      fill: #D2E3FC;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-quickchart:hover {\n",
+              "      background-color: #434B5C;\n",
+              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
+              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
+              "      fill: #FFFFFF;\n",
+              "    }\n",
+              "  </style>\n",
+              "\n",
+              "    <script>\n",
+              "      const quickchartButtonEl =\n",
+              "        document.querySelector('#df-db4299d8-d441-4fc1-af7d-8d1c199492ed button.colab-df-quickchart');\n",
+              "      quickchartButtonEl.style.display =\n",
+              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
+              "\n",
+              "      async function quickchart(key) {\n",
+              "        const containerElement = document.querySelector('#df-db4299d8-d441-4fc1-af7d-8d1c199492ed');\n",
+              "        const charts = await google.colab.kernel.invokeFunction(\n",
+              "            'generateCharts', [key], {})\n",
+              "      }\n",
+              "    </script>    \n",
+              "  <style>\n",
+              "    .colab-df-container {\n",
+              "      display:flex;\n",
+              "      flex-wrap:wrap;\n",
+              "      gap: 12px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-convert {\n",
+              "      background-color: #E8F0FE;\n",
+              "      border: none;\n",
+              "      border-radius: 50%;\n",
+              "      cursor: pointer;\n",
+              "      display: none;\n",
+              "      fill: #1967D2;\n",
+              "      height: 32px;\n",
+              "      padding: 0 0 0 0;\n",
+              "      width: 32px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-convert:hover {\n",
+              "      background-color: #E2EBFA;\n",
+              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
+              "      fill: #174EA6;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-convert {\n",
+              "      background-color: #3B4455;\n",
+              "      fill: #D2E3FC;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-convert:hover {\n",
+              "      background-color: #434B5C;\n",
+              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
+              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
+              "      fill: #FFFFFF;\n",
+              "    }\n",
+              "  </style>\n",
+              "\n",
+              "      <script>\n",
+              "        const buttonEl =\n",
+              "          document.querySelector('#df-db4299d8-d441-4fc1-af7d-8d1c199492ed button.colab-df-convert');\n",
+              "        buttonEl.style.display =\n",
+              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
+              "\n",
+              "        async function convertToInteractive(key) {\n",
+              "          const element = document.querySelector('#df-db4299d8-d441-4fc1-af7d-8d1c199492ed');\n",
+              "          const dataTable =\n",
+              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
+              "                                                     [key], {});\n",
+              "          if (!dataTable) return;\n",
+              "\n",
+              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
+              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
+              "            + ' to learn more about interactive tables.';\n",
+              "          element.innerHTML = '';\n",
+              "          dataTable['output_type'] = 'display_data';\n",
+              "          await google.colab.output.renderOutput(dataTable, element);\n",
+              "          const docLink = document.createElement('div');\n",
+              "          docLink.innerHTML = docLinkHtml;\n",
+              "          element.appendChild(docLink);\n",
+              "        }\n",
+              "      </script>\n",
+              "    </div>\n",
+              "  </div>\n",
+              "  "
+            ]
+          },
+          "metadata": {},
+          "execution_count": 5
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "\n",
+        "# 1. Average Air Quality Index (AQI)\n",
+        "\n",
+        "Before we explore more advanced batch processing with different types of windowing, we will start with a simple batch analysis example.\n",
+        "\n",
+        "Our **objective** is to analyze the *entire* dataset to find the **average PM2.5 air quality index** in India across the entire 11/2017-6/2022 period.\n",
+        "\n",
+        "> This examples uses the `GlobalWindow`, which is a single window that covers the entire PCollection. All pipelines use the [`GlobalWindow`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.window.html#apache_beam.transforms.window.GlobalWindow) by default. In many cases, especially for batch pipelines, this is what we want since we want to analyze all the data that we have."
+      ],
+      "metadata": {
+        "id": "VRFkb_sLDUCD"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import apache_beam as beam\n",
+        "\n",
+        "def parse_file(element):\n",
+        "  file = csv.reader([element], quotechar='\"', delimiter=',',\n",
+        "                    quoting=csv.QUOTE_ALL, skipinitialspace=True)\n",
+        "  for line in file:\n",
+        "    return line\n",
+        "\n",
+        "with beam.Pipeline() as pipeline:\n",
+        " (\n",
+        "      pipeline\n",
+        "      | 'Read input file' >> beam.io.ReadFromText(\"data/air_quality.csv\",\n",
+        "                                                  skip_header_lines=1)\n",
+        "      | 'Parse file' >> beam.Map(parse_file)\n",
+        "      | 'Get PM' >> beam.Map(lambda x: float(x[1])) # only process PM2.5\n",
+        "      | 'Get mean value' >> beam.combiners.Mean.Globally()\n",
+        "      | beam.Map(print))"
+      ],
+      "metadata": {
+        "id": "v06NFe9sDYXc",
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 34
+        },
+        "outputId": "f65eae63-0424-4ac0-8609-78e98ac21bd0"
+      },
+      "execution_count": 6,
+      "outputs": [
+        {
+          "output_type": "display_data",
+          "data": {
+            "application/javascript": [
+              "\n",
+              "        if (typeof window.interactive_beam_jquery == 'undefined') {\n",
+              "          var jqueryScript = document.createElement('script');\n",
+              "          jqueryScript.src = 'https://code.jquery.com/jquery-3.4.1.slim.min.js';\n",
+              "          jqueryScript.type = 'text/javascript';\n",
+              "          jqueryScript.onload = function() {\n",
+              "            var datatableScript = document.createElement('script');\n",
+              "            datatableScript.src = 'https://cdn.datatables.net/1.10.20/js/jquery.dataTables.min.js';\n",
+              "            datatableScript.type = 'text/javascript';\n",
+              "            datatableScript.onload = function() {\n",
+              "              window.interactive_beam_jquery = jQuery.noConflict(true);\n",
+              "              window.interactive_beam_jquery(document).ready(function($){\n",
+              "                \n",
+              "              });\n",
+              "            }\n",
+              "            document.head.appendChild(datatableScript);\n",
+              "          };\n",
+              "          document.head.appendChild(jqueryScript);\n",
+              "        } else {\n",
+              "          window.interactive_beam_jquery(document).ready(function($){\n",
+              "            \n",
+              "          });\n",
+              "        }"
+            ]
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "49.308428658266905\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# To verify, the above mean value matches what Pandas produces\n",
+        "airq[\"PM2.5\"].mean()"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "GmHEE1G5Y1z-",
+        "outputId": "248ee3d7-43af-4b53-9832-8da0eb7ac974"
+      },
+      "execution_count": 7,
+      "outputs": [
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "49.30842865826703"
+            ]
+          },
+          "metadata": {},
+          "execution_count": 7
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "**Congratulations!** You just created a simple aggregation processing pipeline in batch using `GlobalWindow`."
+      ],
+      "metadata": {
+        "id": "b3gGxC6w6qXx"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "\n",
+        "# 2. Advanced Processing in Batch with Windowing\n",
+        "\n",
+        "Sometimes, we want to [aggregate](https://beam.apache.org/documentation/transforms/python/overview/#aggregation) data, like `GroupByKey` or `Combine`, only at certain intervals, like hourly or daily, instead of processing the entire `PCollection` of data only once.\n",
+        "\n",
+        "In this case, our **objective** is to determine the **fluctuations of air quality *every 30 days*.\n",
+        "\n",
+        "**_Windows_** in Beam allow us to process only certain data intervals at a time.\n",
+        "In this notebook, we will go through different ways of windowing our pipeline.\n",
+        "\n",
+        "We have already been introduced to the default GlobalWindow (see above) that covers the entire PCollection. Now we will dive into **fixed time windows, sliding time windows, and session windows**.\n",
+        "\n",
+        "> [Another windowing tutorial](https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/tour-of-beam/windowing.ipynb) with a toy dataset is recommended to go through."
+      ],
+      "metadata": {
+        "id": "vRameihqDJ8l"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### First, we need to convert timestamps to Unix time\n",
+        "\n",
+        "Apache Beam requires us to provide the timestamp as Unix time. Let us write the simple time conversion function:"
+      ],
+      "metadata": {
+        "id": "gj0_S5Ka3-zb"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import time\n",
+        "\n",
+        "# This function is modifiable and can convert integers to time formats like unix\n",
+        "# Without this function and .strptime, you may run into comparison issues!\n",
+        "def to_unix_time(time_str: str, time_format='%Y-%m-%d %H:%M:%S') -> int:\n",
+        "  \"\"\"Converts a time string into Unix time.\"\"\"\n",
+        "  time_tuple = time.strptime(time_str, time_format)\n",
+        "  return int(time.mktime(time_tuple))"
+      ],
+      "metadata": {
+        "id": "nKBYsxFg4SIa"
+      },
+      "execution_count": 8,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "to_unix_time('2021-10-14 14:00:00')"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "_mPge0KdRx20",
+        "outputId": "43475bbe-548a-4817-ed0b-534cebbe70ce"
+      },
+      "execution_count": 9,
+      "outputs": [
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "1634220000"
+            ]
+          },
+          "metadata": {},
+          "execution_count": 9
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### Second, let us define some helper functions to develop our pipeline\n",
+        "\n",
+        "In this code, we have a transform (`PrintWindowInfo`) to help us analyze an element alongside its window information, and we have another transform\n",
+        "(`PrintWindowInfo`) to help us analyze how many elements landed into each window.\n",
+        "We use a custom [`DoFn`](https://beam.apache.org/documentation/transforms/python/elementwise/pardo)\n",
+        "to access that information."
+      ],
+      "metadata": {
+        "id": "lL0_QONF1aMH"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "#@title Helper functions to develop our pipeline\n",
+        "\n",
+        "def human_readable_window(window) -> str:\n",
+        "  \"\"\"Formats a window object into a human readable string.\"\"\"\n",
+        "  if isinstance(window, beam.window.GlobalWindow):\n",
+        "    return str(window)\n",
+        "  return f'{window.start.to_utc_datetime()} - {window.end.to_utc_datetime()}'\n",
+        "\n",
+        "class PrintElementInfo(beam.DoFn):\n",
+        "  \"\"\"Prints an element with its Window information for debugging.\"\"\"\n",
+        "  def process(self, element, timestamp=beam.DoFn.TimestampParam, window=beam.DoFn.WindowParam):\n",
+        "    print(f'[{human_readable_window(window)}] {timestamp.to_utc_datetime()} -- {element}')\n",
+        "    yield element\n",
+        "\n",
+        "@beam.ptransform_fn\n",
+        "def PrintWindowInfo(pcollection):\n",
+        "  \"\"\"Prints the Window information with AQI in that window for debugging.\"\"\"\n",
+        "  class PrintAQI(beam.DoFn):\n",
+        "    def process(self, mean_elements, window=beam.DoFn.WindowParam):\n",
+        "      print(f'>> Window [{human_readable_window(window)}], AQI: {mean_elements}')\n",
+        "      yield mean_elements\n",
+        "\n",
+        "  return (\n",
+        "      pcollection\n",
+        "      | 'Count elements per window' >> beam.combiners.Mean.Globally().without_defaults()\n",
+        "      | 'Print counts info' >> beam.ParDo(PrintAQI())\n",
+        "  )"
+      ],
+      "metadata": {
+        "id": "KtPL-echb2xv"
+      },
+      "execution_count": 10,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Note: when you run below code, pay attention to how the human readable windows varies for each window type.\n",
+        "\n",
+        "To illustrate how windowing works, we will use the below toy data:"
+      ],
+      "metadata": {
+        "id": "dtQbcRU6XYCr"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# a toy data\n",
+        "air_toy_data = [\n",
+        "    ['2021-10-14 14:00:00', '43.27'],\n",
+        "    ['2021-10-14 15:00:00', '44.17'],\n",
+        "    ['2021-10-14 16:00:00', '48.77'],\n",
+        "    ['2021-10-14 17:00:00', '55.57'],\n",
+        "    ['2021-10-14 18:00:00', '56.95'],\n",
+        "    ['2021-10-21 09:00:00', '36.77'],\n",
+        "    ['2021-10-21 10:00:00', '34.87'],\n",
+        "    ['2021-11-17 01:00:00', '62.64'],\n",
+        "    ['2021-11-17 02:00:00', '65.28'],\n",
+        "    ['2021-11-17 03:00:00', '65.53'],\n",
+        "    ['2021-11-17 04:00:00', '70.18'],\n",
+        "    ['2021-12-11 21:00:00', '69.07'],\n",
+        "    ['2022-01-02 21:00:00', '76.56'],\n",
+        "    ['2022-01-02 22:00:00', '78.77'],\n",
+        "    ['2022-01-02 23:00:00', '73.16'],\n",
+        "    ['2022-01-03 03:00:00', '74.05'],\n",
+        "    ['2022-01-03 19:00:00', '100.28'],\n",
+        "    ['2022-01-03 22:00:00', '80.92'],\n",
+        "    ['2022-01-04 05:00:00', '80.48'],\n",
+        "    ['2022-01-04 07:00:00', '84.0'],\n",
+        "    ['2022-01-04 18:00:00', '95.49'],\n",
+        "    ['2022-01-05 00:00:00', '69.01'],\n",
+        "    ['2022-01-05 07:00:00', '76.85'],]"
+      ],
+      "metadata": {
+        "id": "oOLx4IsZXoTO"
+      },
+      "execution_count": 11,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### Fixed time windows\n",
+        "\n",
+        "`FixedWindows` allow us to create fixed-sized windows with evenly spaced intervals.\n",
+        "We only need to specify the _window size_ in seconds.\n",
+        "\n",
+        "In Python, we can use [`timedelta`](https://docs.python.org/3/library/datetime.html#timedelta-objects)\n",
+        "to help us do the conversion of minutes, hours, or days for us.\n",
+        "\n",
+        "We then use the [`WindowInto`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html?highlight=windowinto#apache_beam.transforms.core.WindowInto)\n",
+        "transform to apply the kind of window we want."
+      ],
+      "metadata": {
+        "id": "_QoStuYV-uku"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import time\n",
+        "import apache_beam as beam\n",
+        "from datetime import timedelta\n",
+        "\n",
+        "\n",
+        "# We first set the window size to around 1 month.\n",
+        "window_size = timedelta(days=30).total_seconds()  # in seconds\n",
+        "\n",
+        "\n",
+        "# Let us set up the windowed pipeline and compute AQI every 30 days\n",
+        "with beam.Pipeline() as pipeline:\n",
+        "  (\n",
+        "      pipeline\n",
+        "      | beam.Create(air_toy_data)\n",
+        "      | 'With timestamps' >> beam.MapTuple(\n",
+        "          lambda timestamp, element:\n",
+        "              beam.window.TimestampedValue(float(element), to_unix_time(timestamp))\n",
+        "      )\n",
+        "      | 'Fixed windows' >> beam.WindowInto(beam.window.FixedWindows(window_size))\n",
+        "      | 'Print element info' >> beam.ParDo(PrintElementInfo())\n",
+        "      | 'Print window info' >> PrintWindowInfo()\n",
+        "  )"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "Q7Q4yVzh8XWO",
+        "outputId": "076752f2-1b40-4d07-9419-88757adc99be"
+      },
+      "execution_count": 12,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 14:00:00 -- 43.27\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 15:00:00 -- 44.17\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 16:00:00 -- 48.77\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 17:00:00 -- 55.57\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 18:00:00 -- 56.95\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-21 09:00:00 -- 36.77\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-21 10:00:00 -- 34.87\n",
+            "[2021-10-30 00:00:00 - 2021-11-29 00:00:00] 2021-11-17 01:00:00 -- 62.64\n",
+            "[2021-10-30 00:00:00 - 2021-11-29 00:00:00] 2021-11-17 02:00:00 -- 65.28\n",
+            "[2021-10-30 00:00:00 - 2021-11-29 00:00:00] 2021-11-17 03:00:00 -- 65.53\n",
+            "[2021-10-30 00:00:00 - 2021-11-29 00:00:00] 2021-11-17 04:00:00 -- 70.18\n",
+            "[2021-11-29 00:00:00 - 2021-12-29 00:00:00] 2021-12-11 21:00:00 -- 69.07\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-02 21:00:00 -- 76.56\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-02 22:00:00 -- 78.77\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-02 23:00:00 -- 73.16\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-03 03:00:00 -- 74.05\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-03 19:00:00 -- 100.28\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-03 22:00:00 -- 80.92\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-04 05:00:00 -- 80.48\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-04 07:00:00 -- 84.0\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-04 18:00:00 -- 95.49\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-05 00:00:00 -- 69.01\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-05 07:00:00 -- 76.85\n",
+            ">> Window [2021-09-30 00:00:00 - 2021-10-30 00:00:00], AQI: 45.76714285714286\n",
+            ">> Window [2021-10-30 00:00:00 - 2021-11-29 00:00:00], AQI: 65.9075\n",
+            ">> Window [2021-11-29 00:00:00 - 2021-12-29 00:00:00], AQI: 69.07\n",
+            ">> Window [2021-12-29 00:00:00 - 2022-01-28 00:00:00], AQI: 80.86999999999999\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### Sliding time windows\n",
+        "\n",
+        "[`Sliding windows`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.window.html#apache_beam.transforms.window.SlidingWindows)\n",
+        "allow us to compute AQI every 30 days but each window should cover the last 15 days.\n",
+        "We can specify the _window size_ in seconds just like with `FixedWindows` to define the window size. We also need to specify a _window period_ in seconds, which is how often we want to emit each window."
+      ],
+      "metadata": {
+        "id": "N-bqgv5K_INW"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import apache_beam as beam\n",
+        "from datetime import timedelta\n",
+        "\n",
+        "# Sliding windows of 30 days and emit one every 15 days.\n",
+        "window_size = timedelta(days=30).total_seconds()  # in seconds\n",
+        "window_period = timedelta(days=15).total_seconds()  # in seconds\n",
+        "print(f'window_size:   {window_size} seconds')\n",
+        "print(f'window_period: {window_period} seconds')\n",
+        "\n",
+        "with beam.Pipeline() as pipeline:\n",
+        "  (\n",
+        "      pipeline\n",
+        "      | 'Air Quality' >> beam.Create(air_toy_data)\n",
+        "      | 'With timestamps' >> beam.MapTuple(\n",
+        "          lambda timestamp, element:\n",
+        "              beam.window.TimestampedValue(float(element), to_unix_time(timestamp))\n",
+        "      )\n",
+        "      | 'Sliding windows' >> beam.WindowInto(\n",
+        "          beam.window.SlidingWindows(window_size, window_period)\n",
+        "      )\n",
+        "      | 'Print element info' >> beam.ParDo(PrintElementInfo())\n",
+        "      | 'Print window info' >> PrintWindowInfo()\n",
+        "  )"
+      ],
+      "metadata": {
+        "id": "Lrj1mb6Q_LW7",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "4418b5bd-6dc0-44a3-ca1a-d231ec9af9e1"
+      },
+      "execution_count": 13,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "window_size:   2592000.0 seconds\n",
+            "window_period: 1296000.0 seconds\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 14:00:00 -- 43.27\n",
+            "[2021-09-15 00:00:00 - 2021-10-15 00:00:00] 2021-10-14 14:00:00 -- 43.27\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 15:00:00 -- 44.17\n",
+            "[2021-09-15 00:00:00 - 2021-10-15 00:00:00] 2021-10-14 15:00:00 -- 44.17\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 16:00:00 -- 48.77\n",
+            "[2021-09-15 00:00:00 - 2021-10-15 00:00:00] 2021-10-14 16:00:00 -- 48.77\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 17:00:00 -- 55.57\n",
+            "[2021-09-15 00:00:00 - 2021-10-15 00:00:00] 2021-10-14 17:00:00 -- 55.57\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 18:00:00 -- 56.95\n",
+            "[2021-09-15 00:00:00 - 2021-10-15 00:00:00] 2021-10-14 18:00:00 -- 56.95\n",
+            "[2021-10-15 00:00:00 - 2021-11-14 00:00:00] 2021-10-21 09:00:00 -- 36.77\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-21 09:00:00 -- 36.77\n",
+            "[2021-10-15 00:00:00 - 2021-11-14 00:00:00] 2021-10-21 10:00:00 -- 34.87\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-21 10:00:00 -- 34.87\n",
+            "[2021-11-14 00:00:00 - 2021-12-14 00:00:00] 2021-11-17 01:00:00 -- 62.64\n",
+            "[2021-10-30 00:00:00 - 2021-11-29 00:00:00] 2021-11-17 01:00:00 -- 62.64\n",
+            "[2021-11-14 00:00:00 - 2021-12-14 00:00:00] 2021-11-17 02:00:00 -- 65.28\n",
+            "[2021-10-30 00:00:00 - 2021-11-29 00:00:00] 2021-11-17 02:00:00 -- 65.28\n",
+            "[2021-11-14 00:00:00 - 2021-12-14 00:00:00] 2021-11-17 03:00:00 -- 65.53\n",
+            "[2021-10-30 00:00:00 - 2021-11-29 00:00:00] 2021-11-17 03:00:00 -- 65.53\n",
+            "[2021-11-14 00:00:00 - 2021-12-14 00:00:00] 2021-11-17 04:00:00 -- 70.18\n",
+            "[2021-10-30 00:00:00 - 2021-11-29 00:00:00] 2021-11-17 04:00:00 -- 70.18\n",
+            "[2021-11-29 00:00:00 - 2021-12-29 00:00:00] 2021-12-11 21:00:00 -- 69.07\n",
+            "[2021-11-14 00:00:00 - 2021-12-14 00:00:00] 2021-12-11 21:00:00 -- 69.07\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-02 21:00:00 -- 76.56\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-02 21:00:00 -- 76.56\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-02 22:00:00 -- 78.77\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-02 22:00:00 -- 78.77\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-02 23:00:00 -- 73.16\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-02 23:00:00 -- 73.16\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-03 03:00:00 -- 74.05\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-03 03:00:00 -- 74.05\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-03 19:00:00 -- 100.28\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-03 19:00:00 -- 100.28\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-03 22:00:00 -- 80.92\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-03 22:00:00 -- 80.92\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-04 05:00:00 -- 80.48\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-04 05:00:00 -- 80.48\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-04 07:00:00 -- 84.0\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-04 07:00:00 -- 84.0\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-04 18:00:00 -- 95.49\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-04 18:00:00 -- 95.49\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-05 00:00:00 -- 69.01\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-05 00:00:00 -- 69.01\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-05 07:00:00 -- 76.85\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-05 07:00:00 -- 76.85\n",
+            ">> Window [2021-09-30 00:00:00 - 2021-10-30 00:00:00], AQI: 45.76714285714286\n",
+            ">> Window [2021-09-15 00:00:00 - 2021-10-15 00:00:00], AQI: 49.746\n",
+            ">> Window [2021-10-15 00:00:00 - 2021-11-14 00:00:00], AQI: 35.82\n",
+            ">> Window [2021-11-14 00:00:00 - 2021-12-14 00:00:00], AQI: 66.53999999999999\n",
+            ">> Window [2021-10-30 00:00:00 - 2021-11-29 00:00:00], AQI: 65.9075\n",
+            ">> Window [2021-11-29 00:00:00 - 2021-12-29 00:00:00], AQI: 69.07\n",
+            ">> Window [2021-12-29 00:00:00 - 2022-01-28 00:00:00], AQI: 80.86999999999999\n",
+            ">> Window [2021-12-14 00:00:00 - 2022-01-13 00:00:00], AQI: 80.86999999999999\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### Session time windows\n",
+        "\n",
+        "Maybe we don't want regular windows, but instead, have the windows reflect periods where activity happened. [`Sessions`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.window.html#apache_beam.transforms.window.Sessions)\n",
+        "allow us to create those windows.\n",
+        "We only need to specify a _gap size_ in seconds, which is the maximum number of seconds of inactivity to close a session window. In this case, if no event happens within 10 days, the current session windown closes and is emitted and a new session windown is created whenenver the next event happens."
+      ],
+      "metadata": {
+        "id": "Y_SbevXv_MDk"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import apache_beam as beam\n",
+        "from datetime import timedelta\n",
+        "\n",
+        "# Sessions divided by 10 days.\n",
+        "gap_size = timedelta(days=10).total_seconds()  # in seconds\n",
+        "print(f'gap_size: {gap_size} seconds')\n",
+        "\n",
+        "with beam.Pipeline() as pipeline:\n",
+        "  (\n",
+        "      pipeline\n",
+        "      | 'Air Quality' >> beam.Create(air_toy_data)\n",
+        "      | 'With timestamps' >> beam.MapTuple(\n",
+        "          lambda timestamp, element:\n",
+        "              beam.window.TimestampedValue(float(element), to_unix_time(timestamp))\n",
+        "      )\n",
+        "      | 'Session windows' >> beam.WindowInto(beam.window.Sessions(gap_size))\n",
+        "      | 'Print element info' >> beam.ParDo(PrintElementInfo())\n",
+        "      | 'Print window info' >> PrintWindowInfo()\n",
+        "  )"
+      ],
+      "metadata": {
+        "id": "zgNc-c3THB7G",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "d8d5fed1-8748-4845-cbc3-0b898ce4bcd8"
+      },
+      "execution_count": 14,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "gap_size: 864000.0 seconds\n",
+            "[2021-10-14 14:00:00 - 2021-10-24 14:00:00] 2021-10-14 14:00:00 -- 43.27\n",
+            "[2021-10-14 15:00:00 - 2021-10-24 15:00:00] 2021-10-14 15:00:00 -- 44.17\n",
+            "[2021-10-14 16:00:00 - 2021-10-24 16:00:00] 2021-10-14 16:00:00 -- 48.77\n",
+            "[2021-10-14 17:00:00 - 2021-10-24 17:00:00] 2021-10-14 17:00:00 -- 55.57\n",
+            "[2021-10-14 18:00:00 - 2021-10-24 18:00:00] 2021-10-14 18:00:00 -- 56.95\n",
+            "[2021-10-21 09:00:00 - 2021-10-31 09:00:00] 2021-10-21 09:00:00 -- 36.77\n",
+            "[2021-10-21 10:00:00 - 2021-10-31 10:00:00] 2021-10-21 10:00:00 -- 34.87\n",
+            "[2021-11-17 01:00:00 - 2021-11-27 01:00:00] 2021-11-17 01:00:00 -- 62.64\n",
+            "[2021-11-17 02:00:00 - 2021-11-27 02:00:00] 2021-11-17 02:00:00 -- 65.28\n",
+            "[2021-11-17 03:00:00 - 2021-11-27 03:00:00] 2021-11-17 03:00:00 -- 65.53\n",
+            "[2021-11-17 04:00:00 - 2021-11-27 04:00:00] 2021-11-17 04:00:00 -- 70.18\n",
+            "[2021-12-11 21:00:00 - 2021-12-21 21:00:00] 2021-12-11 21:00:00 -- 69.07\n",
+            "[2022-01-02 21:00:00 - 2022-01-12 21:00:00] 2022-01-02 21:00:00 -- 76.56\n",
+            "[2022-01-02 22:00:00 - 2022-01-12 22:00:00] 2022-01-02 22:00:00 -- 78.77\n",
+            "[2022-01-02 23:00:00 - 2022-01-12 23:00:00] 2022-01-02 23:00:00 -- 73.16\n",
+            "[2022-01-03 03:00:00 - 2022-01-13 03:00:00] 2022-01-03 03:00:00 -- 74.05\n",
+            "[2022-01-03 19:00:00 - 2022-01-13 19:00:00] 2022-01-03 19:00:00 -- 100.28\n",
+            "[2022-01-03 22:00:00 - 2022-01-13 22:00:00] 2022-01-03 22:00:00 -- 80.92\n",
+            "[2022-01-04 05:00:00 - 2022-01-14 05:00:00] 2022-01-04 05:00:00 -- 80.48\n",
+            "[2022-01-04 07:00:00 - 2022-01-14 07:00:00] 2022-01-04 07:00:00 -- 84.0\n",
+            "[2022-01-04 18:00:00 - 2022-01-14 18:00:00] 2022-01-04 18:00:00 -- 95.49\n",
+            "[2022-01-05 00:00:00 - 2022-01-15 00:00:00] 2022-01-05 00:00:00 -- 69.01\n",
+            "[2022-01-05 07:00:00 - 2022-01-15 07:00:00] 2022-01-05 07:00:00 -- 76.85\n",
+            ">> Window [2021-10-14 14:00:00 - 2021-10-31 10:00:00], AQI: 45.76714285714286\n",
+            ">> Window [2021-11-17 01:00:00 - 2021-11-27 04:00:00], AQI: 65.9075\n",
+            ">> Window [2021-12-11 21:00:00 - 2021-12-21 21:00:00], AQI: 69.07\n",
+            ">> Window [2022-01-02 21:00:00 - 2022-01-15 07:00:00], AQI: 80.86999999999999\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Note how the above windows are irregular."
+      ],
+      "metadata": {
+        "id": "XcX0t6hya85F"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "\n",
+        "# 3. Put All Together\n",
+        "\n",
+        "Section 2 uses the toy data to go through three different windowing types. Now it is time to analyze the real data (`data/air_quality.csv`).\n",
+        "\n",
+        "Can you build one Beam pipeline to compute all AQI values for these windows?\n"
+      ],
+      "metadata": {
+        "id": "EBSlyeBibNL0"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "#@title Edit This Code Cell\n",
+        "..."
+      ],
+      "metadata": {
+        "id": "plMjLuh-lKzr"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "#@title Expand to see the answer\n",
+        "import apache_beam as beam\n",
+        "import csv\n",
+        "import time\n",
+        "import apache_beam as beam\n",
+        "from datetime import timedelta\n",
+        "import apache_beam.runners.interactive.interactive_beam as ib\n",
+        "\n",
+        "def to_unix_time(time_str: str, time_format='%Y-%m-%d %H:%M:%S') -> int:\n",
+        "  \"\"\"Converts a time string into Unix time.\"\"\"\n",
+        "  time_tuple = time.strptime(time_str, time_format)\n",
+        "  return int(time.mktime(time_tuple))\n",
+        "\n",
+        "def human_readable_window(window) -> str:\n",
+        "  \"\"\"Formats a window object into a human readable string.\"\"\"\n",
+        "  if isinstance(window, beam.window.GlobalWindow):\n",
+        "    return str(window)\n",
+        "  return f'{window.start.to_utc_datetime()} - {window.end.to_utc_datetime()}'\n",
+        "\n",
+        "@beam.ptransform_fn\n",
+        "def OutputWindowInfo(pcollection):\n",
+        "  \"\"\"Output the Window information with AQI in that window.\"\"\"\n",
+        "  class GetAQI(beam.DoFn):\n",
+        "    def process(self, mean_elements, window=beam.DoFn.WindowParam):\n",
+        "      yield human_readable_window(window), mean_elements\n",
+        "  return (\n",
+        "    pcollection\n",
+        "    | 'Count elements per window' >> beam.combiners.Mean.Globally().without_defaults()\n",
+        "    | 'Output counts info' >> beam.ParDo(GetAQI())\n",
+        "  )\n",
+        "\n",
+        "\n",
+        "def parse_file(element):\n",
+        "  file = csv.reader([element], quotechar='\"', delimiter=',',\n",
+        "                    quoting=csv.QUOTE_ALL, skipinitialspace=True)\n",
+        "  for line in file:\n",
+        "    return line\n",
+        "\n",
+        "p =  beam.Pipeline()\n",
+        "\n",
+        "# get the data\n",
+        "read_text = (p | 'Read input file' >> beam.io.ReadFromText(\"data/air_quality.csv\",\n",
+        "                                                  skip_header_lines=1)\n",
+        "| 'Parse file' >> beam.Map(parse_file)\n",
+        "| 'With timestamps' >>\n",
+        "beam.MapTuple(lambda timestamp, element:beam.window.TimestampedValue(float(element), to_unix_time(timestamp)))\n",
+        ")\n",
+        "\n",
+        "# define the fixed window\n",
+        "window_size = timedelta(days=30).total_seconds()  # in seconds\n",
+        "fixed_window =  (read_text | 'Fixed windows' >> beam.WindowInto(beam.window.FixedWindows(window_size))\n",
+        "| 'Output fixed window info' >> OutputWindowInfo()\n",
+        "| 'Write fixed window info'\n",
+        ">> beam.io.WriteToText('output_fixed', file_name_suffix = \".txt\")\n",
+        ")\n",
+        "\n",
+        "# define the sliding window\n",
+        "window_period = timedelta(days=15).total_seconds()  # in seconds\n",
+        "sliding_window =  (read_text | 'Sliding windows' >> beam.WindowInto(\n",
+        "          beam.window.SlidingWindows(window_size, window_period))\n",
+        "| 'Output sliding window info' >> OutputWindowInfo()\n",
+        "| 'Write sliding window info'\n",
+        ">> beam.io.WriteToText('output_sliding', file_name_suffix = \".txt\")\n",
+        ")\n",
+        "\n",
+        "# define the session window\n",
+        "gap_size = timedelta(days=10).total_seconds()  # in seconds\n",
+        "session_window =  (read_text | 'Session windows' >>\n",
+        "          beam.WindowInto(beam.window.Sessions(gap_size))\n",
+        "| 'Output session window info' >> OutputWindowInfo()\n",
+        "| 'Write session window info'\n",
+        ">> beam.io.WriteToText('output_session', file_name_suffix = \".txt\")\n",
+        ")\n"
+      ],
+      "metadata": {
+        "cellView": "form",
+        "id": "t6vBMax0bubN"
+      },
+      "execution_count": 15,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# check the entire graph\n",
+        "ib.show_graph(p)"

Review Comment:
   I like this visual :)



##########
examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb:
##########
@@ -0,0 +1,1880 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "view-in-github"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License."
+      ],
+      "metadata": {
+        "cellView": "form",
+        "id": "L7ZbRufePd2g"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        " # **Introduction to Windowing in Apache Beam**\n",

Review Comment:
   ```suggestion
           " # **Introduction to Windowing for Batch Processing in Apache Beam**\n",
   ```



##########
examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb:
##########
@@ -0,0 +1,1880 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "view-in-github"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License."
+      ],
+      "metadata": {
+        "cellView": "form",
+        "id": "L7ZbRufePd2g"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        " # **Introduction to Windowing in Apache Beam**\n",
+        "\n",
+        "In this notebook, we will learn the fundamentals of **batch and stream data processing** as we walk through a few introductory examples in Beam. The pipelines in these examples process real-world data for air quality levels in India between 2017 and 2022.\n",
+        "\n",
+        "After this tutorial you should have a basic understanding of the following:\n",
+        "\n",
+        "*   What is **batch vs. stream** data processing?\n",
+        "*   How can I use Beam to run a **simple batch analysis job**?\n",
+        "*   How can I use Beam's **windowing features** to process only certain intervals of data at a time?"
+      ],
+      "metadata": {
+        "id": "83TJhNxLD7-W"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "To begin, run the following cell to set up Apache Beam."
+      ],
+      "metadata": {
+        "id": "Dj3ftRRqfumW"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Install apache-beam.\n",
+        "!pip install --quiet apache-beam"
+      ],
+      "metadata": {
+        "id": "zmJ0pCmSvD0-",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "9041f637-12a0-4f78-f60b-ebd3c3a1c186"
+      },
+      "execution_count": 1,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m14.5/14.5 MB\u001b[0m \u001b[31m53.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m144.1/144.1 kB\u001b[0m \u001b[31m1.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m89.7/89.7 kB\u001b[0m \u001b[31m7.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m515.5/515.5 kB\u001b[0m \u001b[31m12.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.6/2.6 MB\u001b[0m \u001b[31m22.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m152.0/152.0 kB\u001b[0m \u001b[31m10.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.7/2.7 MB\u001b[0m \u001b[31m15.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for crcmod (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for dill (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for docopt (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Set the logging level to reduce verbose information\n",
+        "import logging\n",
+        "\n",
+        "logging.root.setLevel(logging.ERROR)"
+      ],
+      "metadata": {
+        "id": "7sBoLahzPlJ1"
+      },
+      "execution_count": 2,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "<hr style=\"border: 1px solid #fdb515;\" />\n",
+        "\n",
+        "## Batch vs. Stream Data Processing\n",
+        "\n",
+        "What's the difference?\n",
+        "\n",
+        "**Batch processing** is when data processing and analysis happens on a set of data that have already been stored over a period of time. In other words, the input is a finite, bounded data set. An example is payroll and billing systems that have to be processed weekly or monthly.\n",
+        "\n",
+        "**Stream processing** happens *as* data flows through a system. This results in analysis and reporting of events within a short period of time or near real-time on an infinite, unbounded data set. An example would be fraud detection or intrusion detection.\n",

Review Comment:
   Why not stick to the example given in batch processing? Maybe a system that classifies every bank transaction as a legitimate charge vs a fraud charge? 



##########
examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb:
##########
@@ -0,0 +1,1880 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "view-in-github"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License."
+      ],
+      "metadata": {
+        "cellView": "form",
+        "id": "L7ZbRufePd2g"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        " # **Introduction to Windowing in Apache Beam**\n",
+        "\n",
+        "In this notebook, we will learn the fundamentals of **batch and stream data processing** as we walk through a few introductory examples in Beam. The pipelines in these examples process real-world data for air quality levels in India between 2017 and 2022.\n",
+        "\n",
+        "After this tutorial you should have a basic understanding of the following:\n",
+        "\n",
+        "*   What is **batch vs. stream** data processing?\n",
+        "*   How can I use Beam to run a **simple batch analysis job**?\n",
+        "*   How can I use Beam's **windowing features** to process only certain intervals of data at a time?"
+      ],
+      "metadata": {
+        "id": "83TJhNxLD7-W"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "To begin, run the following cell to set up Apache Beam."
+      ],
+      "metadata": {
+        "id": "Dj3ftRRqfumW"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Install apache-beam.\n",
+        "!pip install --quiet apache-beam"
+      ],
+      "metadata": {
+        "id": "zmJ0pCmSvD0-",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "9041f637-12a0-4f78-f60b-ebd3c3a1c186"
+      },
+      "execution_count": 1,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m14.5/14.5 MB\u001b[0m \u001b[31m53.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m144.1/144.1 kB\u001b[0m \u001b[31m1.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m89.7/89.7 kB\u001b[0m \u001b[31m7.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m515.5/515.5 kB\u001b[0m \u001b[31m12.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.6/2.6 MB\u001b[0m \u001b[31m22.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m152.0/152.0 kB\u001b[0m \u001b[31m10.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.7/2.7 MB\u001b[0m \u001b[31m15.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for crcmod (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for dill (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for docopt (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Set the logging level to reduce verbose information\n",
+        "import logging\n",
+        "\n",
+        "logging.root.setLevel(logging.ERROR)"
+      ],
+      "metadata": {
+        "id": "7sBoLahzPlJ1"
+      },
+      "execution_count": 2,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "<hr style=\"border: 1px solid #fdb515;\" />\n",
+        "\n",
+        "## Batch vs. Stream Data Processing\n",
+        "\n",
+        "What's the difference?\n",
+        "\n",
+        "**Batch processing** is when data processing and analysis happens on a set of data that have already been stored over a period of time. In other words, the input is a finite, bounded data set. An example is payroll and billing systems that have to be processed weekly or monthly.\n",
+        "\n",
+        "**Stream processing** happens *as* data flows through a system. This results in analysis and reporting of events within a short period of time or near real-time on an infinite, unbounded data set. An example would be fraud detection or intrusion detection.\n",
+        "\n",
+        "> This tutorial will focus on **batch processing** examples. To learn more about stream processing in Beam, check out [this](https://beam.apache.org/documentation/sdks/python-streaming/)."
+      ],
+      "metadata": {
+        "id": "BB6FAwYj1dAi"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "\n",
+        "## Load the Data\n",
+        "\n",
+        "Let's import the example data we will be using throughout this tutorial. The [dataset](https://www.kaggle.com/datasets/fedesoriano/air-quality-data-in-india) consists of **hourly air quality data (PM 2.5) in India from November 2017 to June 2022**.\n",
+        "\n",
+        "> The World Health Organization (WHO) reports 7 million premature deaths linked to air pollution each year. In India alone, more than 90% of the country's population live in areas where air quality is below the WHO's standards.\n",
+        "\n",
+        "**What does the data look like?**\n",
+        "\n",
+        "The data set has 36,192 rows and 6 columns in total recording the following attributes:\n",
+        "\n",
+        "1.   `Timestamp`: Timestamp in the format YYYY-MM-DD HH:MM:SS\n",
+        "2.   `Year`: Year of the measure\n",
+        "3.   `Month`: Month of the measure\n",
+        "4.   `Day`: Day of the measure\n",
+        "5.   `Hour`: Hour of the measure\n",
+        "6.   `PM2.5`: Fine particulate matter air pollutant level in air\n",
+        "\n",
+        "**For our purposes, we will focus on only the first and last column of the** `air_quality` **DataFrame**:\n",
+        "\n",
+        "1.   `Timestamp`: Timestamp in the format YYYY-MM-DD HH:MM:SS\n",
+        "2.   `PM 2.5`: Fine particulate matter air pollutant level in air\n",
+        "\n",
+        "Run the following cell to load the data into our file directory."
+      ],
+      "metadata": {
+        "id": "W_63UtsoBRql"
+      }
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 3,
+      "metadata": {
+        "id": "GTteBUZ-7e2s",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "3af9cdb0-c248-4c6d-96f6-c3739fb66014"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Copying gs://batch-processing-example/air-quality-india.csv...\n",
+            "/ [1 files][  1.4 MiB/  1.4 MiB]                                                \n",
+            "Operation completed over 1 objects/1.4 MiB.                                      \n"
+          ]
+        }
+      ],
+      "source": [
+        "# Copy the dataset file into the local file system from Google Cloud Storage.\n",
+        "!mkdir -p data\n",
+        "!gsutil cp gs://batch-processing-example/air-quality-india.csv data/"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "#### Data Preparation\n",
+        "\n",
+        "Before we load the data into a Beam pipeline, let's use Pandas to select two columns."
+      ],
+      "metadata": {
+        "id": "1NcmPl7C43lY"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Load the data into a Python Pandas DataFrame.\n",

Review Comment:
   Nice, but maybe we should use the Beam Dataframe as an example? :P 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] svetakvsundhar commented on a diff in pull request #26185: add one windowing example

Posted by "svetakvsundhar (via GitHub)" <gi...@apache.org>.

svetakvsundhar commented on code in PR #26185:
URL: https://github.com/apache/beam/pull/26185#discussion_r1176900891


##########
examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb:
##########
@@ -0,0 +1,1832 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "view-in-github"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "L7ZbRufePd2g"
+      },
+      "outputs": [],
+      "source": [
+        "#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License."
+      ]
+    },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "83TJhNxLD7-W"
+      },
+      "source": [
+        " # **Introduction to Windowing for Batch Processing in Apache Beam**\n",
+        "\n",
+        "In this notebook, we will learn the fundamentals of **batch processing** as we walk through a few introductory examples in Beam. The pipelines in these examples process real-world data for air quality levels in India between 2017 and 2022.\n",
+        "\n",
+        "After this tutorial you should have a basic understanding of the following:\n",
+        "\n",
+        "*   What is **batch vs. stream** data processing?\n",
+        "*   How can I use Beam to run a **simple batch analysis job**?\n",
+        "*   How can I use Beam's **windowing features** to process only certain intervals of data at a time?"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Dj3ftRRqfumW"
+      },
+      "source": [
+        "To begin, run the following cell to set up Apache Beam."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 1,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "zmJ0pCmSvD0-",
+        "outputId": "9041f637-12a0-4f78-f60b-ebd3c3a1c186"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m14.5/14.5 MB\u001b[0m \u001b[31m53.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m144.1/144.1 kB\u001b[0m \u001b[31m1.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m89.7/89.7 kB\u001b[0m \u001b[31m7.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m515.5/515.5 kB\u001b[0m \u001b[31m12.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.6/2.6 MB\u001b[0m \u001b[31m22.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m152.0/152.0 kB\u001b[0m \u001b[31m10.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.7/2.7 MB\u001b[0m \u001b[31m15.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for crcmod (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for dill (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for docopt (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
+          ]
+        }
+      ],
+      "source": [
+        "# Install apache-beam.\n",
+        "!pip install --quiet apache-beam"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 2,
+      "metadata": {
+        "id": "7sBoLahzPlJ1"
+      },
+      "outputs": [],
+      "source": [
+        "# Set the logging level to reduce verbose information\n",
+        "import logging\n",
+        "\n",
+        "logging.root.setLevel(logging.ERROR)"
+      ]
+    },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "BB6FAwYj1dAi"
+      },
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "<hr style=\"border: 1px solid #fdb515;\" />\n",
+        "\n",
+        "## Batch vs. Stream Data Processing\n",
+        "\n",
+        "What's the difference?\n",
+        "\n",
+        "**Batch processing** is when data processing and analysis happens on a set of data that have already been stored over a period of time. In other words, the input is a finite, bounded data set. An example is payroll and billing systems that have to be processed weekly or monthly.\n",

Review Comment:
   ```suggestion
           "**Batch processing** is when data processing and analysis happens on a set of data that have already been stored over a period of time. In other words, the input is a finite, bounded data set. Examples include payroll and billing systems that have to be processed weekly or monthly.\n",
   ```



##########
examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb:
##########
@@ -0,0 +1,1832 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "view-in-github"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "L7ZbRufePd2g"
+      },
+      "outputs": [],
+      "source": [
+        "#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License."
+      ]
+    },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "83TJhNxLD7-W"
+      },
+      "source": [
+        " # **Introduction to Windowing for Batch Processing in Apache Beam**\n",
+        "\n",
+        "In this notebook, we will learn the fundamentals of **batch processing** as we walk through a few introductory examples in Beam. The pipelines in these examples process real-world data for air quality levels in India between 2017 and 2022.\n",
+        "\n",
+        "After this tutorial you should have a basic understanding of the following:\n",
+        "\n",
+        "*   What is **batch vs. stream** data processing?\n",
+        "*   How can I use Beam to run a **simple batch analysis job**?\n",
+        "*   How can I use Beam's **windowing features** to process only certain intervals of data at a time?"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Dj3ftRRqfumW"
+      },
+      "source": [
+        "To begin, run the following cell to set up Apache Beam."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 1,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "zmJ0pCmSvD0-",
+        "outputId": "9041f637-12a0-4f78-f60b-ebd3c3a1c186"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m14.5/14.5 MB\u001b[0m \u001b[31m53.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m144.1/144.1 kB\u001b[0m \u001b[31m1.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m89.7/89.7 kB\u001b[0m \u001b[31m7.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m515.5/515.5 kB\u001b[0m \u001b[31m12.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.6/2.6 MB\u001b[0m \u001b[31m22.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m152.0/152.0 kB\u001b[0m \u001b[31m10.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.7/2.7 MB\u001b[0m \u001b[31m15.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for crcmod (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for dill (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for docopt (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
+          ]
+        }
+      ],
+      "source": [
+        "# Install apache-beam.\n",
+        "!pip install --quiet apache-beam"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 2,
+      "metadata": {
+        "id": "7sBoLahzPlJ1"
+      },
+      "outputs": [],
+      "source": [
+        "# Set the logging level to reduce verbose information\n",
+        "import logging\n",
+        "\n",
+        "logging.root.setLevel(logging.ERROR)"
+      ]
+    },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "BB6FAwYj1dAi"
+      },
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "<hr style=\"border: 1px solid #fdb515;\" />\n",
+        "\n",
+        "## Batch vs. Stream Data Processing\n",
+        "\n",
+        "What's the difference?\n",
+        "\n",
+        "**Batch processing** is when data processing and analysis happens on a set of data that have already been stored over a period of time. In other words, the input is a finite, bounded data set. An example is payroll and billing systems that have to be processed weekly or monthly.\n",
+        "\n",
+        "**Stream processing** happens *as* data flows through a system. This results in analysis and reporting of events within a short period of time or near real-time on an infinite, unbounded data set. An example would be fraud detection or intrusion detection that need to continuously process transaction data.\n",

Review Comment:
   ```suggestion
           "**Stream processing** happens *as* data flows through a system. This results in analysis and reporting of events within a short period of time or near real-time on an infinite, unbounded data set. Examples include fraud detection or intrusion detection, that require the continuous processing of transaction data.\n",
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] tvalentyn merged pull request #26185: add one windowing example

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.

tvalentyn merged PR #26185:
URL: https://github.com/apache/beam/pull/26185


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] liferoad commented on a diff in pull request #26185: add one windowing example

Posted by "liferoad (via GitHub)" <gi...@apache.org>.

liferoad commented on code in PR #26185:
URL: https://github.com/apache/beam/pull/26185#discussion_r1180558205


##########
examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb:
##########
@@ -0,0 +1,1832 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "view-in-github"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "L7ZbRufePd2g"
+      },
+      "outputs": [],
+      "source": [
+        "#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License."
+      ]
+    },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "83TJhNxLD7-W"
+      },
+      "source": [
+        " # **Introduction to Windowing for Batch Processing in Apache Beam**\n",
+        "\n",
+        "In this notebook, we will learn the fundamentals of **batch processing** as we walk through a few introductory examples in Beam. The pipelines in these examples process real-world data for air quality levels in India between 2017 and 2022.\n",
+        "\n",
+        "After this tutorial you should have a basic understanding of the following:\n",
+        "\n",
+        "*   What is **batch vs. stream** data processing?\n",
+        "*   How can I use Beam to run a **simple batch analysis job**?\n",
+        "*   How can I use Beam's **windowing features** to process only certain intervals of data at a time?"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Dj3ftRRqfumW"
+      },
+      "source": [
+        "To begin, run the following cell to set up Apache Beam."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 1,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "zmJ0pCmSvD0-",
+        "outputId": "9041f637-12a0-4f78-f60b-ebd3c3a1c186"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m14.5/14.5 MB\u001b[0m \u001b[31m53.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m144.1/144.1 kB\u001b[0m \u001b[31m1.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m89.7/89.7 kB\u001b[0m \u001b[31m7.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m515.5/515.5 kB\u001b[0m \u001b[31m12.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.6/2.6 MB\u001b[0m \u001b[31m22.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m152.0/152.0 kB\u001b[0m \u001b[31m10.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.7/2.7 MB\u001b[0m \u001b[31m15.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for crcmod (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for dill (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for docopt (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
+          ]
+        }
+      ],
+      "source": [
+        "# Install apache-beam.\n",
+        "!pip install --quiet apache-beam"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 2,
+      "metadata": {
+        "id": "7sBoLahzPlJ1"
+      },
+      "outputs": [],
+      "source": [
+        "# Set the logging level to reduce verbose information\n",
+        "import logging\n",
+        "\n",
+        "logging.root.setLevel(logging.ERROR)"
+      ]
+    },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "BB6FAwYj1dAi"
+      },
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "<hr style=\"border: 1px solid #fdb515;\" />\n",
+        "\n",
+        "## Batch vs. Stream Data Processing\n",
+        "\n",
+        "What's the difference?\n",
+        "\n",
+        "**Batch processing** is when data processing and analysis happens on a set of data that have already been stored over a period of time. In other words, the input is a finite, bounded data set. An example is payroll and billing systems that have to be processed weekly or monthly.\n",
+        "\n",
+        "**Stream processing** happens *as* data flows through a system. This results in analysis and reporting of events within a short period of time or near real-time on an infinite, unbounded data set. An example would be fraud detection or intrusion detection that need to continuously process transaction data.\n",

Review Comment:
   Done. Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] liferoad commented on a diff in pull request #26185: add one windowing example

Posted by "liferoad (via GitHub)" <gi...@apache.org>.

liferoad commented on code in PR #26185:
URL: https://github.com/apache/beam/pull/26185#discussion_r1175800247


##########
examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb:
##########
@@ -0,0 +1,1880 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "view-in-github"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License."
+      ],
+      "metadata": {
+        "cellView": "form",
+        "id": "L7ZbRufePd2g"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        " # **Introduction to Windowing in Apache Beam**\n",
+        "\n",
+        "In this notebook, we will learn the fundamentals of **batch and stream data processing** as we walk through a few introductory examples in Beam. The pipelines in these examples process real-world data for air quality levels in India between 2017 and 2022.\n",
+        "\n",
+        "After this tutorial you should have a basic understanding of the following:\n",
+        "\n",
+        "*   What is **batch vs. stream** data processing?\n",
+        "*   How can I use Beam to run a **simple batch analysis job**?\n",
+        "*   How can I use Beam's **windowing features** to process only certain intervals of data at a time?"
+      ],
+      "metadata": {
+        "id": "83TJhNxLD7-W"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "To begin, run the following cell to set up Apache Beam."
+      ],
+      "metadata": {
+        "id": "Dj3ftRRqfumW"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Install apache-beam.\n",
+        "!pip install --quiet apache-beam"
+      ],
+      "metadata": {
+        "id": "zmJ0pCmSvD0-",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "9041f637-12a0-4f78-f60b-ebd3c3a1c186"
+      },
+      "execution_count": 1,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m14.5/14.5 MB\u001b[0m \u001b[31m53.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m144.1/144.1 kB\u001b[0m \u001b[31m1.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m89.7/89.7 kB\u001b[0m \u001b[31m7.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m515.5/515.5 kB\u001b[0m \u001b[31m12.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.6/2.6 MB\u001b[0m \u001b[31m22.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m152.0/152.0 kB\u001b[0m \u001b[31m10.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.7/2.7 MB\u001b[0m \u001b[31m15.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for crcmod (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for dill (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for docopt (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Set the logging level to reduce verbose information\n",
+        "import logging\n",
+        "\n",
+        "logging.root.setLevel(logging.ERROR)"
+      ],
+      "metadata": {
+        "id": "7sBoLahzPlJ1"
+      },
+      "execution_count": 2,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "<hr style=\"border: 1px solid #fdb515;\" />\n",
+        "\n",
+        "## Batch vs. Stream Data Processing\n",
+        "\n",
+        "What's the difference?\n",
+        "\n",
+        "**Batch processing** is when data processing and analysis happens on a set of data that have already been stored over a period of time. In other words, the input is a finite, bounded data set. An example is payroll and billing systems that have to be processed weekly or monthly.\n",
+        "\n",
+        "**Stream processing** happens *as* data flows through a system. This results in analysis and reporting of events within a short period of time or near real-time on an infinite, unbounded data set. An example would be fraud detection or intrusion detection.\n",

Review Comment:
   Check my above comment. I think here is more for users who do not have any data processing background.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] liferoad commented on a diff in pull request #26185: add one windowing example

Posted by "liferoad (via GitHub)" <gi...@apache.org>.

liferoad commented on code in PR #26185:
URL: https://github.com/apache/beam/pull/26185#discussion_r1180558011


##########
examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb:
##########
@@ -0,0 +1,1832 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "view-in-github"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "L7ZbRufePd2g"
+      },
+      "outputs": [],
+      "source": [
+        "#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License."
+      ]
+    },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "83TJhNxLD7-W"
+      },
+      "source": [
+        " # **Introduction to Windowing for Batch Processing in Apache Beam**\n",
+        "\n",
+        "In this notebook, we will learn the fundamentals of **batch processing** as we walk through a few introductory examples in Beam. The pipelines in these examples process real-world data for air quality levels in India between 2017 and 2022.\n",
+        "\n",
+        "After this tutorial you should have a basic understanding of the following:\n",
+        "\n",
+        "*   What is **batch vs. stream** data processing?\n",
+        "*   How can I use Beam to run a **simple batch analysis job**?\n",
+        "*   How can I use Beam's **windowing features** to process only certain intervals of data at a time?"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Dj3ftRRqfumW"
+      },
+      "source": [
+        "To begin, run the following cell to set up Apache Beam."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 1,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "zmJ0pCmSvD0-",
+        "outputId": "9041f637-12a0-4f78-f60b-ebd3c3a1c186"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m14.5/14.5 MB\u001b[0m \u001b[31m53.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m144.1/144.1 kB\u001b[0m \u001b[31m1.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m89.7/89.7 kB\u001b[0m \u001b[31m7.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m515.5/515.5 kB\u001b[0m \u001b[31m12.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.6/2.6 MB\u001b[0m \u001b[31m22.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m152.0/152.0 kB\u001b[0m \u001b[31m10.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.7/2.7 MB\u001b[0m \u001b[31m15.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for crcmod (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for dill (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for docopt (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
+          ]
+        }
+      ],
+      "source": [
+        "# Install apache-beam.\n",
+        "!pip install --quiet apache-beam"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 2,
+      "metadata": {
+        "id": "7sBoLahzPlJ1"
+      },
+      "outputs": [],
+      "source": [
+        "# Set the logging level to reduce verbose information\n",
+        "import logging\n",
+        "\n",
+        "logging.root.setLevel(logging.ERROR)"
+      ]
+    },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "BB6FAwYj1dAi"
+      },
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "<hr style=\"border: 1px solid #fdb515;\" />\n",
+        "\n",
+        "## Batch vs. Stream Data Processing\n",
+        "\n",
+        "What's the difference?\n",
+        "\n",
+        "**Batch processing** is when data processing and analysis happens on a set of data that have already been stored over a period of time. In other words, the input is a finite, bounded data set. An example is payroll and billing systems that have to be processed weekly or monthly.\n",

Review Comment:
   Done. Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] liferoad commented on pull request #26185: add one windowing example

Posted by "liferoad (via GitHub)" <gi...@apache.org>.

liferoad commented on PR #26185:
URL: https://github.com/apache/beam/pull/26185#issuecomment-1500764003

   R: @tvalentyn 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] tvalentyn commented on a diff in pull request #26185: add one windowing example

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.

tvalentyn commented on code in PR #26185:
URL: https://github.com/apache/beam/pull/26185#discussion_r1180718062


##########
examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb:
##########
@@ -0,0 +1,1836 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "view-in-github"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "L7ZbRufePd2g"
+      },
+      "outputs": [],
+      "source": [
+        "#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License."
+      ]
+    },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "83TJhNxLD7-W"
+      },
+      "source": [
+        " # **Introduction to Windowing for Batch Processing in Apache Beam**\n",
+        "\n",
+        "In this notebook, we will learn the fundamentals of **batch processing** as we walk through a few introductory examples in Beam. The pipelines in these examples process real-world data for air quality levels in India between 2017 and 2022.\n",
+        "\n",
+        "After this tutorial you should have a basic understanding of the following:\n",
+        "\n",
+        "*   What is **batch vs. stream** data processing?\n",
+        "*   How can I use Beam to run a **simple batch analysis job**?\n",
+        "*   How can I use Beam's **windowing features** to process only certain intervals of data at a time?"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Dj3ftRRqfumW"
+      },
+      "source": [
+        "To begin, run the following cell to set up Apache Beam."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 1,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "zmJ0pCmSvD0-",
+        "outputId": "9041f637-12a0-4f78-f60b-ebd3c3a1c186"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m14.5/14.5 MB\u001b[0m \u001b[31m53.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",

Review Comment:
   nit: you could clean the outputs



##########
examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb:
##########
@@ -0,0 +1,1836 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "view-in-github"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "L7ZbRufePd2g"
+      },
+      "outputs": [],
+      "source": [
+        "#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License."
+      ]
+    },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "83TJhNxLD7-W"
+      },
+      "source": [
+        " # **Introduction to Windowing for Batch Processing in Apache Beam**\n",
+        "\n",
+        "In this notebook, we will learn the fundamentals of **batch processing** as we walk through a few introductory examples in Beam. The pipelines in these examples process real-world data for air quality levels in India between 2017 and 2022.\n",
+        "\n",
+        "After this tutorial you should have a basic understanding of the following:\n",
+        "\n",
+        "*   What is **batch vs. stream** data processing?\n",
+        "*   How can I use Beam to run a **simple batch analysis job**?\n",
+        "*   How can I use Beam's **windowing features** to process only certain intervals of data at a time?"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Dj3ftRRqfumW"
+      },
+      "source": [
+        "To begin, run the following cell to set up Apache Beam."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 1,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "zmJ0pCmSvD0-",
+        "outputId": "9041f637-12a0-4f78-f60b-ebd3c3a1c186"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m14.5/14.5 MB\u001b[0m \u001b[31m53.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m144.1/144.1 kB\u001b[0m \u001b[31m1.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m89.7/89.7 kB\u001b[0m \u001b[31m7.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m515.5/515.5 kB\u001b[0m \u001b[31m12.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.6/2.6 MB\u001b[0m \u001b[31m22.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m152.0/152.0 kB\u001b[0m \u001b[31m10.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.7/2.7 MB\u001b[0m \u001b[31m15.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for crcmod (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for dill (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for docopt (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
+          ]
+        }
+      ],
+      "source": [
+        "# Install apache-beam.\n",
+        "!pip install --quiet apache-beam"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 2,
+      "metadata": {
+        "id": "7sBoLahzPlJ1"
+      },
+      "outputs": [],
+      "source": [
+        "# Set the logging level to reduce verbose information\n",
+        "import logging\n",
+        "\n",
+        "logging.root.setLevel(logging.ERROR)"
+      ]
+    },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "BB6FAwYj1dAi"
+      },
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "<hr style=\"border: 1px solid #fdb515;\" />\n",
+        "\n",
+        "## Batch vs. Stream Data Processing\n",
+        "\n",
+        "What's the difference?\n",
+        "\n",
+        "**Batch processing** is when data processing and analysis happens on a set of data that have already been stored over a period of time. \n",
+        "In other words, the input is a finite, bounded data set. Examples include payroll and billing systems, which have to be processed weekly or monthly.\n",
+        "\n",
+        "**Stream processing** happens *as* data flows through a system. This results in analysis and reporting of events \n",
+        "within a short period of time or near real-time on an infinite, unbounded data set. \n",
+        "Examples include fraud detection or intrusion detection, which requires the continuous processing of transaction data.\n",
+        "\n",
+        "> This tutorial will focus on **batch processing** examples. \n",
+        "To learn more about stream processing in Beam, check out [this](https://beam.apache.org/documentation/sdks/python-streaming/)."

Review Comment:
   ```suggestion
           "To learn more about stream processing in Beam, check out the [Python Streaming](https://beam.apache.org/documentation/sdks/python-streaming/) page."
   ```



##########
examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb:
##########
@@ -0,0 +1,1836 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "view-in-github"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "L7ZbRufePd2g"
+      },
+      "outputs": [],
+      "source": [
+        "#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License."
+      ]
+    },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "83TJhNxLD7-W"
+      },
+      "source": [
+        " # **Introduction to Windowing for Batch Processing in Apache Beam**\n",
+        "\n",
+        "In this notebook, we will learn the fundamentals of **batch processing** as we walk through a few introductory examples in Beam. The pipelines in these examples process real-world data for air quality levels in India between 2017 and 2022.\n",
+        "\n",
+        "After this tutorial you should have a basic understanding of the following:\n",
+        "\n",
+        "*   What is **batch vs. stream** data processing?\n",
+        "*   How can I use Beam to run a **simple batch analysis job**?\n",
+        "*   How can I use Beam's **windowing features** to process only certain intervals of data at a time?"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Dj3ftRRqfumW"
+      },
+      "source": [
+        "To begin, run the following cell to set up Apache Beam."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 1,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "zmJ0pCmSvD0-",
+        "outputId": "9041f637-12a0-4f78-f60b-ebd3c3a1c186"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m14.5/14.5 MB\u001b[0m \u001b[31m53.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m144.1/144.1 kB\u001b[0m \u001b[31m1.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m89.7/89.7 kB\u001b[0m \u001b[31m7.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m515.5/515.5 kB\u001b[0m \u001b[31m12.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.6/2.6 MB\u001b[0m \u001b[31m22.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m152.0/152.0 kB\u001b[0m \u001b[31m10.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.7/2.7 MB\u001b[0m \u001b[31m15.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for crcmod (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for dill (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for docopt (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
+          ]
+        }
+      ],
+      "source": [
+        "# Install apache-beam.\n",
+        "!pip install --quiet apache-beam"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 2,
+      "metadata": {
+        "id": "7sBoLahzPlJ1"
+      },
+      "outputs": [],
+      "source": [
+        "# Set the logging level to reduce verbose information\n",
+        "import logging\n",
+        "\n",
+        "logging.root.setLevel(logging.ERROR)"
+      ]
+    },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "BB6FAwYj1dAi"
+      },
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "<hr style=\"border: 1px solid #fdb515;\" />\n",
+        "\n",
+        "## Batch vs. Stream Data Processing\n",
+        "\n",
+        "What's the difference?\n",
+        "\n",
+        "**Batch processing** is when data processing and analysis happens on a set of data that have already been stored over a period of time. \n",
+        "In other words, the input is a finite, bounded data set. Examples include payroll and billing systems, which have to be processed weekly or monthly.\n",
+        "\n",
+        "**Stream processing** happens *as* data flows through a system. This results in analysis and reporting of events \n",
+        "within a short period of time or near real-time on an infinite, unbounded data set. \n",
+        "Examples include fraud detection or intrusion detection, which requires the continuous processing of transaction data.\n",
+        "\n",
+        "> This tutorial will focus on **batch processing** examples. \n",
+        "To learn more about stream processing in Beam, check out [this](https://beam.apache.org/documentation/sdks/python-streaming/)."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "W_63UtsoBRql"
+      },
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "\n",
+        "## Load the Data\n",
+        "\n",
+        "Let's import the example data we will be using throughout this tutorial. The [dataset](https://www.kaggle.com/datasets/fedesoriano/air-quality-data-in-india) consists of **hourly air quality data (PM 2.5) in India from November 2017 to June 2022**.\n",
+        "\n",
+        "> The World Health Organization (WHO) reports 7 million premature deaths linked to air pollution each year. In India alone, more than 90% of the country's population live in areas where air quality is below the WHO's standards.\n",
+        "\n",
+        "**What does the data look like?**\n",
+        "\n",
+        "The data set has 36,192 rows and 6 columns in total recording the following attributes:\n",
+        "\n",
+        "1.   `Timestamp`: Timestamp in the format YYYY-MM-DD HH:MM:SS\n",
+        "2.   `Year`: Year of the measure\n",
+        "3.   `Month`: Month of the measure\n",
+        "4.   `Day`: Day of the measure\n",
+        "5.   `Hour`: Hour of the measure\n",
+        "6.   `PM2.5`: Fine particulate matter air pollutant level in air\n",
+        "\n",
+        "**For our purposes, we will focus on only the first and last column of the** `air_quality` **DataFrame**:\n",
+        "\n",
+        "1.   `Timestamp`: Timestamp in the format YYYY-MM-DD HH:MM:SS\n",
+        "2.   `PM 2.5`: Fine particulate matter air pollutant level in air\n",
+        "\n",
+        "Run the following cell to load the data into our file directory."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 3,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "GTteBUZ-7e2s",
+        "outputId": "3af9cdb0-c248-4c6d-96f6-c3739fb66014"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "Copying gs://batch-processing-example/air-quality-india.csv...\n",
+            "/ [1 files][  1.4 MiB/  1.4 MiB]                                                \n",
+            "Operation completed over 1 objects/1.4 MiB.                                      \n"
+          ]
+        }
+      ],
+      "source": [
+        "# Copy the dataset file into the local file system from Google Cloud Storage.\n",
+        "!mkdir -p data\n",
+        "!gsutil cp gs://batch-processing-example/air-quality-india.csv data/"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "1NcmPl7C43lY"
+      },
+      "source": [
+        "#### Data Preparation\n",
+        "\n",
+        "Before we load the data into a Beam pipeline, let's use Pandas to select two columns."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 4,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 206
+        },
+        "id": "dq-k7hwRf4MA",
+        "outputId": "7d70a959-5278-453e-9315-f5ed06821744"
+      },
+      "outputs": [
+        {
+          "data": {
+            "text/html": [
+              "\n",
+              "  <div id=\"df-ca65f108-edf8-4e2b-8152-6df025dc7ff8\">\n",
+              "    <div class=\"colab-df-container\">\n",
+              "      <div>\n",
+              "<style scoped>\n",
+              "    .dataframe tbody tr th:only-of-type {\n",
+              "        vertical-align: middle;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe tbody tr th {\n",
+              "        vertical-align: top;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe thead th {\n",
+              "        text-align: right;\n",
+              "    }\n",
+              "</style>\n",
+              "<table border=\"1\" class=\"dataframe\">\n",
+              "  <thead>\n",
+              "    <tr style=\"text-align: right;\">\n",
+              "      <th></th>\n",
+              "      <th>Timestamp</th>\n",
+              "      <th>Year</th>\n",
+              "      <th>Month</th>\n",
+              "      <th>Day</th>\n",
+              "      <th>Hour</th>\n",
+              "      <th>PM2.5</th>\n",
+              "    </tr>\n",
+              "  </thead>\n",
+              "  <tbody>\n",
+              "    <tr>\n",
+              "      <th>0</th>\n",
+              "      <td>2017-11-07 12:00:00</td>\n",
+              "      <td>2017</td>\n",
+              "      <td>11</td>\n",
+              "      <td>7</td>\n",
+              "      <td>12</td>\n",
+              "      <td>64.51</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>1</th>\n",
+              "      <td>2017-11-07 13:00:00</td>\n",
+              "      <td>2017</td>\n",
+              "      <td>11</td>\n",
+              "      <td>7</td>\n",
+              "      <td>13</td>\n",
+              "      <td>69.95</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2</th>\n",
+              "      <td>2017-11-07 14:00:00</td>\n",
+              "      <td>2017</td>\n",
+              "      <td>11</td>\n",
+              "      <td>7</td>\n",
+              "      <td>14</td>\n",
+              "      <td>92.79</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>3</th>\n",
+              "      <td>2017-11-07 15:00:00</td>\n",
+              "      <td>2017</td>\n",
+              "      <td>11</td>\n",
+              "      <td>7</td>\n",
+              "      <td>15</td>\n",
+              "      <td>109.66</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>4</th>\n",
+              "      <td>2017-11-07 16:00:00</td>\n",
+              "      <td>2017</td>\n",
+              "      <td>11</td>\n",
+              "      <td>7</td>\n",
+              "      <td>16</td>\n",
+              "      <td>116.50</td>\n",
+              "    </tr>\n",
+              "  </tbody>\n",
+              "</table>\n",
+              "</div>\n",
+              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-ca65f108-edf8-4e2b-8152-6df025dc7ff8')\"\n",
+              "              title=\"Convert this dataframe to an interactive table.\"\n",
+              "              style=\"display:none;\">\n",
+              "        \n",
+              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
+              "       width=\"24px\">\n",
+              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
+              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
+              "  </svg>\n",
+              "      </button>\n",
+              "      \n",
+              "  \n",
+              "    <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-ca65f108-edf8-4e2b-8152-6df025dc7ff8')\"\n",
+              "            title=\"Generate charts.\"\n",
+              "            style=\"display:none;\">\n",
+              "      \n",
+              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
+              "       width=\"24px\">\n",
+              "      <g>\n",
+              "          <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
+              "      </g>\n",
+              "  </svg>\n",
+              "    </button>\n",
+              "   \n",
+              "  <style>\n",
+              "    .colab-df-quickchart {\n",
+              "      background-color: #E8F0FE;\n",
+              "      border: none;\n",
+              "      border-radius: 50%;\n",
+              "      cursor: pointer;\n",
+              "      display: none;\n",
+              "      fill: #1967D2;\n",
+              "      height: 32px;\n",
+              "      padding: 0 0 0 0;\n",
+              "      width: 32px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-quickchart:hover {\n",
+              "      background-color: #E2EBFA;\n",
+              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
+              "      fill: #174EA6;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-quickchart {\n",
+              "      background-color: #3B4455;\n",
+              "      fill: #D2E3FC;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-quickchart:hover {\n",
+              "      background-color: #434B5C;\n",
+              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
+              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
+              "      fill: #FFFFFF;\n",
+              "    }\n",
+              "  </style>\n",
+              "\n",
+              "    <script>\n",
+              "      const quickchartButtonEl =\n",
+              "        document.querySelector('#df-ca65f108-edf8-4e2b-8152-6df025dc7ff8 button.colab-df-quickchart');\n",
+              "      quickchartButtonEl.style.display =\n",
+              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
+              "\n",
+              "      async function quickchart(key) {\n",
+              "        const containerElement = document.querySelector('#df-ca65f108-edf8-4e2b-8152-6df025dc7ff8');\n",
+              "        const charts = await google.colab.kernel.invokeFunction(\n",
+              "            'generateCharts', [key], {})\n",
+              "      }\n",
+              "    </script>    \n",
+              "  <style>\n",
+              "    .colab-df-container {\n",
+              "      display:flex;\n",
+              "      flex-wrap:wrap;\n",
+              "      gap: 12px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-convert {\n",
+              "      background-color: #E8F0FE;\n",
+              "      border: none;\n",
+              "      border-radius: 50%;\n",
+              "      cursor: pointer;\n",
+              "      display: none;\n",
+              "      fill: #1967D2;\n",
+              "      height: 32px;\n",
+              "      padding: 0 0 0 0;\n",
+              "      width: 32px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-convert:hover {\n",
+              "      background-color: #E2EBFA;\n",
+              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
+              "      fill: #174EA6;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-convert {\n",
+              "      background-color: #3B4455;\n",
+              "      fill: #D2E3FC;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-convert:hover {\n",
+              "      background-color: #434B5C;\n",
+              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
+              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
+              "      fill: #FFFFFF;\n",
+              "    }\n",
+              "  </style>\n",
+              "\n",
+              "      <script>\n",
+              "        const buttonEl =\n",
+              "          document.querySelector('#df-ca65f108-edf8-4e2b-8152-6df025dc7ff8 button.colab-df-convert');\n",
+              "        buttonEl.style.display =\n",
+              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
+              "\n",
+              "        async function convertToInteractive(key) {\n",
+              "          const element = document.querySelector('#df-ca65f108-edf8-4e2b-8152-6df025dc7ff8');\n",
+              "          const dataTable =\n",
+              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
+              "                                                     [key], {});\n",
+              "          if (!dataTable) return;\n",
+              "\n",
+              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
+              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
+              "            + ' to learn more about interactive tables.';\n",
+              "          element.innerHTML = '';\n",
+              "          dataTable['output_type'] = 'display_data';\n",
+              "          await google.colab.output.renderOutput(dataTable, element);\n",
+              "          const docLink = document.createElement('div');\n",
+              "          docLink.innerHTML = docLinkHtml;\n",
+              "          element.appendChild(docLink);\n",
+              "        }\n",
+              "      </script>\n",
+              "    </div>\n",
+              "  </div>\n",
+              "  "
+            ],
+            "text/plain": [
+              "             Timestamp  Year  Month  Day  Hour   PM2.5\n",
+              "0  2017-11-07 12:00:00  2017     11    7    12   64.51\n",
+              "1  2017-11-07 13:00:00  2017     11    7    13   69.95\n",
+              "2  2017-11-07 14:00:00  2017     11    7    14   92.79\n",
+              "3  2017-11-07 15:00:00  2017     11    7    15  109.66\n",
+              "4  2017-11-07 16:00:00  2017     11    7    16  116.50"
+            ]
+          },
+          "execution_count": 4,
+          "metadata": {},
+          "output_type": "execute_result"
+        }
+      ],
+      "source": [
+        "# Load the data into a Python Pandas DataFrame.\n",
+        "import pandas as pd\n",
+        "\n",
+        "air_quality = pd.read_csv(\"data/air-quality-india.csv\")\n",
+        "air_quality.head()"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 5,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 237
+        },
+        "id": "WNXrvP-wDIkA",
+        "outputId": "3e932987-41b3-4aaf-b49f-3707a9728322"
+      },
+      "outputs": [
+        {
+          "data": {
+            "text/html": [
+              "\n",
+              "  <div id=\"df-db4299d8-d441-4fc1-af7d-8d1c199492ed\">\n",
+              "    <div class=\"colab-df-container\">\n",
+              "      <div>\n",
+              "<style scoped>\n",
+              "    .dataframe tbody tr th:only-of-type {\n",
+              "        vertical-align: middle;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe tbody tr th {\n",
+              "        vertical-align: top;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe thead th {\n",
+              "        text-align: right;\n",
+              "    }\n",
+              "</style>\n",
+              "<table border=\"1\" class=\"dataframe\">\n",
+              "  <thead>\n",
+              "    <tr style=\"text-align: right;\">\n",
+              "      <th></th>\n",
+              "      <th>PM2.5</th>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>Timestamp</th>\n",
+              "      <th></th>\n",
+              "    </tr>\n",
+              "  </thead>\n",
+              "  <tbody>\n",
+              "    <tr>\n",
+              "      <th>2017-11-07 12:00:00</th>\n",
+              "      <td>64.51</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2017-11-07 13:00:00</th>\n",
+              "      <td>69.95</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2017-11-07 14:00:00</th>\n",
+              "      <td>92.79</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2017-11-07 15:00:00</th>\n",
+              "      <td>109.66</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2017-11-07 16:00:00</th>\n",
+              "      <td>116.50</td>\n",
+              "    </tr>\n",
+              "  </tbody>\n",
+              "</table>\n",
+              "</div>\n",
+              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-db4299d8-d441-4fc1-af7d-8d1c199492ed')\"\n",
+              "              title=\"Convert this dataframe to an interactive table.\"\n",
+              "              style=\"display:none;\">\n",
+              "        \n",
+              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
+              "       width=\"24px\">\n",
+              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
+              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
+              "  </svg>\n",
+              "      </button>\n",
+              "      \n",
+              "  \n",
+              "    <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-db4299d8-d441-4fc1-af7d-8d1c199492ed')\"\n",
+              "            title=\"Generate charts.\"\n",
+              "            style=\"display:none;\">\n",
+              "      \n",
+              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
+              "       width=\"24px\">\n",
+              "      <g>\n",
+              "          <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
+              "      </g>\n",
+              "  </svg>\n",
+              "    </button>\n",
+              "   \n",
+              "  <style>\n",
+              "    .colab-df-quickchart {\n",
+              "      background-color: #E8F0FE;\n",
+              "      border: none;\n",
+              "      border-radius: 50%;\n",
+              "      cursor: pointer;\n",
+              "      display: none;\n",
+              "      fill: #1967D2;\n",
+              "      height: 32px;\n",
+              "      padding: 0 0 0 0;\n",
+              "      width: 32px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-quickchart:hover {\n",
+              "      background-color: #E2EBFA;\n",
+              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
+              "      fill: #174EA6;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-quickchart {\n",
+              "      background-color: #3B4455;\n",
+              "      fill: #D2E3FC;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-quickchart:hover {\n",
+              "      background-color: #434B5C;\n",
+              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
+              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
+              "      fill: #FFFFFF;\n",
+              "    }\n",
+              "  </style>\n",
+              "\n",
+              "    <script>\n",
+              "      const quickchartButtonEl =\n",
+              "        document.querySelector('#df-db4299d8-d441-4fc1-af7d-8d1c199492ed button.colab-df-quickchart');\n",
+              "      quickchartButtonEl.style.display =\n",
+              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
+              "\n",
+              "      async function quickchart(key) {\n",
+              "        const containerElement = document.querySelector('#df-db4299d8-d441-4fc1-af7d-8d1c199492ed');\n",
+              "        const charts = await google.colab.kernel.invokeFunction(\n",
+              "            'generateCharts', [key], {})\n",
+              "      }\n",
+              "    </script>    \n",
+              "  <style>\n",
+              "    .colab-df-container {\n",
+              "      display:flex;\n",
+              "      flex-wrap:wrap;\n",
+              "      gap: 12px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-convert {\n",
+              "      background-color: #E8F0FE;\n",
+              "      border: none;\n",
+              "      border-radius: 50%;\n",
+              "      cursor: pointer;\n",
+              "      display: none;\n",
+              "      fill: #1967D2;\n",
+              "      height: 32px;\n",
+              "      padding: 0 0 0 0;\n",
+              "      width: 32px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-convert:hover {\n",
+              "      background-color: #E2EBFA;\n",
+              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
+              "      fill: #174EA6;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-convert {\n",
+              "      background-color: #3B4455;\n",
+              "      fill: #D2E3FC;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-convert:hover {\n",
+              "      background-color: #434B5C;\n",
+              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
+              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
+              "      fill: #FFFFFF;\n",
+              "    }\n",
+              "  </style>\n",
+              "\n",
+              "      <script>\n",
+              "        const buttonEl =\n",
+              "          document.querySelector('#df-db4299d8-d441-4fc1-af7d-8d1c199492ed button.colab-df-convert');\n",
+              "        buttonEl.style.display =\n",
+              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
+              "\n",
+              "        async function convertToInteractive(key) {\n",
+              "          const element = document.querySelector('#df-db4299d8-d441-4fc1-af7d-8d1c199492ed');\n",
+              "          const dataTable =\n",
+              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
+              "                                                     [key], {});\n",
+              "          if (!dataTable) return;\n",
+              "\n",
+              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
+              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
+              "            + ' to learn more about interactive tables.';\n",
+              "          element.innerHTML = '';\n",
+              "          dataTable['output_type'] = 'display_data';\n",
+              "          await google.colab.output.renderOutput(dataTable, element);\n",
+              "          const docLink = document.createElement('div');\n",
+              "          docLink.innerHTML = docLinkHtml;\n",
+              "          element.appendChild(docLink);\n",
+              "        }\n",
+              "      </script>\n",
+              "    </div>\n",
+              "  </div>\n",
+              "  "
+            ],
+            "text/plain": [
+              "                      PM2.5\n",
+              "Timestamp                  \n",
+              "2017-11-07 12:00:00   64.51\n",
+              "2017-11-07 13:00:00   69.95\n",
+              "2017-11-07 14:00:00   92.79\n",
+              "2017-11-07 15:00:00  109.66\n",
+              "2017-11-07 16:00:00  116.50"
+            ]
+          },
+          "execution_count": 5,
+          "metadata": {},
+          "output_type": "execute_result"
+        }
+      ],
+      "source": [
+        "import csv\n",
+        "\n",
+        "#Select only the two features of the DataFrame we're interested in.\n",
+        "airq = air_quality.loc[:, [\"Timestamp\", \"PM2.5\"]].set_index(\"Timestamp\")\n",
+        "saved_new = pd.DataFrame(airq)\n",
+        "saved_new.to_csv(\"data/air_quality.csv\")\n",
+        "airq.head()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "VRFkb_sLDUCD"
+      },
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "\n",
+        "# 1. Average Air Quality Index (AQI)\n",
+        "\n",
+        "Before we explore more advanced batch processing with different types of windowing, we will start with a simple batch analysis example.\n",
+        "\n",
+        "Our **objective** is to analyze the *entire* dataset to find the **average PM2.5 air quality index** in India across the entire 11/2017-6/2022 period.\n",
+        "\n",
+        "> This examples uses the `GlobalWindow`, which is a single window that covers the entire PCollection. All pipelines use the [`GlobalWindow`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.window.html#apache_beam.transforms.window.GlobalWindow) by default. In many cases, especially for batch pipelines, this is what we want since we want to analyze all the data that we have."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 6,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 34
+        },
+        "id": "v06NFe9sDYXc",
+        "outputId": "f65eae63-0424-4ac0-8609-78e98ac21bd0"
+      },
+      "outputs": [
+        {
+          "data": {
+            "application/javascript": "\n        if (typeof window.interactive_beam_jquery == 'undefined') {\n          var jqueryScript = document.createElement('script');\n          jqueryScript.src = 'https://code.jquery.com/jquery-3.4.1.slim.min.js';\n          jqueryScript.type = 'text/javascript';\n          jqueryScript.onload = function() {\n            var datatableScript = document.createElement('script');\n            datatableScript.src = 'https://cdn.datatables.net/1.10.20/js/jquery.dataTables.min.js';\n            datatableScript.type = 'text/javascript';\n            datatableScript.onload = function() {\n              window.interactive_beam_jquery = jQuery.noConflict(true);\n              window.interactive_beam_jquery(document).ready(function($){\n                \n              });\n            }\n            document.head.appendChild(datatableScript);\n          };\n          document.head.appendChild(jqueryScript);\n        } else {\n          window.interactive
 _beam_jquery(document).ready(function($){\n            \n          });\n        }"
+          },
+          "metadata": {},
+          "output_type": "display_data"
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "49.308428658266905\n"
+          ]
+        }
+      ],
+      "source": [
+        "import apache_beam as beam\n",
+        "\n",
+        "def parse_file(element):\n",
+        "  file = csv.reader([element], quotechar='\"', delimiter=',',\n",
+        "                    quoting=csv.QUOTE_ALL, skipinitialspace=True)\n",
+        "  for line in file:\n",
+        "    return line\n",
+        "\n",
+        "with beam.Pipeline() as pipeline:\n",
+        " (\n",
+        "      pipeline\n",
+        "      | 'Read input file' >> beam.io.ReadFromText(\"data/air_quality.csv\",\n",
+        "                                                  skip_header_lines=1)\n",
+        "      | 'Parse file' >> beam.Map(parse_file)\n",
+        "      | 'Get PM' >> beam.Map(lambda x: float(x[1])) # only process PM2.5\n",
+        "      | 'Get mean value' >> beam.combiners.Mean.Globally()\n",
+        "      | beam.Map(print))"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 7,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "GmHEE1G5Y1z-",
+        "outputId": "248ee3d7-43af-4b53-9832-8da0eb7ac974"
+      },
+      "outputs": [
+        {
+          "data": {
+            "text/plain": [
+              "49.30842865826703"
+            ]
+          },
+          "execution_count": 7,
+          "metadata": {},
+          "output_type": "execute_result"
+        }
+      ],
+      "source": [
+        "# To verify, the above mean value matches what Pandas produces\n",
+        "airq[\"PM2.5\"].mean()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "b3gGxC6w6qXx"
+      },
+      "source": [
+        "**Congratulations!** You just created a simple aggregation processing pipeline in batch using `GlobalWindow`."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "vRameihqDJ8l"
+      },
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "\n",
+        "# 2. Advanced Processing in Batch with Windowing\n",
+        "\n",
+        "Sometimes, we want to [aggregate](https://beam.apache.org/documentation/transforms/python/overview/#aggregation) data, like `GroupByKey` or `Combine`, only at certain intervals, like hourly or daily, instead of processing the entire `PCollection` of data only once.\n",
+        "\n",
+        "In this case, our **objective** is to determine the **fluctuations of air quality *every 30 days*.\n",
+        "\n",
+        "**_Windows_** in Beam allow us to process only certain data intervals at a time.\n",
+        "In this notebook, we will go through different ways of windowing our pipeline.\n",
+        "\n",
+        "We have already been introduced to the default GlobalWindow (see above) that covers the entire PCollection. Now we will dive into **fixed time windows, sliding time windows, and session windows**.\n",
+        "\n",
+        "> [Another windowing tutorial](https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/tour-of-beam/windowing.ipynb) with a toy dataset is recommended to go through."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "gj0_S5Ka3-zb"
+      },
+      "source": [
+        "### First, we need to convert timestamps to Unix time\n",
+        "\n",
+        "Apache Beam requires us to provide the timestamp as Unix time. Let us write the simple time conversion function:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 8,
+      "metadata": {
+        "id": "nKBYsxFg4SIa"
+      },
+      "outputs": [],
+      "source": [
+        "import time\n",
+        "\n",
+        "# This function is modifiable and can convert integers to time formats like unix\n",
+        "# Without this function and .strptime, you may run into comparison issues!\n",
+        "def to_unix_time(time_str: str, time_format='%Y-%m-%d %H:%M:%S') -> int:\n",
+        "  \"\"\"Converts a time string into Unix time.\"\"\"\n",
+        "  time_tuple = time.strptime(time_str, time_format)\n",
+        "  return int(time.mktime(time_tuple))"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 9,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "_mPge0KdRx20",
+        "outputId": "43475bbe-548a-4817-ed0b-534cebbe70ce"
+      },
+      "outputs": [
+        {
+          "data": {
+            "text/plain": [
+              "1634220000"
+            ]
+          },
+          "execution_count": 9,
+          "metadata": {},
+          "output_type": "execute_result"
+        }
+      ],
+      "source": [
+        "to_unix_time('2021-10-14 14:00:00')"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "lL0_QONF1aMH"
+      },
+      "source": [
+        "### Second, let us define some helper functions to develop our pipeline\n",
+        "\n",
+        "In this code, we have a transform (`PrintWindowInfo`) to help us analyze an element alongside its window information, and we have another transform\n",
+        "(`PrintWindowInfo`) to help us analyze how many elements landed into each window.\n",
+        "We use a custom [`DoFn`](https://beam.apache.org/documentation/transforms/python/elementwise/pardo)\n",
+        "to access that information."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 10,
+      "metadata": {
+        "id": "KtPL-echb2xv"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Helper functions to develop our pipeline\n",
+        "\n",
+        "def human_readable_window(window) -> str:\n",
+        "  \"\"\"Formats a window object into a human readable string.\"\"\"\n",
+        "  if isinstance(window, beam.window.GlobalWindow):\n",
+        "    return str(window)\n",
+        "  return f'{window.start.to_utc_datetime()} - {window.end.to_utc_datetime()}'\n",
+        "\n",
+        "class PrintElementInfo(beam.DoFn):\n",
+        "  \"\"\"Prints an element with its Window information for debugging.\"\"\"\n",
+        "  def process(self, element, timestamp=beam.DoFn.TimestampParam, window=beam.DoFn.WindowParam):\n",
+        "    print(f'[{human_readable_window(window)}] {timestamp.to_utc_datetime()} -- {element}')\n",
+        "    yield element\n",
+        "\n",
+        "@beam.ptransform_fn\n",
+        "def PrintWindowInfo(pcollection):\n",

Review Comment:
   beam.LogElements is supposed to be able to print elements and windwing info, but might have some rough edges.. cc: @svetakvsundhar in case you are interested to check if it works later. Doesn't have to block this.



##########
examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb:
##########
@@ -0,0 +1,1836 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "view-in-github"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "L7ZbRufePd2g"
+      },
+      "outputs": [],
+      "source": [
+        "#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License."
+      ]
+    },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "83TJhNxLD7-W"
+      },
+      "source": [
+        " # **Introduction to Windowing for Batch Processing in Apache Beam**\n",
+        "\n",
+        "In this notebook, we will learn the fundamentals of **batch processing** as we walk through a few introductory examples in Beam. The pipelines in these examples process real-world data for air quality levels in India between 2017 and 2022.\n",
+        "\n",
+        "After this tutorial you should have a basic understanding of the following:\n",
+        "\n",
+        "*   What is **batch vs. stream** data processing?\n",
+        "*   How can I use Beam to run a **simple batch analysis job**?\n",
+        "*   How can I use Beam's **windowing features** to process only certain intervals of data at a time?"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Dj3ftRRqfumW"
+      },
+      "source": [
+        "To begin, run the following cell to set up Apache Beam."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 1,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "zmJ0pCmSvD0-",
+        "outputId": "9041f637-12a0-4f78-f60b-ebd3c3a1c186"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m14.5/14.5 MB\u001b[0m \u001b[31m53.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m144.1/144.1 kB\u001b[0m \u001b[31m1.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m89.7/89.7 kB\u001b[0m \u001b[31m7.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m515.5/515.5 kB\u001b[0m \u001b[31m12.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.6/2.6 MB\u001b[0m \u001b[31m22.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m152.0/152.0 kB\u001b[0m \u001b[31m10.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.7/2.7 MB\u001b[0m \u001b[31m15.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for crcmod (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for dill (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for docopt (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
+          ]
+        }
+      ],
+      "source": [
+        "# Install apache-beam.\n",
+        "!pip install --quiet apache-beam"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 2,
+      "metadata": {
+        "id": "7sBoLahzPlJ1"
+      },
+      "outputs": [],
+      "source": [
+        "# Set the logging level to reduce verbose information\n",
+        "import logging\n",
+        "\n",
+        "logging.root.setLevel(logging.ERROR)"
+      ]
+    },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "BB6FAwYj1dAi"
+      },
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "<hr style=\"border: 1px solid #fdb515;\" />\n",
+        "\n",
+        "## Batch vs. Stream Data Processing\n",
+        "\n",
+        "What's the difference?\n",
+        "\n",
+        "**Batch processing** is when data processing and analysis happens on a set of data that have already been stored over a period of time. \n",
+        "In other words, the input is a finite, bounded data set. Examples include payroll and billing systems, which have to be processed weekly or monthly.\n",
+        "\n",
+        "**Stream processing** happens *as* data flows through a system. This results in analysis and reporting of events \n",
+        "within a short period of time or near real-time on an infinite, unbounded data set. \n",
+        "Examples include fraud detection or intrusion detection, which requires the continuous processing of transaction data.\n",
+        "\n",
+        "> This tutorial will focus on **batch processing** examples. \n",
+        "To learn more about stream processing in Beam, check out [this](https://beam.apache.org/documentation/sdks/python-streaming/)."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "W_63UtsoBRql"
+      },
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "\n",
+        "## Load the Data\n",
+        "\n",
+        "Let's import the example data we will be using throughout this tutorial. The [dataset](https://www.kaggle.com/datasets/fedesoriano/air-quality-data-in-india) consists of **hourly air quality data (PM 2.5) in India from November 2017 to June 2022**.\n",
+        "\n",
+        "> The World Health Organization (WHO) reports 7 million premature deaths linked to air pollution each year. In India alone, more than 90% of the country's population live in areas where air quality is below the WHO's standards.\n",
+        "\n",
+        "**What does the data look like?**\n",
+        "\n",
+        "The data set has 36,192 rows and 6 columns in total recording the following attributes:\n",
+        "\n",
+        "1.   `Timestamp`: Timestamp in the format YYYY-MM-DD HH:MM:SS\n",
+        "2.   `Year`: Year of the measure\n",
+        "3.   `Month`: Month of the measure\n",
+        "4.   `Day`: Day of the measure\n",
+        "5.   `Hour`: Hour of the measure\n",
+        "6.   `PM2.5`: Fine particulate matter air pollutant level in air\n",
+        "\n",
+        "**For our purposes, we will focus on only the first and last column of the** `air_quality` **DataFrame**:\n",
+        "\n",
+        "1.   `Timestamp`: Timestamp in the format YYYY-MM-DD HH:MM:SS\n",
+        "2.   `PM 2.5`: Fine particulate matter air pollutant level in air\n",
+        "\n",
+        "Run the following cell to load the data into our file directory."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 3,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "GTteBUZ-7e2s",
+        "outputId": "3af9cdb0-c248-4c6d-96f6-c3739fb66014"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "Copying gs://batch-processing-example/air-quality-india.csv...\n",
+            "/ [1 files][  1.4 MiB/  1.4 MiB]                                                \n",
+            "Operation completed over 1 objects/1.4 MiB.                                      \n"
+          ]
+        }
+      ],
+      "source": [
+        "# Copy the dataset file into the local file system from Google Cloud Storage.\n",
+        "!mkdir -p data\n",
+        "!gsutil cp gs://batch-processing-example/air-quality-india.csv data/"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "1NcmPl7C43lY"
+      },
+      "source": [
+        "#### Data Preparation\n",
+        "\n",
+        "Before we load the data into a Beam pipeline, let's use Pandas to select two columns."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 4,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 206
+        },
+        "id": "dq-k7hwRf4MA",
+        "outputId": "7d70a959-5278-453e-9315-f5ed06821744"
+      },
+      "outputs": [
+        {
+          "data": {
+            "text/html": [
+              "\n",
+              "  <div id=\"df-ca65f108-edf8-4e2b-8152-6df025dc7ff8\">\n",
+              "    <div class=\"colab-df-container\">\n",
+              "      <div>\n",
+              "<style scoped>\n",
+              "    .dataframe tbody tr th:only-of-type {\n",
+              "        vertical-align: middle;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe tbody tr th {\n",
+              "        vertical-align: top;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe thead th {\n",
+              "        text-align: right;\n",
+              "    }\n",
+              "</style>\n",
+              "<table border=\"1\" class=\"dataframe\">\n",
+              "  <thead>\n",
+              "    <tr style=\"text-align: right;\">\n",
+              "      <th></th>\n",
+              "      <th>Timestamp</th>\n",
+              "      <th>Year</th>\n",
+              "      <th>Month</th>\n",
+              "      <th>Day</th>\n",
+              "      <th>Hour</th>\n",
+              "      <th>PM2.5</th>\n",
+              "    </tr>\n",
+              "  </thead>\n",
+              "  <tbody>\n",
+              "    <tr>\n",
+              "      <th>0</th>\n",
+              "      <td>2017-11-07 12:00:00</td>\n",
+              "      <td>2017</td>\n",
+              "      <td>11</td>\n",
+              "      <td>7</td>\n",
+              "      <td>12</td>\n",
+              "      <td>64.51</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>1</th>\n",
+              "      <td>2017-11-07 13:00:00</td>\n",
+              "      <td>2017</td>\n",
+              "      <td>11</td>\n",
+              "      <td>7</td>\n",
+              "      <td>13</td>\n",
+              "      <td>69.95</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2</th>\n",
+              "      <td>2017-11-07 14:00:00</td>\n",
+              "      <td>2017</td>\n",
+              "      <td>11</td>\n",
+              "      <td>7</td>\n",
+              "      <td>14</td>\n",
+              "      <td>92.79</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>3</th>\n",
+              "      <td>2017-11-07 15:00:00</td>\n",
+              "      <td>2017</td>\n",
+              "      <td>11</td>\n",
+              "      <td>7</td>\n",
+              "      <td>15</td>\n",
+              "      <td>109.66</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>4</th>\n",
+              "      <td>2017-11-07 16:00:00</td>\n",
+              "      <td>2017</td>\n",
+              "      <td>11</td>\n",
+              "      <td>7</td>\n",
+              "      <td>16</td>\n",
+              "      <td>116.50</td>\n",
+              "    </tr>\n",
+              "  </tbody>\n",
+              "</table>\n",
+              "</div>\n",
+              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-ca65f108-edf8-4e2b-8152-6df025dc7ff8')\"\n",
+              "              title=\"Convert this dataframe to an interactive table.\"\n",
+              "              style=\"display:none;\">\n",
+              "        \n",
+              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
+              "       width=\"24px\">\n",
+              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
+              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
+              "  </svg>\n",
+              "      </button>\n",
+              "      \n",
+              "  \n",
+              "    <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-ca65f108-edf8-4e2b-8152-6df025dc7ff8')\"\n",
+              "            title=\"Generate charts.\"\n",
+              "            style=\"display:none;\">\n",
+              "      \n",
+              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
+              "       width=\"24px\">\n",
+              "      <g>\n",
+              "          <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
+              "      </g>\n",
+              "  </svg>\n",
+              "    </button>\n",
+              "   \n",
+              "  <style>\n",
+              "    .colab-df-quickchart {\n",
+              "      background-color: #E8F0FE;\n",
+              "      border: none;\n",
+              "      border-radius: 50%;\n",
+              "      cursor: pointer;\n",
+              "      display: none;\n",
+              "      fill: #1967D2;\n",
+              "      height: 32px;\n",
+              "      padding: 0 0 0 0;\n",
+              "      width: 32px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-quickchart:hover {\n",
+              "      background-color: #E2EBFA;\n",
+              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
+              "      fill: #174EA6;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-quickchart {\n",
+              "      background-color: #3B4455;\n",
+              "      fill: #D2E3FC;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-quickchart:hover {\n",
+              "      background-color: #434B5C;\n",
+              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
+              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
+              "      fill: #FFFFFF;\n",
+              "    }\n",
+              "  </style>\n",
+              "\n",
+              "    <script>\n",
+              "      const quickchartButtonEl =\n",
+              "        document.querySelector('#df-ca65f108-edf8-4e2b-8152-6df025dc7ff8 button.colab-df-quickchart');\n",
+              "      quickchartButtonEl.style.display =\n",
+              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
+              "\n",
+              "      async function quickchart(key) {\n",
+              "        const containerElement = document.querySelector('#df-ca65f108-edf8-4e2b-8152-6df025dc7ff8');\n",
+              "        const charts = await google.colab.kernel.invokeFunction(\n",
+              "            'generateCharts', [key], {})\n",
+              "      }\n",
+              "    </script>    \n",
+              "  <style>\n",
+              "    .colab-df-container {\n",
+              "      display:flex;\n",
+              "      flex-wrap:wrap;\n",
+              "      gap: 12px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-convert {\n",
+              "      background-color: #E8F0FE;\n",
+              "      border: none;\n",
+              "      border-radius: 50%;\n",
+              "      cursor: pointer;\n",
+              "      display: none;\n",
+              "      fill: #1967D2;\n",
+              "      height: 32px;\n",
+              "      padding: 0 0 0 0;\n",
+              "      width: 32px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-convert:hover {\n",
+              "      background-color: #E2EBFA;\n",
+              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
+              "      fill: #174EA6;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-convert {\n",
+              "      background-color: #3B4455;\n",
+              "      fill: #D2E3FC;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-convert:hover {\n",
+              "      background-color: #434B5C;\n",
+              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
+              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
+              "      fill: #FFFFFF;\n",
+              "    }\n",
+              "  </style>\n",
+              "\n",
+              "      <script>\n",
+              "        const buttonEl =\n",
+              "          document.querySelector('#df-ca65f108-edf8-4e2b-8152-6df025dc7ff8 button.colab-df-convert');\n",
+              "        buttonEl.style.display =\n",
+              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
+              "\n",
+              "        async function convertToInteractive(key) {\n",
+              "          const element = document.querySelector('#df-ca65f108-edf8-4e2b-8152-6df025dc7ff8');\n",
+              "          const dataTable =\n",
+              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
+              "                                                     [key], {});\n",
+              "          if (!dataTable) return;\n",
+              "\n",
+              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
+              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
+              "            + ' to learn more about interactive tables.';\n",
+              "          element.innerHTML = '';\n",
+              "          dataTable['output_type'] = 'display_data';\n",
+              "          await google.colab.output.renderOutput(dataTable, element);\n",
+              "          const docLink = document.createElement('div');\n",
+              "          docLink.innerHTML = docLinkHtml;\n",
+              "          element.appendChild(docLink);\n",
+              "        }\n",
+              "      </script>\n",
+              "    </div>\n",
+              "  </div>\n",
+              "  "
+            ],
+            "text/plain": [
+              "             Timestamp  Year  Month  Day  Hour   PM2.5\n",
+              "0  2017-11-07 12:00:00  2017     11    7    12   64.51\n",
+              "1  2017-11-07 13:00:00  2017     11    7    13   69.95\n",
+              "2  2017-11-07 14:00:00  2017     11    7    14   92.79\n",
+              "3  2017-11-07 15:00:00  2017     11    7    15  109.66\n",
+              "4  2017-11-07 16:00:00  2017     11    7    16  116.50"
+            ]
+          },
+          "execution_count": 4,
+          "metadata": {},
+          "output_type": "execute_result"
+        }
+      ],
+      "source": [
+        "# Load the data into a Python Pandas DataFrame.\n",
+        "import pandas as pd\n",
+        "\n",
+        "air_quality = pd.read_csv(\"data/air-quality-india.csv\")\n",
+        "air_quality.head()"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 5,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 237
+        },
+        "id": "WNXrvP-wDIkA",
+        "outputId": "3e932987-41b3-4aaf-b49f-3707a9728322"
+      },
+      "outputs": [
+        {
+          "data": {
+            "text/html": [
+              "\n",
+              "  <div id=\"df-db4299d8-d441-4fc1-af7d-8d1c199492ed\">\n",
+              "    <div class=\"colab-df-container\">\n",
+              "      <div>\n",
+              "<style scoped>\n",
+              "    .dataframe tbody tr th:only-of-type {\n",
+              "        vertical-align: middle;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe tbody tr th {\n",
+              "        vertical-align: top;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe thead th {\n",
+              "        text-align: right;\n",
+              "    }\n",
+              "</style>\n",
+              "<table border=\"1\" class=\"dataframe\">\n",
+              "  <thead>\n",
+              "    <tr style=\"text-align: right;\">\n",
+              "      <th></th>\n",
+              "      <th>PM2.5</th>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>Timestamp</th>\n",
+              "      <th></th>\n",
+              "    </tr>\n",
+              "  </thead>\n",
+              "  <tbody>\n",
+              "    <tr>\n",
+              "      <th>2017-11-07 12:00:00</th>\n",
+              "      <td>64.51</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2017-11-07 13:00:00</th>\n",
+              "      <td>69.95</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2017-11-07 14:00:00</th>\n",
+              "      <td>92.79</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2017-11-07 15:00:00</th>\n",
+              "      <td>109.66</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2017-11-07 16:00:00</th>\n",
+              "      <td>116.50</td>\n",
+              "    </tr>\n",
+              "  </tbody>\n",
+              "</table>\n",
+              "</div>\n",
+              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-db4299d8-d441-4fc1-af7d-8d1c199492ed')\"\n",
+              "              title=\"Convert this dataframe to an interactive table.\"\n",
+              "              style=\"display:none;\">\n",
+              "        \n",
+              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
+              "       width=\"24px\">\n",
+              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
+              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
+              "  </svg>\n",
+              "      </button>\n",
+              "      \n",
+              "  \n",
+              "    <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-db4299d8-d441-4fc1-af7d-8d1c199492ed')\"\n",
+              "            title=\"Generate charts.\"\n",
+              "            style=\"display:none;\">\n",
+              "      \n",
+              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
+              "       width=\"24px\">\n",
+              "      <g>\n",
+              "          <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
+              "      </g>\n",
+              "  </svg>\n",
+              "    </button>\n",
+              "   \n",
+              "  <style>\n",
+              "    .colab-df-quickchart {\n",
+              "      background-color: #E8F0FE;\n",
+              "      border: none;\n",
+              "      border-radius: 50%;\n",
+              "      cursor: pointer;\n",
+              "      display: none;\n",
+              "      fill: #1967D2;\n",
+              "      height: 32px;\n",
+              "      padding: 0 0 0 0;\n",
+              "      width: 32px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-quickchart:hover {\n",
+              "      background-color: #E2EBFA;\n",
+              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
+              "      fill: #174EA6;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-quickchart {\n",
+              "      background-color: #3B4455;\n",
+              "      fill: #D2E3FC;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-quickchart:hover {\n",
+              "      background-color: #434B5C;\n",
+              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
+              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
+              "      fill: #FFFFFF;\n",
+              "    }\n",
+              "  </style>\n",
+              "\n",
+              "    <script>\n",
+              "      const quickchartButtonEl =\n",
+              "        document.querySelector('#df-db4299d8-d441-4fc1-af7d-8d1c199492ed button.colab-df-quickchart');\n",
+              "      quickchartButtonEl.style.display =\n",
+              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
+              "\n",
+              "      async function quickchart(key) {\n",
+              "        const containerElement = document.querySelector('#df-db4299d8-d441-4fc1-af7d-8d1c199492ed');\n",
+              "        const charts = await google.colab.kernel.invokeFunction(\n",
+              "            'generateCharts', [key], {})\n",
+              "      }\n",
+              "    </script>    \n",
+              "  <style>\n",
+              "    .colab-df-container {\n",
+              "      display:flex;\n",
+              "      flex-wrap:wrap;\n",
+              "      gap: 12px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-convert {\n",
+              "      background-color: #E8F0FE;\n",
+              "      border: none;\n",
+              "      border-radius: 50%;\n",
+              "      cursor: pointer;\n",
+              "      display: none;\n",
+              "      fill: #1967D2;\n",
+              "      height: 32px;\n",
+              "      padding: 0 0 0 0;\n",
+              "      width: 32px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-convert:hover {\n",
+              "      background-color: #E2EBFA;\n",
+              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
+              "      fill: #174EA6;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-convert {\n",
+              "      background-color: #3B4455;\n",
+              "      fill: #D2E3FC;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-convert:hover {\n",
+              "      background-color: #434B5C;\n",
+              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
+              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
+              "      fill: #FFFFFF;\n",
+              "    }\n",
+              "  </style>\n",
+              "\n",
+              "      <script>\n",
+              "        const buttonEl =\n",
+              "          document.querySelector('#df-db4299d8-d441-4fc1-af7d-8d1c199492ed button.colab-df-convert');\n",
+              "        buttonEl.style.display =\n",
+              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
+              "\n",
+              "        async function convertToInteractive(key) {\n",
+              "          const element = document.querySelector('#df-db4299d8-d441-4fc1-af7d-8d1c199492ed');\n",
+              "          const dataTable =\n",
+              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
+              "                                                     [key], {});\n",
+              "          if (!dataTable) return;\n",
+              "\n",
+              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
+              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
+              "            + ' to learn more about interactive tables.';\n",
+              "          element.innerHTML = '';\n",
+              "          dataTable['output_type'] = 'display_data';\n",
+              "          await google.colab.output.renderOutput(dataTable, element);\n",
+              "          const docLink = document.createElement('div');\n",
+              "          docLink.innerHTML = docLinkHtml;\n",
+              "          element.appendChild(docLink);\n",
+              "        }\n",
+              "      </script>\n",
+              "    </div>\n",
+              "  </div>\n",
+              "  "
+            ],
+            "text/plain": [
+              "                      PM2.5\n",
+              "Timestamp                  \n",
+              "2017-11-07 12:00:00   64.51\n",
+              "2017-11-07 13:00:00   69.95\n",
+              "2017-11-07 14:00:00   92.79\n",
+              "2017-11-07 15:00:00  109.66\n",
+              "2017-11-07 16:00:00  116.50"
+            ]
+          },
+          "execution_count": 5,
+          "metadata": {},
+          "output_type": "execute_result"
+        }
+      ],
+      "source": [
+        "import csv\n",
+        "\n",
+        "#Select only the two features of the DataFrame we're interested in.\n",
+        "airq = air_quality.loc[:, [\"Timestamp\", \"PM2.5\"]].set_index(\"Timestamp\")\n",
+        "saved_new = pd.DataFrame(airq)\n",
+        "saved_new.to_csv(\"data/air_quality.csv\")\n",
+        "airq.head()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "VRFkb_sLDUCD"
+      },
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "\n",
+        "# 1. Average Air Quality Index (AQI)\n",
+        "\n",
+        "Before we explore more advanced batch processing with different types of windowing, we will start with a simple batch analysis example.\n",
+        "\n",
+        "Our **objective** is to analyze the *entire* dataset to find the **average PM2.5 air quality index** in India across the entire 11/2017-6/2022 period.\n",
+        "\n",
+        "> This examples uses the `GlobalWindow`, which is a single window that covers the entire PCollection. All pipelines use the [`GlobalWindow`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.window.html#apache_beam.transforms.window.GlobalWindow) by default. In many cases, especially for batch pipelines, this is what we want since we want to analyze all the data that we have."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 6,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 34
+        },
+        "id": "v06NFe9sDYXc",
+        "outputId": "f65eae63-0424-4ac0-8609-78e98ac21bd0"
+      },
+      "outputs": [
+        {
+          "data": {
+            "application/javascript": "\n        if (typeof window.interactive_beam_jquery == 'undefined') {\n          var jqueryScript = document.createElement('script');\n          jqueryScript.src = 'https://code.jquery.com/jquery-3.4.1.slim.min.js';\n          jqueryScript.type = 'text/javascript';\n          jqueryScript.onload = function() {\n            var datatableScript = document.createElement('script');\n            datatableScript.src = 'https://cdn.datatables.net/1.10.20/js/jquery.dataTables.min.js';\n            datatableScript.type = 'text/javascript';\n            datatableScript.onload = function() {\n              window.interactive_beam_jquery = jQuery.noConflict(true);\n              window.interactive_beam_jquery(document).ready(function($){\n                \n              });\n            }\n            document.head.appendChild(datatableScript);\n          };\n          document.head.appendChild(jqueryScript);\n        } else {\n          window.interactive
 _beam_jquery(document).ready(function($){\n            \n          });\n        }"
+          },
+          "metadata": {},
+          "output_type": "display_data"
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "49.308428658266905\n"
+          ]
+        }
+      ],
+      "source": [
+        "import apache_beam as beam\n",
+        "\n",
+        "def parse_file(element):\n",
+        "  file = csv.reader([element], quotechar='\"', delimiter=',',\n",
+        "                    quoting=csv.QUOTE_ALL, skipinitialspace=True)\n",
+        "  for line in file:\n",
+        "    return line\n",
+        "\n",
+        "with beam.Pipeline() as pipeline:\n",
+        " (\n",
+        "      pipeline\n",
+        "      | 'Read input file' >> beam.io.ReadFromText(\"data/air_quality.csv\",\n",
+        "                                                  skip_header_lines=1)\n",
+        "      | 'Parse file' >> beam.Map(parse_file)\n",
+        "      | 'Get PM' >> beam.Map(lambda x: float(x[1])) # only process PM2.5\n",
+        "      | 'Get mean value' >> beam.combiners.Mean.Globally()\n",
+        "      | beam.Map(print))"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 7,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "GmHEE1G5Y1z-",
+        "outputId": "248ee3d7-43af-4b53-9832-8da0eb7ac974"
+      },
+      "outputs": [
+        {
+          "data": {
+            "text/plain": [
+              "49.30842865826703"
+            ]
+          },
+          "execution_count": 7,
+          "metadata": {},
+          "output_type": "execute_result"
+        }
+      ],
+      "source": [
+        "# To verify, the above mean value matches what Pandas produces\n",
+        "airq[\"PM2.5\"].mean()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "b3gGxC6w6qXx"
+      },
+      "source": [
+        "**Congratulations!** You just created a simple aggregation processing pipeline in batch using `GlobalWindow`."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "vRameihqDJ8l"
+      },
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "\n",
+        "# 2. Advanced Processing in Batch with Windowing\n",
+        "\n",
+        "Sometimes, we want to [aggregate](https://beam.apache.org/documentation/transforms/python/overview/#aggregation) data, like `GroupByKey` or `Combine`, only at certain intervals, like hourly or daily, instead of processing the entire `PCollection` of data only once.\n",
+        "\n",
+        "In this case, our **objective** is to determine the **fluctuations of air quality *every 30 days*.\n",
+        "\n",
+        "**_Windows_** in Beam allow us to process only certain data intervals at a time.\n",
+        "In this notebook, we will go through different ways of windowing our pipeline.\n",
+        "\n",
+        "We have already been introduced to the default GlobalWindow (see above) that covers the entire PCollection. Now we will dive into **fixed time windows, sliding time windows, and session windows**.\n",
+        "\n",
+        "> [Another windowing tutorial](https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/tour-of-beam/windowing.ipynb) with a toy dataset is recommended to go through."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "gj0_S5Ka3-zb"
+      },
+      "source": [
+        "### First, we need to convert timestamps to Unix time\n",
+        "\n",
+        "Apache Beam requires us to provide the timestamp as Unix time. Let us write the simple time conversion function:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 8,
+      "metadata": {
+        "id": "nKBYsxFg4SIa"
+      },
+      "outputs": [],
+      "source": [
+        "import time\n",
+        "\n",
+        "# This function is modifiable and can convert integers to time formats like unix\n",
+        "# Without this function and .strptime, you may run into comparison issues!\n",
+        "def to_unix_time(time_str: str, time_format='%Y-%m-%d %H:%M:%S') -> int:\n",
+        "  \"\"\"Converts a time string into Unix time.\"\"\"\n",
+        "  time_tuple = time.strptime(time_str, time_format)\n",
+        "  return int(time.mktime(time_tuple))"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 9,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "_mPge0KdRx20",
+        "outputId": "43475bbe-548a-4817-ed0b-534cebbe70ce"
+      },
+      "outputs": [
+        {
+          "data": {
+            "text/plain": [
+              "1634220000"
+            ]
+          },
+          "execution_count": 9,
+          "metadata": {},
+          "output_type": "execute_result"
+        }
+      ],
+      "source": [
+        "to_unix_time('2021-10-14 14:00:00')"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "lL0_QONF1aMH"
+      },
+      "source": [
+        "### Second, let us define some helper functions to develop our pipeline\n",
+        "\n",
+        "In this code, we have a transform (`PrintWindowInfo`) to help us analyze an element alongside its window information, and we have another transform\n",
+        "(`PrintWindowInfo`) to help us analyze how many elements landed into each window.\n",
+        "We use a custom [`DoFn`](https://beam.apache.org/documentation/transforms/python/elementwise/pardo)\n",
+        "to access that information."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 10,
+      "metadata": {
+        "id": "KtPL-echb2xv"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Helper functions to develop our pipeline\n",
+        "\n",
+        "def human_readable_window(window) -> str:\n",
+        "  \"\"\"Formats a window object into a human readable string.\"\"\"\n",
+        "  if isinstance(window, beam.window.GlobalWindow):\n",
+        "    return str(window)\n",
+        "  return f'{window.start.to_utc_datetime()} - {window.end.to_utc_datetime()}'\n",
+        "\n",
+        "class PrintElementInfo(beam.DoFn):\n",
+        "  \"\"\"Prints an element with its Window information for debugging.\"\"\"\n",
+        "  def process(self, element, timestamp=beam.DoFn.TimestampParam, window=beam.DoFn.WindowParam):\n",
+        "    print(f'[{human_readable_window(window)}] {timestamp.to_utc_datetime()} -- {element}')\n",
+        "    yield element\n",
+        "\n",
+        "@beam.ptransform_fn\n",
+        "def PrintWindowInfo(pcollection):\n",
+        "  \"\"\"Prints the Window information with AQI in that window for debugging.\"\"\"\n",
+        "  class PrintAQI(beam.DoFn):\n",
+        "    def process(self, mean_elements, window=beam.DoFn.WindowParam):\n",
+        "      print(f'>> Window [{human_readable_window(window)}], AQI: {mean_elements}')\n",
+        "      yield mean_elements\n",
+        "\n",
+        "  return (\n",
+        "      pcollection\n",
+        "      | 'Count elements per window' >> beam.combiners.Mean.Globally().without_defaults()\n",
+        "      | 'Print counts info' >> beam.ParDo(PrintAQI())\n",
+        "  )"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "dtQbcRU6XYCr"
+      },
+      "source": [
+        "Note: when you run below code, pay attention to how the human readable windows varies for each window type.\n",
+        "\n",
+        "To illustrate how windowing works, we will use the below toy data:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 11,
+      "metadata": {
+        "id": "oOLx4IsZXoTO"
+      },
+      "outputs": [],
+      "source": [
+        "# a toy data\n",
+        "air_toy_data = [\n",
+        "    ['2021-10-14 14:00:00', '43.27'],\n",
+        "    ['2021-10-14 15:00:00', '44.17'],\n",
+        "    ['2021-10-14 16:00:00', '48.77'],\n",
+        "    ['2021-10-14 17:00:00', '55.57'],\n",
+        "    ['2021-10-14 18:00:00', '56.95'],\n",
+        "    ['2021-10-21 09:00:00', '36.77'],\n",
+        "    ['2021-10-21 10:00:00', '34.87'],\n",
+        "    ['2021-11-17 01:00:00', '62.64'],\n",
+        "    ['2021-11-17 02:00:00', '65.28'],\n",
+        "    ['2021-11-17 03:00:00', '65.53'],\n",
+        "    ['2021-11-17 04:00:00', '70.18'],\n",
+        "    ['2021-12-11 21:00:00', '69.07'],\n",
+        "    ['2022-01-02 21:00:00', '76.56'],\n",
+        "    ['2022-01-02 22:00:00', '78.77'],\n",
+        "    ['2022-01-02 23:00:00', '73.16'],\n",
+        "    ['2022-01-03 03:00:00', '74.05'],\n",
+        "    ['2022-01-03 19:00:00', '100.28'],\n",
+        "    ['2022-01-03 22:00:00', '80.92'],\n",
+        "    ['2022-01-04 05:00:00', '80.48'],\n",
+        "    ['2022-01-04 07:00:00', '84.0'],\n",
+        "    ['2022-01-04 18:00:00', '95.49'],\n",
+        "    ['2022-01-05 00:00:00', '69.01'],\n",
+        "    ['2022-01-05 07:00:00', '76.85'],]"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "_QoStuYV-uku"
+      },
+      "source": [
+        "### Fixed time windows\n",
+        "\n",
+        "`FixedWindows` allow us to create fixed-sized windows with evenly spaced intervals.\n",
+        "We only need to specify the _window size_ in seconds.\n",
+        "\n",
+        "In Python, we can use [`timedelta`](https://docs.python.org/3/library/datetime.html#timedelta-objects)\n",
+        "to help us do the conversion of minutes, hours, or days for us.\n",
+        "\n",
+        "We then use the [`WindowInto`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html?highlight=windowinto#apache_beam.transforms.core.WindowInto)\n",
+        "transform to apply the kind of window we want."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 12,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "Q7Q4yVzh8XWO",
+        "outputId": "076752f2-1b40-4d07-9419-88757adc99be"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 14:00:00 -- 43.27\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 15:00:00 -- 44.17\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 16:00:00 -- 48.77\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 17:00:00 -- 55.57\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 18:00:00 -- 56.95\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-21 09:00:00 -- 36.77\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-21 10:00:00 -- 34.87\n",
+            "[2021-10-30 00:00:00 - 2021-11-29 00:00:00] 2021-11-17 01:00:00 -- 62.64\n",
+            "[2021-10-30 00:00:00 - 2021-11-29 00:00:00] 2021-11-17 02:00:00 -- 65.28\n",
+            "[2021-10-30 00:00:00 - 2021-11-29 00:00:00] 2021-11-17 03:00:00 -- 65.53\n",
+            "[2021-10-30 00:00:00 - 2021-11-29 00:00:00] 2021-11-17 04:00:00 -- 70.18\n",
+            "[2021-11-29 00:00:00 - 2021-12-29 00:00:00] 2021-12-11 21:00:00 -- 69.07\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-02 21:00:00 -- 76.56\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-02 22:00:00 -- 78.77\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-02 23:00:00 -- 73.16\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-03 03:00:00 -- 74.05\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-03 19:00:00 -- 100.28\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-03 22:00:00 -- 80.92\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-04 05:00:00 -- 80.48\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-04 07:00:00 -- 84.0\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-04 18:00:00 -- 95.49\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-05 00:00:00 -- 69.01\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-05 07:00:00 -- 76.85\n",
+            ">> Window [2021-09-30 00:00:00 - 2021-10-30 00:00:00], AQI: 45.76714285714286\n",
+            ">> Window [2021-10-30 00:00:00 - 2021-11-29 00:00:00], AQI: 65.9075\n",
+            ">> Window [2021-11-29 00:00:00 - 2021-12-29 00:00:00], AQI: 69.07\n",
+            ">> Window [2021-12-29 00:00:00 - 2022-01-28 00:00:00], AQI: 80.86999999999999\n"
+          ]
+        }
+      ],
+      "source": [
+        "import time\n",
+        "import apache_beam as beam\n",
+        "from datetime import timedelta\n",
+        "\n",
+        "\n",
+        "# We first set the window size to around 1 month.\n",
+        "window_size = timedelta(days=30).total_seconds()  # in seconds\n",
+        "\n",
+        "\n",
+        "# Let us set up the windowed pipeline and compute AQI every 30 days\n",
+        "with beam.Pipeline() as pipeline:\n",
+        "  (\n",
+        "      pipeline\n",
+        "      | beam.Create(air_toy_data)\n",
+        "      | 'With timestamps' >> beam.MapTuple(\n",
+        "          lambda timestamp, element:\n",
+        "              beam.window.TimestampedValue(float(element), to_unix_time(timestamp))\n",
+        "      )\n",
+        "      | 'Fixed windows' >> beam.WindowInto(beam.window.FixedWindows(window_size))\n",
+        "      | 'Print element info' >> beam.ParDo(PrintElementInfo())\n",
+        "      | 'Print window info' >> PrintWindowInfo()\n",
+        "  )"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "N-bqgv5K_INW"
+      },
+      "source": [
+        "### Sliding time windows\n",
+        "\n",
+        "[`Sliding windows`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.window.html#apache_beam.transforms.window.SlidingWindows)\n",
+        "allow us to compute AQI every 30 days but each window should cover the last 15 days.\n",
+        "We can specify the _window size_ in seconds just like with `FixedWindows` to define the window size. We also need to specify a _window period_ in seconds, which is how often we want to emit each window."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 13,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "Lrj1mb6Q_LW7",
+        "outputId": "4418b5bd-6dc0-44a3-ca1a-d231ec9af9e1"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "window_size:   2592000.0 seconds\n",
+            "window_period: 1296000.0 seconds\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 14:00:00 -- 43.27\n",
+            "[2021-09-15 00:00:00 - 2021-10-15 00:00:00] 2021-10-14 14:00:00 -- 43.27\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 15:00:00 -- 44.17\n",
+            "[2021-09-15 00:00:00 - 2021-10-15 00:00:00] 2021-10-14 15:00:00 -- 44.17\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 16:00:00 -- 48.77\n",
+            "[2021-09-15 00:00:00 - 2021-10-15 00:00:00] 2021-10-14 16:00:00 -- 48.77\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 17:00:00 -- 55.57\n",
+            "[2021-09-15 00:00:00 - 2021-10-15 00:00:00] 2021-10-14 17:00:00 -- 55.57\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 18:00:00 -- 56.95\n",
+            "[2021-09-15 00:00:00 - 2021-10-15 00:00:00] 2021-10-14 18:00:00 -- 56.95\n",
+            "[2021-10-15 00:00:00 - 2021-11-14 00:00:00] 2021-10-21 09:00:00 -- 36.77\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-21 09:00:00 -- 36.77\n",
+            "[2021-10-15 00:00:00 - 2021-11-14 00:00:00] 2021-10-21 10:00:00 -- 34.87\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-21 10:00:00 -- 34.87\n",
+            "[2021-11-14 00:00:00 - 2021-12-14 00:00:00] 2021-11-17 01:00:00 -- 62.64\n",
+            "[2021-10-30 00:00:00 - 2021-11-29 00:00:00] 2021-11-17 01:00:00 -- 62.64\n",
+            "[2021-11-14 00:00:00 - 2021-12-14 00:00:00] 2021-11-17 02:00:00 -- 65.28\n",
+            "[2021-10-30 00:00:00 - 2021-11-29 00:00:00] 2021-11-17 02:00:00 -- 65.28\n",
+            "[2021-11-14 00:00:00 - 2021-12-14 00:00:00] 2021-11-17 03:00:00 -- 65.53\n",
+            "[2021-10-30 00:00:00 - 2021-11-29 00:00:00] 2021-11-17 03:00:00 -- 65.53\n",
+            "[2021-11-14 00:00:00 - 2021-12-14 00:00:00] 2021-11-17 04:00:00 -- 70.18\n",
+            "[2021-10-30 00:00:00 - 2021-11-29 00:00:00] 2021-11-17 04:00:00 -- 70.18\n",
+            "[2021-11-29 00:00:00 - 2021-12-29 00:00:00] 2021-12-11 21:00:00 -- 69.07\n",
+            "[2021-11-14 00:00:00 - 2021-12-14 00:00:00] 2021-12-11 21:00:00 -- 69.07\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-02 21:00:00 -- 76.56\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-02 21:00:00 -- 76.56\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-02 22:00:00 -- 78.77\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-02 22:00:00 -- 78.77\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-02 23:00:00 -- 73.16\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-02 23:00:00 -- 73.16\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-03 03:00:00 -- 74.05\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-03 03:00:00 -- 74.05\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-03 19:00:00 -- 100.28\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-03 19:00:00 -- 100.28\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-03 22:00:00 -- 80.92\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-03 22:00:00 -- 80.92\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-04 05:00:00 -- 80.48\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-04 05:00:00 -- 80.48\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-04 07:00:00 -- 84.0\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-04 07:00:00 -- 84.0\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-04 18:00:00 -- 95.49\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-04 18:00:00 -- 95.49\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-05 00:00:00 -- 69.01\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-05 00:00:00 -- 69.01\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-05 07:00:00 -- 76.85\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-05 07:00:00 -- 76.85\n",
+            ">> Window [2021-09-30 00:00:00 - 2021-10-30 00:00:00], AQI: 45.76714285714286\n",
+            ">> Window [2021-09-15 00:00:00 - 2021-10-15 00:00:00], AQI: 49.746\n",
+            ">> Window [2021-10-15 00:00:00 - 2021-11-14 00:00:00], AQI: 35.82\n",
+            ">> Window [2021-11-14 00:00:00 - 2021-12-14 00:00:00], AQI: 66.53999999999999\n",
+            ">> Window [2021-10-30 00:00:00 - 2021-11-29 00:00:00], AQI: 65.9075\n",
+            ">> Window [2021-11-29 00:00:00 - 2021-12-29 00:00:00], AQI: 69.07\n",
+            ">> Window [2021-12-29 00:00:00 - 2022-01-28 00:00:00], AQI: 80.86999999999999\n",
+            ">> Window [2021-12-14 00:00:00 - 2022-01-13 00:00:00], AQI: 80.86999999999999\n"
+          ]
+        }
+      ],
+      "source": [
+        "import apache_beam as beam\n",
+        "from datetime import timedelta\n",
+        "\n",
+        "# Sliding windows of 30 days and emit one every 15 days.\n",
+        "window_size = timedelta(days=30).total_seconds()  # in seconds\n",
+        "window_period = timedelta(days=15).total_seconds()  # in seconds\n",
+        "print(f'window_size:   {window_size} seconds')\n",
+        "print(f'window_period: {window_period} seconds')\n",
+        "\n",
+        "with beam.Pipeline() as pipeline:\n",
+        "  (\n",
+        "      pipeline\n",
+        "      | 'Air Quality' >> beam.Create(air_toy_data)\n",
+        "      | 'With timestamps' >> beam.MapTuple(\n",
+        "          lambda timestamp, element:\n",
+        "              beam.window.TimestampedValue(float(element), to_unix_time(timestamp))\n",
+        "      )\n",
+        "      | 'Sliding windows' >> beam.WindowInto(\n",
+        "          beam.window.SlidingWindows(window_size, window_period)\n",
+        "      )\n",
+        "      | 'Print element info' >> beam.ParDo(PrintElementInfo())\n",
+        "      | 'Print window info' >> PrintWindowInfo()\n",
+        "  )"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Y_SbevXv_MDk"
+      },
+      "source": [
+        "### Session time windows\n",
+        "\n",
+        "Maybe we don't want regular windows, but instead, have the windows reflect periods where activity happened. [`Sessions`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.window.html#apache_beam.transforms.window.Sessions)\n",
+        "allow us to create those windows.\n",
+        "We only need to specify a _gap size_ in seconds, which is the maximum number of seconds of inactivity to close a session window. In this case, if no event happens within 10 days, the current session windown closes and is emitted and a new session windown is created whenenver the next event happens."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 14,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "zgNc-c3THB7G",
+        "outputId": "d8d5fed1-8748-4845-cbc3-0b898ce4bcd8"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "gap_size: 864000.0 seconds\n",
+            "[2021-10-14 14:00:00 - 2021-10-24 14:00:00] 2021-10-14 14:00:00 -- 43.27\n",
+            "[2021-10-14 15:00:00 - 2021-10-24 15:00:00] 2021-10-14 15:00:00 -- 44.17\n",
+            "[2021-10-14 16:00:00 - 2021-10-24 16:00:00] 2021-10-14 16:00:00 -- 48.77\n",
+            "[2021-10-14 17:00:00 - 2021-10-24 17:00:00] 2021-10-14 17:00:00 -- 55.57\n",
+            "[2021-10-14 18:00:00 - 2021-10-24 18:00:00] 2021-10-14 18:00:00 -- 56.95\n",
+            "[2021-10-21 09:00:00 - 2021-10-31 09:00:00] 2021-10-21 09:00:00 -- 36.77\n",
+            "[2021-10-21 10:00:00 - 2021-10-31 10:00:00] 2021-10-21 10:00:00 -- 34.87\n",
+            "[2021-11-17 01:00:00 - 2021-11-27 01:00:00] 2021-11-17 01:00:00 -- 62.64\n",
+            "[2021-11-17 02:00:00 - 2021-11-27 02:00:00] 2021-11-17 02:00:00 -- 65.28\n",
+            "[2021-11-17 03:00:00 - 2021-11-27 03:00:00] 2021-11-17 03:00:00 -- 65.53\n",
+            "[2021-11-17 04:00:00 - 2021-11-27 04:00:00] 2021-11-17 04:00:00 -- 70.18\n",
+            "[2021-12-11 21:00:00 - 2021-12-21 21:00:00] 2021-12-11 21:00:00 -- 69.07\n",
+            "[2022-01-02 21:00:00 - 2022-01-12 21:00:00] 2022-01-02 21:00:00 -- 76.56\n",
+            "[2022-01-02 22:00:00 - 2022-01-12 22:00:00] 2022-01-02 22:00:00 -- 78.77\n",
+            "[2022-01-02 23:00:00 - 2022-01-12 23:00:00] 2022-01-02 23:00:00 -- 73.16\n",
+            "[2022-01-03 03:00:00 - 2022-01-13 03:00:00] 2022-01-03 03:00:00 -- 74.05\n",
+            "[2022-01-03 19:00:00 - 2022-01-13 19:00:00] 2022-01-03 19:00:00 -- 100.28\n",
+            "[2022-01-03 22:00:00 - 2022-01-13 22:00:00] 2022-01-03 22:00:00 -- 80.92\n",
+            "[2022-01-04 05:00:00 - 2022-01-14 05:00:00] 2022-01-04 05:00:00 -- 80.48\n",
+            "[2022-01-04 07:00:00 - 2022-01-14 07:00:00] 2022-01-04 07:00:00 -- 84.0\n",
+            "[2022-01-04 18:00:00 - 2022-01-14 18:00:00] 2022-01-04 18:00:00 -- 95.49\n",
+            "[2022-01-05 00:00:00 - 2022-01-15 00:00:00] 2022-01-05 00:00:00 -- 69.01\n",
+            "[2022-01-05 07:00:00 - 2022-01-15 07:00:00] 2022-01-05 07:00:00 -- 76.85\n",
+            ">> Window [2021-10-14 14:00:00 - 2021-10-31 10:00:00], AQI: 45.76714285714286\n",
+            ">> Window [2021-11-17 01:00:00 - 2021-11-27 04:00:00], AQI: 65.9075\n",
+            ">> Window [2021-12-11 21:00:00 - 2021-12-21 21:00:00], AQI: 69.07\n",
+            ">> Window [2022-01-02 21:00:00 - 2022-01-15 07:00:00], AQI: 80.86999999999999\n"
+          ]
+        }
+      ],
+      "source": [
+        "import apache_beam as beam\n",
+        "from datetime import timedelta\n",
+        "\n",
+        "# Sessions divided by 10 days.\n",
+        "gap_size = timedelta(days=10).total_seconds()  # in seconds\n",
+        "print(f'gap_size: {gap_size} seconds')\n",
+        "\n",
+        "with beam.Pipeline() as pipeline:\n",
+        "  (\n",
+        "      pipeline\n",
+        "      | 'Air Quality' >> beam.Create(air_toy_data)\n",
+        "      | 'With timestamps' >> beam.MapTuple(\n",
+        "          lambda timestamp, element:\n",
+        "              beam.window.TimestampedValue(float(element), to_unix_time(timestamp))\n",
+        "      )\n",
+        "      | 'Session windows' >> beam.WindowInto(beam.window.Sessions(gap_size))\n",
+        "      | 'Print element info' >> beam.ParDo(PrintElementInfo())\n",
+        "      | 'Print window info' >> PrintWindowInfo()\n",
+        "  )"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "XcX0t6hya85F"
+      },
+      "source": [
+        "Note how the above windows are irregular."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "EBSlyeBibNL0"
+      },
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "\n",
+        "# 3. Put All Together\n",
+        "\n",
+        "Section 2 uses the toy data to go through three different windowing types. Now it is time to analyze the real data (`data/air_quality.csv`).\n",
+        "\n",
+        "Can you build one Beam pipeline to compute all AQI values for these windows?\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "plMjLuh-lKzr"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Edit This Code Cell\n",
+        "..."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 15,
+      "metadata": {
+        "cellView": "form",
+        "id": "t6vBMax0bubN"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Expand to see the answer\n",
+        "import apache_beam as beam\n",
+        "import csv\n",
+        "import time\n",
+        "import apache_beam as beam\n",
+        "from datetime import timedelta\n",
+        "import apache_beam.runners.interactive.interactive_beam as ib\n",
+        "\n",
+        "def to_unix_time(time_str: str, time_format='%Y-%m-%d %H:%M:%S') -> int:\n",
+        "  \"\"\"Converts a time string into Unix time.\"\"\"\n",
+        "  time_tuple = time.strptime(time_str, time_format)\n",
+        "  return int(time.mktime(time_tuple))\n",
+        "\n",
+        "def human_readable_window(window) -> str:\n",
+        "  \"\"\"Formats a window object into a human readable string.\"\"\"\n",
+        "  if isinstance(window, beam.window.GlobalWindow):\n",
+        "    return str(window)\n",
+        "  return f'{window.start.to_utc_datetime()} - {window.end.to_utc_datetime()}'\n",
+        "\n",
+        "@beam.ptransform_fn\n",
+        "def OutputWindowInfo(pcollection):\n",
+        "  \"\"\"Output the Window information with AQI in that window.\"\"\"\n",
+        "  class GetAQI(beam.DoFn):\n",
+        "    def process(self, mean_elements, window=beam.DoFn.WindowParam):\n",
+        "      yield human_readable_window(window), mean_elements\n",
+        "  return (\n",
+        "    pcollection\n",
+        "    | 'Count elements per window' >> beam.combiners.Mean.Globally().without_defaults()\n",
+        "    | 'Output counts info' >> beam.ParDo(GetAQI())\n",
+        "  )\n",
+        "\n",
+        "\n",
+        "def parse_file(element):\n",
+        "  file = csv.reader([element], quotechar='\"', delimiter=',',\n",
+        "                    quoting=csv.QUOTE_ALL, skipinitialspace=True)\n",
+        "  for line in file:\n",
+        "    return line\n",
+        "\n",
+        "p =  beam.Pipeline()\n",
+        "\n",
+        "# get the data\n",
+        "read_text = (p | 'Read input file' >> beam.io.ReadFromText(\"data/air_quality.csv\",\n",
+        "                                                  skip_header_lines=1)\n",
+        "| 'Parse file' >> beam.Map(parse_file)\n",

Review Comment:
   nit: This formatting looks a bit strange in this section, since typically there is an indent in the text below opened brackets.
   
   Also multiple long lines throghout the notebook.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] svetakvsundhar commented on a diff in pull request #26185: add one windowing example

Posted by "svetakvsundhar (via GitHub)" <gi...@apache.org>.

svetakvsundhar commented on code in PR #26185:
URL: https://github.com/apache/beam/pull/26185#discussion_r1175893859


##########
examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb:
##########
@@ -0,0 +1,1831 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "view-in-github"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "L7ZbRufePd2g"
+      },
+      "outputs": [],
+      "source": [
+        "#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License."
+      ]
+    },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "83TJhNxLD7-W"
+      },
+      "source": [
+        " # **Introduction to Windowing for Batch Processing in Apache Beam**\n",
+        "\n",
+        "In this notebook, we will learn the fundamentals of **batch processing** as we walk through a few introductory examples in Beam. The pipelines in these examples process real-world data for air quality levels in India between 2017 and 2022.\n",
+        "\n",
+        "After this tutorial you should have a basic understanding of the following:\n",
+        "\n",
+        "*   What is **batch vs. stream** data processing?\n",
+        "*   How can I use Beam to run a **simple batch analysis job**?\n",
+        "*   How can I use Beam's **windowing features** to process only certain intervals of data at a time?"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Dj3ftRRqfumW"
+      },
+      "source": [
+        "To begin, run the following cell to set up Apache Beam."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 1,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "zmJ0pCmSvD0-",
+        "outputId": "9041f637-12a0-4f78-f60b-ebd3c3a1c186"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m14.5/14.5 MB\u001b[0m \u001b[31m53.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m144.1/144.1 kB\u001b[0m \u001b[31m1.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m89.7/89.7 kB\u001b[0m \u001b[31m7.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m515.5/515.5 kB\u001b[0m \u001b[31m12.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.6/2.6 MB\u001b[0m \u001b[31m22.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m152.0/152.0 kB\u001b[0m \u001b[31m10.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.7/2.7 MB\u001b[0m \u001b[31m15.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for crcmod (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for dill (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for docopt (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
+          ]
+        }
+      ],
+      "source": [
+        "# Install apache-beam.\n",
+        "!pip install --quiet apache-beam"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 2,
+      "metadata": {
+        "id": "7sBoLahzPlJ1"
+      },
+      "outputs": [],
+      "source": [
+        "# Set the logging level to reduce verbose information\n",
+        "import logging\n",
+        "\n",
+        "logging.root.setLevel(logging.ERROR)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "BB6FAwYj1dAi"
+      },
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "<hr style=\"border: 1px solid #fdb515;\" />\n",
+        "\n",
+        "## Batch vs. Stream Data Processing\n",
+        "\n",
+        "What's the difference?\n",
+        "\n",
+        "**Batch processing** is when data processing and analysis happens on a set of data that have already been stored over a period of time. In other words, the input is a finite, bounded data set. An example is payroll and billing systems that have to be processed weekly or monthly.\n",
+        "\n",
+        "**Stream processing** happens *as* data flows through a system. This results in analysis and reporting of events within a short period of time or near real-time on an infinite, unbounded data set. An example would be fraud detection or intrusion detection.\n",

Review Comment:
   ```suggestion
           "**Stream processing** happens *as* data flows through a system. This results in analysis and reporting of events within a short period of time or near real-time on an infinite, unbounded data set. An example would be fraud detection or intrusion detection, that need to continuously process transaction data.\n",
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] liferoad commented on a diff in pull request #26185: add one windowing example

Posted by "liferoad (via GitHub)" <gi...@apache.org>.

liferoad commented on code in PR #26185:
URL: https://github.com/apache/beam/pull/26185#discussion_r1176890840


##########
examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb:
##########
@@ -0,0 +1,1831 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "view-in-github"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "L7ZbRufePd2g"
+      },
+      "outputs": [],
+      "source": [
+        "#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License."
+      ]
+    },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "83TJhNxLD7-W"
+      },
+      "source": [
+        " # **Introduction to Windowing for Batch Processing in Apache Beam**\n",
+        "\n",
+        "In this notebook, we will learn the fundamentals of **batch processing** as we walk through a few introductory examples in Beam. The pipelines in these examples process real-world data for air quality levels in India between 2017 and 2022.\n",
+        "\n",
+        "After this tutorial you should have a basic understanding of the following:\n",
+        "\n",
+        "*   What is **batch vs. stream** data processing?\n",
+        "*   How can I use Beam to run a **simple batch analysis job**?\n",
+        "*   How can I use Beam's **windowing features** to process only certain intervals of data at a time?"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Dj3ftRRqfumW"
+      },
+      "source": [
+        "To begin, run the following cell to set up Apache Beam."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 1,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "zmJ0pCmSvD0-",
+        "outputId": "9041f637-12a0-4f78-f60b-ebd3c3a1c186"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m14.5/14.5 MB\u001b[0m \u001b[31m53.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m144.1/144.1 kB\u001b[0m \u001b[31m1.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m89.7/89.7 kB\u001b[0m \u001b[31m7.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m515.5/515.5 kB\u001b[0m \u001b[31m12.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.6/2.6 MB\u001b[0m \u001b[31m22.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m152.0/152.0 kB\u001b[0m \u001b[31m10.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.7/2.7 MB\u001b[0m \u001b[31m15.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for crcmod (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for dill (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for docopt (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
+          ]
+        }
+      ],
+      "source": [
+        "# Install apache-beam.\n",
+        "!pip install --quiet apache-beam"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 2,
+      "metadata": {
+        "id": "7sBoLahzPlJ1"
+      },
+      "outputs": [],
+      "source": [
+        "# Set the logging level to reduce verbose information\n",
+        "import logging\n",
+        "\n",
+        "logging.root.setLevel(logging.ERROR)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "BB6FAwYj1dAi"
+      },
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "<hr style=\"border: 1px solid #fdb515;\" />\n",
+        "\n",
+        "## Batch vs. Stream Data Processing\n",
+        "\n",
+        "What's the difference?\n",
+        "\n",
+        "**Batch processing** is when data processing and analysis happens on a set of data that have already been stored over a period of time. In other words, the input is a finite, bounded data set. An example is payroll and billing systems that have to be processed weekly or monthly.\n",
+        "\n",
+        "**Stream processing** happens *as* data flows through a system. This results in analysis and reporting of events within a short period of time or near real-time on an infinite, unbounded data set. An example would be fraud detection or intrusion detection.\n",

Review Comment:
   Done. Thanks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] liferoad commented on pull request #26185: add one windowing example

Posted by "liferoad (via GitHub)" <gi...@apache.org>.

liferoad commented on PR #26185:
URL: https://github.com/apache/beam/pull/26185#issuecomment-1520850637

   > 
   
   The concepts are indeed very similar in both notebooks. But this one is focused on using the real air quality data to illustrate how these concepts work.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] liferoad commented on a diff in pull request #26185: add one windowing example

Posted by "liferoad (via GitHub)" <gi...@apache.org>.

liferoad commented on code in PR #26185:
URL: https://github.com/apache/beam/pull/26185#discussion_r1175799801


##########
examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb:
##########
@@ -0,0 +1,1880 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "view-in-github"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License."
+      ],
+      "metadata": {
+        "cellView": "form",
+        "id": "L7ZbRufePd2g"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        " # **Introduction to Windowing in Apache Beam**\n",
+        "\n",
+        "In this notebook, we will learn the fundamentals of **batch and stream data processing** as we walk through a few introductory examples in Beam. The pipelines in these examples process real-world data for air quality levels in India between 2017 and 2022.\n",

Review Comment:
   I removed the batch from the tittle. Since this notebook is for people who might not have any data processing experience, I think it will be better to give them some generic ideas.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] liferoad commented on a diff in pull request #26185: add one windowing example

Posted by "liferoad (via GitHub)" <gi...@apache.org>.

liferoad commented on code in PR #26185:
URL: https://github.com/apache/beam/pull/26185#discussion_r1181801958


##########
examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb:
##########
@@ -0,0 +1,1836 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "view-in-github"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "L7ZbRufePd2g"
+      },
+      "outputs": [],
+      "source": [
+        "#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License."
+      ]
+    },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "83TJhNxLD7-W"
+      },
+      "source": [
+        " # **Introduction to Windowing for Batch Processing in Apache Beam**\n",
+        "\n",
+        "In this notebook, we will learn the fundamentals of **batch processing** as we walk through a few introductory examples in Beam. The pipelines in these examples process real-world data for air quality levels in India between 2017 and 2022.\n",
+        "\n",
+        "After this tutorial you should have a basic understanding of the following:\n",
+        "\n",
+        "*   What is **batch vs. stream** data processing?\n",
+        "*   How can I use Beam to run a **simple batch analysis job**?\n",
+        "*   How can I use Beam's **windowing features** to process only certain intervals of data at a time?"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Dj3ftRRqfumW"
+      },
+      "source": [
+        "To begin, run the following cell to set up Apache Beam."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 1,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "zmJ0pCmSvD0-",
+        "outputId": "9041f637-12a0-4f78-f60b-ebd3c3a1c186"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m14.5/14.5 MB\u001b[0m \u001b[31m53.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",

Review Comment:
   Done. Thanks.



##########
examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb:
##########
@@ -0,0 +1,1836 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "view-in-github"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "L7ZbRufePd2g"
+      },
+      "outputs": [],
+      "source": [
+        "#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License."
+      ]
+    },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "83TJhNxLD7-W"
+      },
+      "source": [
+        " # **Introduction to Windowing for Batch Processing in Apache Beam**\n",
+        "\n",
+        "In this notebook, we will learn the fundamentals of **batch processing** as we walk through a few introductory examples in Beam. The pipelines in these examples process real-world data for air quality levels in India between 2017 and 2022.\n",
+        "\n",
+        "After this tutorial you should have a basic understanding of the following:\n",
+        "\n",
+        "*   What is **batch vs. stream** data processing?\n",
+        "*   How can I use Beam to run a **simple batch analysis job**?\n",
+        "*   How can I use Beam's **windowing features** to process only certain intervals of data at a time?"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Dj3ftRRqfumW"
+      },
+      "source": [
+        "To begin, run the following cell to set up Apache Beam."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 1,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "zmJ0pCmSvD0-",
+        "outputId": "9041f637-12a0-4f78-f60b-ebd3c3a1c186"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m14.5/14.5 MB\u001b[0m \u001b[31m53.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m144.1/144.1 kB\u001b[0m \u001b[31m1.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m89.7/89.7 kB\u001b[0m \u001b[31m7.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m515.5/515.5 kB\u001b[0m \u001b[31m12.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.6/2.6 MB\u001b[0m \u001b[31m22.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m152.0/152.0 kB\u001b[0m \u001b[31m10.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.7/2.7 MB\u001b[0m \u001b[31m15.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for crcmod (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for dill (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for docopt (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
+          ]
+        }
+      ],
+      "source": [
+        "# Install apache-beam.\n",
+        "!pip install --quiet apache-beam"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 2,
+      "metadata": {
+        "id": "7sBoLahzPlJ1"
+      },
+      "outputs": [],
+      "source": [
+        "# Set the logging level to reduce verbose information\n",
+        "import logging\n",
+        "\n",
+        "logging.root.setLevel(logging.ERROR)"
+      ]
+    },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "BB6FAwYj1dAi"
+      },
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "<hr style=\"border: 1px solid #fdb515;\" />\n",
+        "\n",
+        "## Batch vs. Stream Data Processing\n",
+        "\n",
+        "What's the difference?\n",
+        "\n",
+        "**Batch processing** is when data processing and analysis happens on a set of data that have already been stored over a period of time. \n",
+        "In other words, the input is a finite, bounded data set. Examples include payroll and billing systems, which have to be processed weekly or monthly.\n",
+        "\n",
+        "**Stream processing** happens *as* data flows through a system. This results in analysis and reporting of events \n",
+        "within a short period of time or near real-time on an infinite, unbounded data set. \n",
+        "Examples include fraud detection or intrusion detection, which requires the continuous processing of transaction data.\n",
+        "\n",
+        "> This tutorial will focus on **batch processing** examples. \n",
+        "To learn more about stream processing in Beam, check out [this](https://beam.apache.org/documentation/sdks/python-streaming/)."

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] liferoad commented on a diff in pull request #26185: add one windowing example

Posted by "liferoad (via GitHub)" <gi...@apache.org>.

liferoad commented on code in PR #26185:
URL: https://github.com/apache/beam/pull/26185#discussion_r1175801699


##########
examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb:
##########
@@ -0,0 +1,1880 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "view-in-github"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License."
+      ],
+      "metadata": {
+        "cellView": "form",
+        "id": "L7ZbRufePd2g"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        " # **Introduction to Windowing in Apache Beam**\n",
+        "\n",
+        "In this notebook, we will learn the fundamentals of **batch and stream data processing** as we walk through a few introductory examples in Beam. The pipelines in these examples process real-world data for air quality levels in India between 2017 and 2022.\n",
+        "\n",
+        "After this tutorial you should have a basic understanding of the following:\n",
+        "\n",
+        "*   What is **batch vs. stream** data processing?\n",
+        "*   How can I use Beam to run a **simple batch analysis job**?\n",
+        "*   How can I use Beam's **windowing features** to process only certain intervals of data at a time?"
+      ],
+      "metadata": {
+        "id": "83TJhNxLD7-W"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "To begin, run the following cell to set up Apache Beam."
+      ],
+      "metadata": {
+        "id": "Dj3ftRRqfumW"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Install apache-beam.\n",
+        "!pip install --quiet apache-beam"
+      ],
+      "metadata": {
+        "id": "zmJ0pCmSvD0-",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "9041f637-12a0-4f78-f60b-ebd3c3a1c186"
+      },
+      "execution_count": 1,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m14.5/14.5 MB\u001b[0m \u001b[31m53.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m144.1/144.1 kB\u001b[0m \u001b[31m1.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m89.7/89.7 kB\u001b[0m \u001b[31m7.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m515.5/515.5 kB\u001b[0m \u001b[31m12.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.6/2.6 MB\u001b[0m \u001b[31m22.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m152.0/152.0 kB\u001b[0m \u001b[31m10.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.7/2.7 MB\u001b[0m \u001b[31m15.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for crcmod (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for dill (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for docopt (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Set the logging level to reduce verbose information\n",
+        "import logging\n",
+        "\n",
+        "logging.root.setLevel(logging.ERROR)"
+      ],
+      "metadata": {
+        "id": "7sBoLahzPlJ1"
+      },
+      "execution_count": 2,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "<hr style=\"border: 1px solid #fdb515;\" />\n",
+        "\n",
+        "## Batch vs. Stream Data Processing\n",
+        "\n",
+        "What's the difference?\n",
+        "\n",
+        "**Batch processing** is when data processing and analysis happens on a set of data that have already been stored over a period of time. In other words, the input is a finite, bounded data set. An example is payroll and billing systems that have to be processed weekly or monthly.\n",
+        "\n",
+        "**Stream processing** happens *as* data flows through a system. This results in analysis and reporting of events within a short period of time or near real-time on an infinite, unbounded data set. An example would be fraud detection or intrusion detection.\n",
+        "\n",
+        "> This tutorial will focus on **batch processing** examples. To learn more about stream processing in Beam, check out [this](https://beam.apache.org/documentation/sdks/python-streaming/)."
+      ],
+      "metadata": {
+        "id": "BB6FAwYj1dAi"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "\n",
+        "## Load the Data\n",
+        "\n",
+        "Let's import the example data we will be using throughout this tutorial. The [dataset](https://www.kaggle.com/datasets/fedesoriano/air-quality-data-in-india) consists of **hourly air quality data (PM 2.5) in India from November 2017 to June 2022**.\n",
+        "\n",
+        "> The World Health Organization (WHO) reports 7 million premature deaths linked to air pollution each year. In India alone, more than 90% of the country's population live in areas where air quality is below the WHO's standards.\n",
+        "\n",
+        "**What does the data look like?**\n",
+        "\n",
+        "The data set has 36,192 rows and 6 columns in total recording the following attributes:\n",
+        "\n",
+        "1.   `Timestamp`: Timestamp in the format YYYY-MM-DD HH:MM:SS\n",
+        "2.   `Year`: Year of the measure\n",
+        "3.   `Month`: Month of the measure\n",
+        "4.   `Day`: Day of the measure\n",
+        "5.   `Hour`: Hour of the measure\n",
+        "6.   `PM2.5`: Fine particulate matter air pollutant level in air\n",
+        "\n",
+        "**For our purposes, we will focus on only the first and last column of the** `air_quality` **DataFrame**:\n",
+        "\n",
+        "1.   `Timestamp`: Timestamp in the format YYYY-MM-DD HH:MM:SS\n",
+        "2.   `PM 2.5`: Fine particulate matter air pollutant level in air\n",
+        "\n",
+        "Run the following cell to load the data into our file directory."
+      ],
+      "metadata": {
+        "id": "W_63UtsoBRql"
+      }
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 3,
+      "metadata": {
+        "id": "GTteBUZ-7e2s",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "3af9cdb0-c248-4c6d-96f6-c3739fb66014"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Copying gs://batch-processing-example/air-quality-india.csv...\n",
+            "/ [1 files][  1.4 MiB/  1.4 MiB]                                                \n",
+            "Operation completed over 1 objects/1.4 MiB.                                      \n"
+          ]
+        }
+      ],
+      "source": [
+        "# Copy the dataset file into the local file system from Google Cloud Storage.\n",
+        "!mkdir -p data\n",
+        "!gsutil cp gs://batch-processing-example/air-quality-india.csv data/"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "#### Data Preparation\n",
+        "\n",
+        "Before we load the data into a Beam pipeline, let's use Pandas to select two columns."
+      ],
+      "metadata": {
+        "id": "1NcmPl7C43lY"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Load the data into a Python Pandas DataFrame.\n",

Review Comment:
   well, Beam Dataframe already has a good notebook: https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/tour-of-beam/dataframes.ipynb. I am not sure whether it covers all the operations for this example.



##########
examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb:
##########
@@ -0,0 +1,1880 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "view-in-github"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License."
+      ],
+      "metadata": {
+        "cellView": "form",
+        "id": "L7ZbRufePd2g"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        " # **Introduction to Windowing in Apache Beam**\n",

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] liferoad commented on a diff in pull request #26185: add one windowing example

Posted by "liferoad (via GitHub)" <gi...@apache.org>.

liferoad commented on code in PR #26185:
URL: https://github.com/apache/beam/pull/26185#discussion_r1181803515


##########
examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb:
##########
@@ -0,0 +1,1836 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "colab_type": "text",
+        "id": "view-in-github"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/learn_beam_windowing_by_doing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "L7ZbRufePd2g"
+      },
+      "outputs": [],
+      "source": [
+        "#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License."
+      ]
+    },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "83TJhNxLD7-W"
+      },
+      "source": [
+        " # **Introduction to Windowing for Batch Processing in Apache Beam**\n",
+        "\n",
+        "In this notebook, we will learn the fundamentals of **batch processing** as we walk through a few introductory examples in Beam. The pipelines in these examples process real-world data for air quality levels in India between 2017 and 2022.\n",
+        "\n",
+        "After this tutorial you should have a basic understanding of the following:\n",
+        "\n",
+        "*   What is **batch vs. stream** data processing?\n",
+        "*   How can I use Beam to run a **simple batch analysis job**?\n",
+        "*   How can I use Beam's **windowing features** to process only certain intervals of data at a time?"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Dj3ftRRqfumW"
+      },
+      "source": [
+        "To begin, run the following cell to set up Apache Beam."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 1,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "zmJ0pCmSvD0-",
+        "outputId": "9041f637-12a0-4f78-f60b-ebd3c3a1c186"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m14.5/14.5 MB\u001b[0m \u001b[31m53.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m144.1/144.1 kB\u001b[0m \u001b[31m1.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m89.7/89.7 kB\u001b[0m \u001b[31m7.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m515.5/515.5 kB\u001b[0m \u001b[31m12.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.6/2.6 MB\u001b[0m \u001b[31m22.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m152.0/152.0 kB\u001b[0m \u001b[31m10.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.7/2.7 MB\u001b[0m \u001b[31m15.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for crcmod (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for dill (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+            "  Building wheel for docopt (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
+          ]
+        }
+      ],
+      "source": [
+        "# Install apache-beam.\n",
+        "!pip install --quiet apache-beam"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 2,
+      "metadata": {
+        "id": "7sBoLahzPlJ1"
+      },
+      "outputs": [],
+      "source": [
+        "# Set the logging level to reduce verbose information\n",
+        "import logging\n",
+        "\n",
+        "logging.root.setLevel(logging.ERROR)"
+      ]
+    },
+    {
+      "attachments": {},
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "BB6FAwYj1dAi"
+      },
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "<hr style=\"border: 1px solid #fdb515;\" />\n",
+        "\n",
+        "## Batch vs. Stream Data Processing\n",
+        "\n",
+        "What's the difference?\n",
+        "\n",
+        "**Batch processing** is when data processing and analysis happens on a set of data that have already been stored over a period of time. \n",
+        "In other words, the input is a finite, bounded data set. Examples include payroll and billing systems, which have to be processed weekly or monthly.\n",
+        "\n",
+        "**Stream processing** happens *as* data flows through a system. This results in analysis and reporting of events \n",
+        "within a short period of time or near real-time on an infinite, unbounded data set. \n",
+        "Examples include fraud detection or intrusion detection, which requires the continuous processing of transaction data.\n",
+        "\n",
+        "> This tutorial will focus on **batch processing** examples. \n",
+        "To learn more about stream processing in Beam, check out [this](https://beam.apache.org/documentation/sdks/python-streaming/)."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "W_63UtsoBRql"
+      },
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "\n",
+        "## Load the Data\n",
+        "\n",
+        "Let's import the example data we will be using throughout this tutorial. The [dataset](https://www.kaggle.com/datasets/fedesoriano/air-quality-data-in-india) consists of **hourly air quality data (PM 2.5) in India from November 2017 to June 2022**.\n",
+        "\n",
+        "> The World Health Organization (WHO) reports 7 million premature deaths linked to air pollution each year. In India alone, more than 90% of the country's population live in areas where air quality is below the WHO's standards.\n",
+        "\n",
+        "**What does the data look like?**\n",
+        "\n",
+        "The data set has 36,192 rows and 6 columns in total recording the following attributes:\n",
+        "\n",
+        "1.   `Timestamp`: Timestamp in the format YYYY-MM-DD HH:MM:SS\n",
+        "2.   `Year`: Year of the measure\n",
+        "3.   `Month`: Month of the measure\n",
+        "4.   `Day`: Day of the measure\n",
+        "5.   `Hour`: Hour of the measure\n",
+        "6.   `PM2.5`: Fine particulate matter air pollutant level in air\n",
+        "\n",
+        "**For our purposes, we will focus on only the first and last column of the** `air_quality` **DataFrame**:\n",
+        "\n",
+        "1.   `Timestamp`: Timestamp in the format YYYY-MM-DD HH:MM:SS\n",
+        "2.   `PM 2.5`: Fine particulate matter air pollutant level in air\n",
+        "\n",
+        "Run the following cell to load the data into our file directory."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 3,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "GTteBUZ-7e2s",
+        "outputId": "3af9cdb0-c248-4c6d-96f6-c3739fb66014"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "Copying gs://batch-processing-example/air-quality-india.csv...\n",
+            "/ [1 files][  1.4 MiB/  1.4 MiB]                                                \n",
+            "Operation completed over 1 objects/1.4 MiB.                                      \n"
+          ]
+        }
+      ],
+      "source": [
+        "# Copy the dataset file into the local file system from Google Cloud Storage.\n",
+        "!mkdir -p data\n",
+        "!gsutil cp gs://batch-processing-example/air-quality-india.csv data/"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "1NcmPl7C43lY"
+      },
+      "source": [
+        "#### Data Preparation\n",
+        "\n",
+        "Before we load the data into a Beam pipeline, let's use Pandas to select two columns."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 4,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 206
+        },
+        "id": "dq-k7hwRf4MA",
+        "outputId": "7d70a959-5278-453e-9315-f5ed06821744"
+      },
+      "outputs": [
+        {
+          "data": {
+            "text/html": [
+              "\n",
+              "  <div id=\"df-ca65f108-edf8-4e2b-8152-6df025dc7ff8\">\n",
+              "    <div class=\"colab-df-container\">\n",
+              "      <div>\n",
+              "<style scoped>\n",
+              "    .dataframe tbody tr th:only-of-type {\n",
+              "        vertical-align: middle;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe tbody tr th {\n",
+              "        vertical-align: top;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe thead th {\n",
+              "        text-align: right;\n",
+              "    }\n",
+              "</style>\n",
+              "<table border=\"1\" class=\"dataframe\">\n",
+              "  <thead>\n",
+              "    <tr style=\"text-align: right;\">\n",
+              "      <th></th>\n",
+              "      <th>Timestamp</th>\n",
+              "      <th>Year</th>\n",
+              "      <th>Month</th>\n",
+              "      <th>Day</th>\n",
+              "      <th>Hour</th>\n",
+              "      <th>PM2.5</th>\n",
+              "    </tr>\n",
+              "  </thead>\n",
+              "  <tbody>\n",
+              "    <tr>\n",
+              "      <th>0</th>\n",
+              "      <td>2017-11-07 12:00:00</td>\n",
+              "      <td>2017</td>\n",
+              "      <td>11</td>\n",
+              "      <td>7</td>\n",
+              "      <td>12</td>\n",
+              "      <td>64.51</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>1</th>\n",
+              "      <td>2017-11-07 13:00:00</td>\n",
+              "      <td>2017</td>\n",
+              "      <td>11</td>\n",
+              "      <td>7</td>\n",
+              "      <td>13</td>\n",
+              "      <td>69.95</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2</th>\n",
+              "      <td>2017-11-07 14:00:00</td>\n",
+              "      <td>2017</td>\n",
+              "      <td>11</td>\n",
+              "      <td>7</td>\n",
+              "      <td>14</td>\n",
+              "      <td>92.79</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>3</th>\n",
+              "      <td>2017-11-07 15:00:00</td>\n",
+              "      <td>2017</td>\n",
+              "      <td>11</td>\n",
+              "      <td>7</td>\n",
+              "      <td>15</td>\n",
+              "      <td>109.66</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>4</th>\n",
+              "      <td>2017-11-07 16:00:00</td>\n",
+              "      <td>2017</td>\n",
+              "      <td>11</td>\n",
+              "      <td>7</td>\n",
+              "      <td>16</td>\n",
+              "      <td>116.50</td>\n",
+              "    </tr>\n",
+              "  </tbody>\n",
+              "</table>\n",
+              "</div>\n",
+              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-ca65f108-edf8-4e2b-8152-6df025dc7ff8')\"\n",
+              "              title=\"Convert this dataframe to an interactive table.\"\n",
+              "              style=\"display:none;\">\n",
+              "        \n",
+              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
+              "       width=\"24px\">\n",
+              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
+              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
+              "  </svg>\n",
+              "      </button>\n",
+              "      \n",
+              "  \n",
+              "    <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-ca65f108-edf8-4e2b-8152-6df025dc7ff8')\"\n",
+              "            title=\"Generate charts.\"\n",
+              "            style=\"display:none;\">\n",
+              "      \n",
+              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
+              "       width=\"24px\">\n",
+              "      <g>\n",
+              "          <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
+              "      </g>\n",
+              "  </svg>\n",
+              "    </button>\n",
+              "   \n",
+              "  <style>\n",
+              "    .colab-df-quickchart {\n",
+              "      background-color: #E8F0FE;\n",
+              "      border: none;\n",
+              "      border-radius: 50%;\n",
+              "      cursor: pointer;\n",
+              "      display: none;\n",
+              "      fill: #1967D2;\n",
+              "      height: 32px;\n",
+              "      padding: 0 0 0 0;\n",
+              "      width: 32px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-quickchart:hover {\n",
+              "      background-color: #E2EBFA;\n",
+              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
+              "      fill: #174EA6;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-quickchart {\n",
+              "      background-color: #3B4455;\n",
+              "      fill: #D2E3FC;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-quickchart:hover {\n",
+              "      background-color: #434B5C;\n",
+              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
+              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
+              "      fill: #FFFFFF;\n",
+              "    }\n",
+              "  </style>\n",
+              "\n",
+              "    <script>\n",
+              "      const quickchartButtonEl =\n",
+              "        document.querySelector('#df-ca65f108-edf8-4e2b-8152-6df025dc7ff8 button.colab-df-quickchart');\n",
+              "      quickchartButtonEl.style.display =\n",
+              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
+              "\n",
+              "      async function quickchart(key) {\n",
+              "        const containerElement = document.querySelector('#df-ca65f108-edf8-4e2b-8152-6df025dc7ff8');\n",
+              "        const charts = await google.colab.kernel.invokeFunction(\n",
+              "            'generateCharts', [key], {})\n",
+              "      }\n",
+              "    </script>    \n",
+              "  <style>\n",
+              "    .colab-df-container {\n",
+              "      display:flex;\n",
+              "      flex-wrap:wrap;\n",
+              "      gap: 12px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-convert {\n",
+              "      background-color: #E8F0FE;\n",
+              "      border: none;\n",
+              "      border-radius: 50%;\n",
+              "      cursor: pointer;\n",
+              "      display: none;\n",
+              "      fill: #1967D2;\n",
+              "      height: 32px;\n",
+              "      padding: 0 0 0 0;\n",
+              "      width: 32px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-convert:hover {\n",
+              "      background-color: #E2EBFA;\n",
+              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
+              "      fill: #174EA6;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-convert {\n",
+              "      background-color: #3B4455;\n",
+              "      fill: #D2E3FC;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-convert:hover {\n",
+              "      background-color: #434B5C;\n",
+              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
+              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
+              "      fill: #FFFFFF;\n",
+              "    }\n",
+              "  </style>\n",
+              "\n",
+              "      <script>\n",
+              "        const buttonEl =\n",
+              "          document.querySelector('#df-ca65f108-edf8-4e2b-8152-6df025dc7ff8 button.colab-df-convert');\n",
+              "        buttonEl.style.display =\n",
+              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
+              "\n",
+              "        async function convertToInteractive(key) {\n",
+              "          const element = document.querySelector('#df-ca65f108-edf8-4e2b-8152-6df025dc7ff8');\n",
+              "          const dataTable =\n",
+              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
+              "                                                     [key], {});\n",
+              "          if (!dataTable) return;\n",
+              "\n",
+              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
+              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
+              "            + ' to learn more about interactive tables.';\n",
+              "          element.innerHTML = '';\n",
+              "          dataTable['output_type'] = 'display_data';\n",
+              "          await google.colab.output.renderOutput(dataTable, element);\n",
+              "          const docLink = document.createElement('div');\n",
+              "          docLink.innerHTML = docLinkHtml;\n",
+              "          element.appendChild(docLink);\n",
+              "        }\n",
+              "      </script>\n",
+              "    </div>\n",
+              "  </div>\n",
+              "  "
+            ],
+            "text/plain": [
+              "             Timestamp  Year  Month  Day  Hour   PM2.5\n",
+              "0  2017-11-07 12:00:00  2017     11    7    12   64.51\n",
+              "1  2017-11-07 13:00:00  2017     11    7    13   69.95\n",
+              "2  2017-11-07 14:00:00  2017     11    7    14   92.79\n",
+              "3  2017-11-07 15:00:00  2017     11    7    15  109.66\n",
+              "4  2017-11-07 16:00:00  2017     11    7    16  116.50"
+            ]
+          },
+          "execution_count": 4,
+          "metadata": {},
+          "output_type": "execute_result"
+        }
+      ],
+      "source": [
+        "# Load the data into a Python Pandas DataFrame.\n",
+        "import pandas as pd\n",
+        "\n",
+        "air_quality = pd.read_csv(\"data/air-quality-india.csv\")\n",
+        "air_quality.head()"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 5,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 237
+        },
+        "id": "WNXrvP-wDIkA",
+        "outputId": "3e932987-41b3-4aaf-b49f-3707a9728322"
+      },
+      "outputs": [
+        {
+          "data": {
+            "text/html": [
+              "\n",
+              "  <div id=\"df-db4299d8-d441-4fc1-af7d-8d1c199492ed\">\n",
+              "    <div class=\"colab-df-container\">\n",
+              "      <div>\n",
+              "<style scoped>\n",
+              "    .dataframe tbody tr th:only-of-type {\n",
+              "        vertical-align: middle;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe tbody tr th {\n",
+              "        vertical-align: top;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe thead th {\n",
+              "        text-align: right;\n",
+              "    }\n",
+              "</style>\n",
+              "<table border=\"1\" class=\"dataframe\">\n",
+              "  <thead>\n",
+              "    <tr style=\"text-align: right;\">\n",
+              "      <th></th>\n",
+              "      <th>PM2.5</th>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>Timestamp</th>\n",
+              "      <th></th>\n",
+              "    </tr>\n",
+              "  </thead>\n",
+              "  <tbody>\n",
+              "    <tr>\n",
+              "      <th>2017-11-07 12:00:00</th>\n",
+              "      <td>64.51</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2017-11-07 13:00:00</th>\n",
+              "      <td>69.95</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2017-11-07 14:00:00</th>\n",
+              "      <td>92.79</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2017-11-07 15:00:00</th>\n",
+              "      <td>109.66</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2017-11-07 16:00:00</th>\n",
+              "      <td>116.50</td>\n",
+              "    </tr>\n",
+              "  </tbody>\n",
+              "</table>\n",
+              "</div>\n",
+              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-db4299d8-d441-4fc1-af7d-8d1c199492ed')\"\n",
+              "              title=\"Convert this dataframe to an interactive table.\"\n",
+              "              style=\"display:none;\">\n",
+              "        \n",
+              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
+              "       width=\"24px\">\n",
+              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
+              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
+              "  </svg>\n",
+              "      </button>\n",
+              "      \n",
+              "  \n",
+              "    <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-db4299d8-d441-4fc1-af7d-8d1c199492ed')\"\n",
+              "            title=\"Generate charts.\"\n",
+              "            style=\"display:none;\">\n",
+              "      \n",
+              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
+              "       width=\"24px\">\n",
+              "      <g>\n",
+              "          <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
+              "      </g>\n",
+              "  </svg>\n",
+              "    </button>\n",
+              "   \n",
+              "  <style>\n",
+              "    .colab-df-quickchart {\n",
+              "      background-color: #E8F0FE;\n",
+              "      border: none;\n",
+              "      border-radius: 50%;\n",
+              "      cursor: pointer;\n",
+              "      display: none;\n",
+              "      fill: #1967D2;\n",
+              "      height: 32px;\n",
+              "      padding: 0 0 0 0;\n",
+              "      width: 32px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-quickchart:hover {\n",
+              "      background-color: #E2EBFA;\n",
+              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
+              "      fill: #174EA6;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-quickchart {\n",
+              "      background-color: #3B4455;\n",
+              "      fill: #D2E3FC;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-quickchart:hover {\n",
+              "      background-color: #434B5C;\n",
+              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
+              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
+              "      fill: #FFFFFF;\n",
+              "    }\n",
+              "  </style>\n",
+              "\n",
+              "    <script>\n",
+              "      const quickchartButtonEl =\n",
+              "        document.querySelector('#df-db4299d8-d441-4fc1-af7d-8d1c199492ed button.colab-df-quickchart');\n",
+              "      quickchartButtonEl.style.display =\n",
+              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
+              "\n",
+              "      async function quickchart(key) {\n",
+              "        const containerElement = document.querySelector('#df-db4299d8-d441-4fc1-af7d-8d1c199492ed');\n",
+              "        const charts = await google.colab.kernel.invokeFunction(\n",
+              "            'generateCharts', [key], {})\n",
+              "      }\n",
+              "    </script>    \n",
+              "  <style>\n",
+              "    .colab-df-container {\n",
+              "      display:flex;\n",
+              "      flex-wrap:wrap;\n",
+              "      gap: 12px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-convert {\n",
+              "      background-color: #E8F0FE;\n",
+              "      border: none;\n",
+              "      border-radius: 50%;\n",
+              "      cursor: pointer;\n",
+              "      display: none;\n",
+              "      fill: #1967D2;\n",
+              "      height: 32px;\n",
+              "      padding: 0 0 0 0;\n",
+              "      width: 32px;\n",
+              "    }\n",
+              "\n",
+              "    .colab-df-convert:hover {\n",
+              "      background-color: #E2EBFA;\n",
+              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
+              "      fill: #174EA6;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-convert {\n",
+              "      background-color: #3B4455;\n",
+              "      fill: #D2E3FC;\n",
+              "    }\n",
+              "\n",
+              "    [theme=dark] .colab-df-convert:hover {\n",
+              "      background-color: #434B5C;\n",
+              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
+              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
+              "      fill: #FFFFFF;\n",
+              "    }\n",
+              "  </style>\n",
+              "\n",
+              "      <script>\n",
+              "        const buttonEl =\n",
+              "          document.querySelector('#df-db4299d8-d441-4fc1-af7d-8d1c199492ed button.colab-df-convert');\n",
+              "        buttonEl.style.display =\n",
+              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
+              "\n",
+              "        async function convertToInteractive(key) {\n",
+              "          const element = document.querySelector('#df-db4299d8-d441-4fc1-af7d-8d1c199492ed');\n",
+              "          const dataTable =\n",
+              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
+              "                                                     [key], {});\n",
+              "          if (!dataTable) return;\n",
+              "\n",
+              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
+              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
+              "            + ' to learn more about interactive tables.';\n",
+              "          element.innerHTML = '';\n",
+              "          dataTable['output_type'] = 'display_data';\n",
+              "          await google.colab.output.renderOutput(dataTable, element);\n",
+              "          const docLink = document.createElement('div');\n",
+              "          docLink.innerHTML = docLinkHtml;\n",
+              "          element.appendChild(docLink);\n",
+              "        }\n",
+              "      </script>\n",
+              "    </div>\n",
+              "  </div>\n",
+              "  "
+            ],
+            "text/plain": [
+              "                      PM2.5\n",
+              "Timestamp                  \n",
+              "2017-11-07 12:00:00   64.51\n",
+              "2017-11-07 13:00:00   69.95\n",
+              "2017-11-07 14:00:00   92.79\n",
+              "2017-11-07 15:00:00  109.66\n",
+              "2017-11-07 16:00:00  116.50"
+            ]
+          },
+          "execution_count": 5,
+          "metadata": {},
+          "output_type": "execute_result"
+        }
+      ],
+      "source": [
+        "import csv\n",
+        "\n",
+        "#Select only the two features of the DataFrame we're interested in.\n",
+        "airq = air_quality.loc[:, [\"Timestamp\", \"PM2.5\"]].set_index(\"Timestamp\")\n",
+        "saved_new = pd.DataFrame(airq)\n",
+        "saved_new.to_csv(\"data/air_quality.csv\")\n",
+        "airq.head()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "VRFkb_sLDUCD"
+      },
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "\n",
+        "# 1. Average Air Quality Index (AQI)\n",
+        "\n",
+        "Before we explore more advanced batch processing with different types of windowing, we will start with a simple batch analysis example.\n",
+        "\n",
+        "Our **objective** is to analyze the *entire* dataset to find the **average PM2.5 air quality index** in India across the entire 11/2017-6/2022 period.\n",
+        "\n",
+        "> This examples uses the `GlobalWindow`, which is a single window that covers the entire PCollection. All pipelines use the [`GlobalWindow`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.window.html#apache_beam.transforms.window.GlobalWindow) by default. In many cases, especially for batch pipelines, this is what we want since we want to analyze all the data that we have."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 6,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 34
+        },
+        "id": "v06NFe9sDYXc",
+        "outputId": "f65eae63-0424-4ac0-8609-78e98ac21bd0"
+      },
+      "outputs": [
+        {
+          "data": {
+            "application/javascript": "\n        if (typeof window.interactive_beam_jquery == 'undefined') {\n          var jqueryScript = document.createElement('script');\n          jqueryScript.src = 'https://code.jquery.com/jquery-3.4.1.slim.min.js';\n          jqueryScript.type = 'text/javascript';\n          jqueryScript.onload = function() {\n            var datatableScript = document.createElement('script');\n            datatableScript.src = 'https://cdn.datatables.net/1.10.20/js/jquery.dataTables.min.js';\n            datatableScript.type = 'text/javascript';\n            datatableScript.onload = function() {\n              window.interactive_beam_jquery = jQuery.noConflict(true);\n              window.interactive_beam_jquery(document).ready(function($){\n                \n              });\n            }\n            document.head.appendChild(datatableScript);\n          };\n          document.head.appendChild(jqueryScript);\n        } else {\n          window.interactive
 _beam_jquery(document).ready(function($){\n            \n          });\n        }"
+          },
+          "metadata": {},
+          "output_type": "display_data"
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "49.308428658266905\n"
+          ]
+        }
+      ],
+      "source": [
+        "import apache_beam as beam\n",
+        "\n",
+        "def parse_file(element):\n",
+        "  file = csv.reader([element], quotechar='\"', delimiter=',',\n",
+        "                    quoting=csv.QUOTE_ALL, skipinitialspace=True)\n",
+        "  for line in file:\n",
+        "    return line\n",
+        "\n",
+        "with beam.Pipeline() as pipeline:\n",
+        " (\n",
+        "      pipeline\n",
+        "      | 'Read input file' >> beam.io.ReadFromText(\"data/air_quality.csv\",\n",
+        "                                                  skip_header_lines=1)\n",
+        "      | 'Parse file' >> beam.Map(parse_file)\n",
+        "      | 'Get PM' >> beam.Map(lambda x: float(x[1])) # only process PM2.5\n",
+        "      | 'Get mean value' >> beam.combiners.Mean.Globally()\n",
+        "      | beam.Map(print))"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 7,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "GmHEE1G5Y1z-",
+        "outputId": "248ee3d7-43af-4b53-9832-8da0eb7ac974"
+      },
+      "outputs": [
+        {
+          "data": {
+            "text/plain": [
+              "49.30842865826703"
+            ]
+          },
+          "execution_count": 7,
+          "metadata": {},
+          "output_type": "execute_result"
+        }
+      ],
+      "source": [
+        "# To verify, the above mean value matches what Pandas produces\n",
+        "airq[\"PM2.5\"].mean()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "b3gGxC6w6qXx"
+      },
+      "source": [
+        "**Congratulations!** You just created a simple aggregation processing pipeline in batch using `GlobalWindow`."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "vRameihqDJ8l"
+      },
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "\n",
+        "# 2. Advanced Processing in Batch with Windowing\n",
+        "\n",
+        "Sometimes, we want to [aggregate](https://beam.apache.org/documentation/transforms/python/overview/#aggregation) data, like `GroupByKey` or `Combine`, only at certain intervals, like hourly or daily, instead of processing the entire `PCollection` of data only once.\n",
+        "\n",
+        "In this case, our **objective** is to determine the **fluctuations of air quality *every 30 days*.\n",
+        "\n",
+        "**_Windows_** in Beam allow us to process only certain data intervals at a time.\n",
+        "In this notebook, we will go through different ways of windowing our pipeline.\n",
+        "\n",
+        "We have already been introduced to the default GlobalWindow (see above) that covers the entire PCollection. Now we will dive into **fixed time windows, sliding time windows, and session windows**.\n",
+        "\n",
+        "> [Another windowing tutorial](https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/tour-of-beam/windowing.ipynb) with a toy dataset is recommended to go through."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "gj0_S5Ka3-zb"
+      },
+      "source": [
+        "### First, we need to convert timestamps to Unix time\n",
+        "\n",
+        "Apache Beam requires us to provide the timestamp as Unix time. Let us write the simple time conversion function:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 8,
+      "metadata": {
+        "id": "nKBYsxFg4SIa"
+      },
+      "outputs": [],
+      "source": [
+        "import time\n",
+        "\n",
+        "# This function is modifiable and can convert integers to time formats like unix\n",
+        "# Without this function and .strptime, you may run into comparison issues!\n",
+        "def to_unix_time(time_str: str, time_format='%Y-%m-%d %H:%M:%S') -> int:\n",
+        "  \"\"\"Converts a time string into Unix time.\"\"\"\n",
+        "  time_tuple = time.strptime(time_str, time_format)\n",
+        "  return int(time.mktime(time_tuple))"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 9,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "_mPge0KdRx20",
+        "outputId": "43475bbe-548a-4817-ed0b-534cebbe70ce"
+      },
+      "outputs": [
+        {
+          "data": {
+            "text/plain": [
+              "1634220000"
+            ]
+          },
+          "execution_count": 9,
+          "metadata": {},
+          "output_type": "execute_result"
+        }
+      ],
+      "source": [
+        "to_unix_time('2021-10-14 14:00:00')"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "lL0_QONF1aMH"
+      },
+      "source": [
+        "### Second, let us define some helper functions to develop our pipeline\n",
+        "\n",
+        "In this code, we have a transform (`PrintWindowInfo`) to help us analyze an element alongside its window information, and we have another transform\n",
+        "(`PrintWindowInfo`) to help us analyze how many elements landed into each window.\n",
+        "We use a custom [`DoFn`](https://beam.apache.org/documentation/transforms/python/elementwise/pardo)\n",
+        "to access that information."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 10,
+      "metadata": {
+        "id": "KtPL-echb2xv"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Helper functions to develop our pipeline\n",
+        "\n",
+        "def human_readable_window(window) -> str:\n",
+        "  \"\"\"Formats a window object into a human readable string.\"\"\"\n",
+        "  if isinstance(window, beam.window.GlobalWindow):\n",
+        "    return str(window)\n",
+        "  return f'{window.start.to_utc_datetime()} - {window.end.to_utc_datetime()}'\n",
+        "\n",
+        "class PrintElementInfo(beam.DoFn):\n",
+        "  \"\"\"Prints an element with its Window information for debugging.\"\"\"\n",
+        "  def process(self, element, timestamp=beam.DoFn.TimestampParam, window=beam.DoFn.WindowParam):\n",
+        "    print(f'[{human_readable_window(window)}] {timestamp.to_utc_datetime()} -- {element}')\n",
+        "    yield element\n",
+        "\n",
+        "@beam.ptransform_fn\n",
+        "def PrintWindowInfo(pcollection):\n",
+        "  \"\"\"Prints the Window information with AQI in that window for debugging.\"\"\"\n",
+        "  class PrintAQI(beam.DoFn):\n",
+        "    def process(self, mean_elements, window=beam.DoFn.WindowParam):\n",
+        "      print(f'>> Window [{human_readable_window(window)}], AQI: {mean_elements}')\n",
+        "      yield mean_elements\n",
+        "\n",
+        "  return (\n",
+        "      pcollection\n",
+        "      | 'Count elements per window' >> beam.combiners.Mean.Globally().without_defaults()\n",
+        "      | 'Print counts info' >> beam.ParDo(PrintAQI())\n",
+        "  )"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "dtQbcRU6XYCr"
+      },
+      "source": [
+        "Note: when you run below code, pay attention to how the human readable windows varies for each window type.\n",
+        "\n",
+        "To illustrate how windowing works, we will use the below toy data:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 11,
+      "metadata": {
+        "id": "oOLx4IsZXoTO"
+      },
+      "outputs": [],
+      "source": [
+        "# a toy data\n",
+        "air_toy_data = [\n",
+        "    ['2021-10-14 14:00:00', '43.27'],\n",
+        "    ['2021-10-14 15:00:00', '44.17'],\n",
+        "    ['2021-10-14 16:00:00', '48.77'],\n",
+        "    ['2021-10-14 17:00:00', '55.57'],\n",
+        "    ['2021-10-14 18:00:00', '56.95'],\n",
+        "    ['2021-10-21 09:00:00', '36.77'],\n",
+        "    ['2021-10-21 10:00:00', '34.87'],\n",
+        "    ['2021-11-17 01:00:00', '62.64'],\n",
+        "    ['2021-11-17 02:00:00', '65.28'],\n",
+        "    ['2021-11-17 03:00:00', '65.53'],\n",
+        "    ['2021-11-17 04:00:00', '70.18'],\n",
+        "    ['2021-12-11 21:00:00', '69.07'],\n",
+        "    ['2022-01-02 21:00:00', '76.56'],\n",
+        "    ['2022-01-02 22:00:00', '78.77'],\n",
+        "    ['2022-01-02 23:00:00', '73.16'],\n",
+        "    ['2022-01-03 03:00:00', '74.05'],\n",
+        "    ['2022-01-03 19:00:00', '100.28'],\n",
+        "    ['2022-01-03 22:00:00', '80.92'],\n",
+        "    ['2022-01-04 05:00:00', '80.48'],\n",
+        "    ['2022-01-04 07:00:00', '84.0'],\n",
+        "    ['2022-01-04 18:00:00', '95.49'],\n",
+        "    ['2022-01-05 00:00:00', '69.01'],\n",
+        "    ['2022-01-05 07:00:00', '76.85'],]"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "_QoStuYV-uku"
+      },
+      "source": [
+        "### Fixed time windows\n",
+        "\n",
+        "`FixedWindows` allow us to create fixed-sized windows with evenly spaced intervals.\n",
+        "We only need to specify the _window size_ in seconds.\n",
+        "\n",
+        "In Python, we can use [`timedelta`](https://docs.python.org/3/library/datetime.html#timedelta-objects)\n",
+        "to help us do the conversion of minutes, hours, or days for us.\n",
+        "\n",
+        "We then use the [`WindowInto`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html?highlight=windowinto#apache_beam.transforms.core.WindowInto)\n",
+        "transform to apply the kind of window we want."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 12,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "Q7Q4yVzh8XWO",
+        "outputId": "076752f2-1b40-4d07-9419-88757adc99be"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 14:00:00 -- 43.27\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 15:00:00 -- 44.17\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 16:00:00 -- 48.77\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 17:00:00 -- 55.57\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 18:00:00 -- 56.95\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-21 09:00:00 -- 36.77\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-21 10:00:00 -- 34.87\n",
+            "[2021-10-30 00:00:00 - 2021-11-29 00:00:00] 2021-11-17 01:00:00 -- 62.64\n",
+            "[2021-10-30 00:00:00 - 2021-11-29 00:00:00] 2021-11-17 02:00:00 -- 65.28\n",
+            "[2021-10-30 00:00:00 - 2021-11-29 00:00:00] 2021-11-17 03:00:00 -- 65.53\n",
+            "[2021-10-30 00:00:00 - 2021-11-29 00:00:00] 2021-11-17 04:00:00 -- 70.18\n",
+            "[2021-11-29 00:00:00 - 2021-12-29 00:00:00] 2021-12-11 21:00:00 -- 69.07\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-02 21:00:00 -- 76.56\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-02 22:00:00 -- 78.77\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-02 23:00:00 -- 73.16\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-03 03:00:00 -- 74.05\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-03 19:00:00 -- 100.28\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-03 22:00:00 -- 80.92\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-04 05:00:00 -- 80.48\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-04 07:00:00 -- 84.0\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-04 18:00:00 -- 95.49\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-05 00:00:00 -- 69.01\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-05 07:00:00 -- 76.85\n",
+            ">> Window [2021-09-30 00:00:00 - 2021-10-30 00:00:00], AQI: 45.76714285714286\n",
+            ">> Window [2021-10-30 00:00:00 - 2021-11-29 00:00:00], AQI: 65.9075\n",
+            ">> Window [2021-11-29 00:00:00 - 2021-12-29 00:00:00], AQI: 69.07\n",
+            ">> Window [2021-12-29 00:00:00 - 2022-01-28 00:00:00], AQI: 80.86999999999999\n"
+          ]
+        }
+      ],
+      "source": [
+        "import time\n",
+        "import apache_beam as beam\n",
+        "from datetime import timedelta\n",
+        "\n",
+        "\n",
+        "# We first set the window size to around 1 month.\n",
+        "window_size = timedelta(days=30).total_seconds()  # in seconds\n",
+        "\n",
+        "\n",
+        "# Let us set up the windowed pipeline and compute AQI every 30 days\n",
+        "with beam.Pipeline() as pipeline:\n",
+        "  (\n",
+        "      pipeline\n",
+        "      | beam.Create(air_toy_data)\n",
+        "      | 'With timestamps' >> beam.MapTuple(\n",
+        "          lambda timestamp, element:\n",
+        "              beam.window.TimestampedValue(float(element), to_unix_time(timestamp))\n",
+        "      )\n",
+        "      | 'Fixed windows' >> beam.WindowInto(beam.window.FixedWindows(window_size))\n",
+        "      | 'Print element info' >> beam.ParDo(PrintElementInfo())\n",
+        "      | 'Print window info' >> PrintWindowInfo()\n",
+        "  )"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "N-bqgv5K_INW"
+      },
+      "source": [
+        "### Sliding time windows\n",
+        "\n",
+        "[`Sliding windows`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.window.html#apache_beam.transforms.window.SlidingWindows)\n",
+        "allow us to compute AQI every 30 days but each window should cover the last 15 days.\n",
+        "We can specify the _window size_ in seconds just like with `FixedWindows` to define the window size. We also need to specify a _window period_ in seconds, which is how often we want to emit each window."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 13,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "Lrj1mb6Q_LW7",
+        "outputId": "4418b5bd-6dc0-44a3-ca1a-d231ec9af9e1"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "window_size:   2592000.0 seconds\n",
+            "window_period: 1296000.0 seconds\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 14:00:00 -- 43.27\n",
+            "[2021-09-15 00:00:00 - 2021-10-15 00:00:00] 2021-10-14 14:00:00 -- 43.27\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 15:00:00 -- 44.17\n",
+            "[2021-09-15 00:00:00 - 2021-10-15 00:00:00] 2021-10-14 15:00:00 -- 44.17\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 16:00:00 -- 48.77\n",
+            "[2021-09-15 00:00:00 - 2021-10-15 00:00:00] 2021-10-14 16:00:00 -- 48.77\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 17:00:00 -- 55.57\n",
+            "[2021-09-15 00:00:00 - 2021-10-15 00:00:00] 2021-10-14 17:00:00 -- 55.57\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-14 18:00:00 -- 56.95\n",
+            "[2021-09-15 00:00:00 - 2021-10-15 00:00:00] 2021-10-14 18:00:00 -- 56.95\n",
+            "[2021-10-15 00:00:00 - 2021-11-14 00:00:00] 2021-10-21 09:00:00 -- 36.77\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-21 09:00:00 -- 36.77\n",
+            "[2021-10-15 00:00:00 - 2021-11-14 00:00:00] 2021-10-21 10:00:00 -- 34.87\n",
+            "[2021-09-30 00:00:00 - 2021-10-30 00:00:00] 2021-10-21 10:00:00 -- 34.87\n",
+            "[2021-11-14 00:00:00 - 2021-12-14 00:00:00] 2021-11-17 01:00:00 -- 62.64\n",
+            "[2021-10-30 00:00:00 - 2021-11-29 00:00:00] 2021-11-17 01:00:00 -- 62.64\n",
+            "[2021-11-14 00:00:00 - 2021-12-14 00:00:00] 2021-11-17 02:00:00 -- 65.28\n",
+            "[2021-10-30 00:00:00 - 2021-11-29 00:00:00] 2021-11-17 02:00:00 -- 65.28\n",
+            "[2021-11-14 00:00:00 - 2021-12-14 00:00:00] 2021-11-17 03:00:00 -- 65.53\n",
+            "[2021-10-30 00:00:00 - 2021-11-29 00:00:00] 2021-11-17 03:00:00 -- 65.53\n",
+            "[2021-11-14 00:00:00 - 2021-12-14 00:00:00] 2021-11-17 04:00:00 -- 70.18\n",
+            "[2021-10-30 00:00:00 - 2021-11-29 00:00:00] 2021-11-17 04:00:00 -- 70.18\n",
+            "[2021-11-29 00:00:00 - 2021-12-29 00:00:00] 2021-12-11 21:00:00 -- 69.07\n",
+            "[2021-11-14 00:00:00 - 2021-12-14 00:00:00] 2021-12-11 21:00:00 -- 69.07\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-02 21:00:00 -- 76.56\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-02 21:00:00 -- 76.56\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-02 22:00:00 -- 78.77\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-02 22:00:00 -- 78.77\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-02 23:00:00 -- 73.16\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-02 23:00:00 -- 73.16\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-03 03:00:00 -- 74.05\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-03 03:00:00 -- 74.05\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-03 19:00:00 -- 100.28\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-03 19:00:00 -- 100.28\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-03 22:00:00 -- 80.92\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-03 22:00:00 -- 80.92\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-04 05:00:00 -- 80.48\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-04 05:00:00 -- 80.48\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-04 07:00:00 -- 84.0\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-04 07:00:00 -- 84.0\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-04 18:00:00 -- 95.49\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-04 18:00:00 -- 95.49\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-05 00:00:00 -- 69.01\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-05 00:00:00 -- 69.01\n",
+            "[2021-12-29 00:00:00 - 2022-01-28 00:00:00] 2022-01-05 07:00:00 -- 76.85\n",
+            "[2021-12-14 00:00:00 - 2022-01-13 00:00:00] 2022-01-05 07:00:00 -- 76.85\n",
+            ">> Window [2021-09-30 00:00:00 - 2021-10-30 00:00:00], AQI: 45.76714285714286\n",
+            ">> Window [2021-09-15 00:00:00 - 2021-10-15 00:00:00], AQI: 49.746\n",
+            ">> Window [2021-10-15 00:00:00 - 2021-11-14 00:00:00], AQI: 35.82\n",
+            ">> Window [2021-11-14 00:00:00 - 2021-12-14 00:00:00], AQI: 66.53999999999999\n",
+            ">> Window [2021-10-30 00:00:00 - 2021-11-29 00:00:00], AQI: 65.9075\n",
+            ">> Window [2021-11-29 00:00:00 - 2021-12-29 00:00:00], AQI: 69.07\n",
+            ">> Window [2021-12-29 00:00:00 - 2022-01-28 00:00:00], AQI: 80.86999999999999\n",
+            ">> Window [2021-12-14 00:00:00 - 2022-01-13 00:00:00], AQI: 80.86999999999999\n"
+          ]
+        }
+      ],
+      "source": [
+        "import apache_beam as beam\n",
+        "from datetime import timedelta\n",
+        "\n",
+        "# Sliding windows of 30 days and emit one every 15 days.\n",
+        "window_size = timedelta(days=30).total_seconds()  # in seconds\n",
+        "window_period = timedelta(days=15).total_seconds()  # in seconds\n",
+        "print(f'window_size:   {window_size} seconds')\n",
+        "print(f'window_period: {window_period} seconds')\n",
+        "\n",
+        "with beam.Pipeline() as pipeline:\n",
+        "  (\n",
+        "      pipeline\n",
+        "      | 'Air Quality' >> beam.Create(air_toy_data)\n",
+        "      | 'With timestamps' >> beam.MapTuple(\n",
+        "          lambda timestamp, element:\n",
+        "              beam.window.TimestampedValue(float(element), to_unix_time(timestamp))\n",
+        "      )\n",
+        "      | 'Sliding windows' >> beam.WindowInto(\n",
+        "          beam.window.SlidingWindows(window_size, window_period)\n",
+        "      )\n",
+        "      | 'Print element info' >> beam.ParDo(PrintElementInfo())\n",
+        "      | 'Print window info' >> PrintWindowInfo()\n",
+        "  )"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Y_SbevXv_MDk"
+      },
+      "source": [
+        "### Session time windows\n",
+        "\n",
+        "Maybe we don't want regular windows, but instead, have the windows reflect periods where activity happened. [`Sessions`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.window.html#apache_beam.transforms.window.Sessions)\n",
+        "allow us to create those windows.\n",
+        "We only need to specify a _gap size_ in seconds, which is the maximum number of seconds of inactivity to close a session window. In this case, if no event happens within 10 days, the current session windown closes and is emitted and a new session windown is created whenenver the next event happens."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 14,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "zgNc-c3THB7G",
+        "outputId": "d8d5fed1-8748-4845-cbc3-0b898ce4bcd8"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "gap_size: 864000.0 seconds\n",
+            "[2021-10-14 14:00:00 - 2021-10-24 14:00:00] 2021-10-14 14:00:00 -- 43.27\n",
+            "[2021-10-14 15:00:00 - 2021-10-24 15:00:00] 2021-10-14 15:00:00 -- 44.17\n",
+            "[2021-10-14 16:00:00 - 2021-10-24 16:00:00] 2021-10-14 16:00:00 -- 48.77\n",
+            "[2021-10-14 17:00:00 - 2021-10-24 17:00:00] 2021-10-14 17:00:00 -- 55.57\n",
+            "[2021-10-14 18:00:00 - 2021-10-24 18:00:00] 2021-10-14 18:00:00 -- 56.95\n",
+            "[2021-10-21 09:00:00 - 2021-10-31 09:00:00] 2021-10-21 09:00:00 -- 36.77\n",
+            "[2021-10-21 10:00:00 - 2021-10-31 10:00:00] 2021-10-21 10:00:00 -- 34.87\n",
+            "[2021-11-17 01:00:00 - 2021-11-27 01:00:00] 2021-11-17 01:00:00 -- 62.64\n",
+            "[2021-11-17 02:00:00 - 2021-11-27 02:00:00] 2021-11-17 02:00:00 -- 65.28\n",
+            "[2021-11-17 03:00:00 - 2021-11-27 03:00:00] 2021-11-17 03:00:00 -- 65.53\n",
+            "[2021-11-17 04:00:00 - 2021-11-27 04:00:00] 2021-11-17 04:00:00 -- 70.18\n",
+            "[2021-12-11 21:00:00 - 2021-12-21 21:00:00] 2021-12-11 21:00:00 -- 69.07\n",
+            "[2022-01-02 21:00:00 - 2022-01-12 21:00:00] 2022-01-02 21:00:00 -- 76.56\n",
+            "[2022-01-02 22:00:00 - 2022-01-12 22:00:00] 2022-01-02 22:00:00 -- 78.77\n",
+            "[2022-01-02 23:00:00 - 2022-01-12 23:00:00] 2022-01-02 23:00:00 -- 73.16\n",
+            "[2022-01-03 03:00:00 - 2022-01-13 03:00:00] 2022-01-03 03:00:00 -- 74.05\n",
+            "[2022-01-03 19:00:00 - 2022-01-13 19:00:00] 2022-01-03 19:00:00 -- 100.28\n",
+            "[2022-01-03 22:00:00 - 2022-01-13 22:00:00] 2022-01-03 22:00:00 -- 80.92\n",
+            "[2022-01-04 05:00:00 - 2022-01-14 05:00:00] 2022-01-04 05:00:00 -- 80.48\n",
+            "[2022-01-04 07:00:00 - 2022-01-14 07:00:00] 2022-01-04 07:00:00 -- 84.0\n",
+            "[2022-01-04 18:00:00 - 2022-01-14 18:00:00] 2022-01-04 18:00:00 -- 95.49\n",
+            "[2022-01-05 00:00:00 - 2022-01-15 00:00:00] 2022-01-05 00:00:00 -- 69.01\n",
+            "[2022-01-05 07:00:00 - 2022-01-15 07:00:00] 2022-01-05 07:00:00 -- 76.85\n",
+            ">> Window [2021-10-14 14:00:00 - 2021-10-31 10:00:00], AQI: 45.76714285714286\n",
+            ">> Window [2021-11-17 01:00:00 - 2021-11-27 04:00:00], AQI: 65.9075\n",
+            ">> Window [2021-12-11 21:00:00 - 2021-12-21 21:00:00], AQI: 69.07\n",
+            ">> Window [2022-01-02 21:00:00 - 2022-01-15 07:00:00], AQI: 80.86999999999999\n"
+          ]
+        }
+      ],
+      "source": [
+        "import apache_beam as beam\n",
+        "from datetime import timedelta\n",
+        "\n",
+        "# Sessions divided by 10 days.\n",
+        "gap_size = timedelta(days=10).total_seconds()  # in seconds\n",
+        "print(f'gap_size: {gap_size} seconds')\n",
+        "\n",
+        "with beam.Pipeline() as pipeline:\n",
+        "  (\n",
+        "      pipeline\n",
+        "      | 'Air Quality' >> beam.Create(air_toy_data)\n",
+        "      | 'With timestamps' >> beam.MapTuple(\n",
+        "          lambda timestamp, element:\n",
+        "              beam.window.TimestampedValue(float(element), to_unix_time(timestamp))\n",
+        "      )\n",
+        "      | 'Session windows' >> beam.WindowInto(beam.window.Sessions(gap_size))\n",
+        "      | 'Print element info' >> beam.ParDo(PrintElementInfo())\n",
+        "      | 'Print window info' >> PrintWindowInfo()\n",
+        "  )"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "XcX0t6hya85F"
+      },
+      "source": [
+        "Note how the above windows are irregular."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "EBSlyeBibNL0"
+      },
+      "source": [
+        "<hr style=\"border: 5px solid #003262;\" />\n",
+        "\n",
+        "# 3. Put All Together\n",
+        "\n",
+        "Section 2 uses the toy data to go through three different windowing types. Now it is time to analyze the real data (`data/air_quality.csv`).\n",
+        "\n",
+        "Can you build one Beam pipeline to compute all AQI values for these windows?\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "plMjLuh-lKzr"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Edit This Code Cell\n",
+        "..."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 15,
+      "metadata": {
+        "cellView": "form",
+        "id": "t6vBMax0bubN"
+      },
+      "outputs": [],
+      "source": [
+        "#@title Expand to see the answer\n",
+        "import apache_beam as beam\n",
+        "import csv\n",
+        "import time\n",
+        "import apache_beam as beam\n",
+        "from datetime import timedelta\n",
+        "import apache_beam.runners.interactive.interactive_beam as ib\n",
+        "\n",
+        "def to_unix_time(time_str: str, time_format='%Y-%m-%d %H:%M:%S') -> int:\n",
+        "  \"\"\"Converts a time string into Unix time.\"\"\"\n",
+        "  time_tuple = time.strptime(time_str, time_format)\n",
+        "  return int(time.mktime(time_tuple))\n",
+        "\n",
+        "def human_readable_window(window) -> str:\n",
+        "  \"\"\"Formats a window object into a human readable string.\"\"\"\n",
+        "  if isinstance(window, beam.window.GlobalWindow):\n",
+        "    return str(window)\n",
+        "  return f'{window.start.to_utc_datetime()} - {window.end.to_utc_datetime()}'\n",
+        "\n",
+        "@beam.ptransform_fn\n",
+        "def OutputWindowInfo(pcollection):\n",
+        "  \"\"\"Output the Window information with AQI in that window.\"\"\"\n",
+        "  class GetAQI(beam.DoFn):\n",
+        "    def process(self, mean_elements, window=beam.DoFn.WindowParam):\n",
+        "      yield human_readable_window(window), mean_elements\n",
+        "  return (\n",
+        "    pcollection\n",
+        "    | 'Count elements per window' >> beam.combiners.Mean.Globally().without_defaults()\n",
+        "    | 'Output counts info' >> beam.ParDo(GetAQI())\n",
+        "  )\n",
+        "\n",
+        "\n",
+        "def parse_file(element):\n",
+        "  file = csv.reader([element], quotechar='\"', delimiter=',',\n",
+        "                    quoting=csv.QUOTE_ALL, skipinitialspace=True)\n",
+        "  for line in file:\n",
+        "    return line\n",
+        "\n",
+        "p =  beam.Pipeline()\n",
+        "\n",
+        "# get the data\n",
+        "read_text = (p | 'Read input file' >> beam.io.ReadFromText(\"data/air_quality.csv\",\n",
+        "                                                  skip_header_lines=1)\n",
+        "| 'Parse file' >> beam.Map(parse_file)\n",

Review Comment:
   Formatting the cells is so painful. Colab does not provide the easy way to do that. I reformatted this cell and cleaned up some long lines (without a good formatting tool). 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] github-actions[bot] commented on pull request #26185: add one windowing example

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] commented on PR #26185:
URL: https://github.com/apache/beam/pull/26185#issuecomment-1500764176

   Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] liferoad commented on pull request #26185: add one windowing example

Posted by "liferoad (via GitHub)" <gi...@apache.org>.

liferoad commented on PR #26185:
URL: https://github.com/apache/beam/pull/26185#issuecomment-1500764106

   R: @davidcavazos


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org