Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/05/05 11:07:07 UTC

[GitHub] [flink-web] MarkSfik commented on a change in pull request #335: Add Blog: "Flink SQL Demo: Building an End to End Streaming Application"

MarkSfik commented on a change in pull request #335:
URL: https://github.com/apache/flink-web/pull/335#discussion_r420028791



##########
File path: _posts/2020-05-03-flink-sql-demo-building-e2e-streaming-application.md
##########
@@ -0,0 +1,338 @@
+---
+layout: post
+title: "Flink SQL Demo: Building an End-to-End Streaming Application"
+date: 2020-05-03T12:00:00.000Z
+categories: news
+authors:
+- jark:
+  name: "Jark Wu"
+  twitter: "JarkWu"
+excerpt: Apache Flink 1.10 introduced many exciting new features, including numerous improvements to Flink SQL, which is evolving at a fast pace. This article takes a practical look at how to quickly build streaming applications with Flink SQL.
+---
+
+Apache Flink 1.10.0 introduced many exciting new features, including numerous improvements to Flink SQL, which is evolving at a fast pace. This article takes a practical look at how to quickly build streaming applications with Flink SQL.
+
+In the following sections, we describe how to integrate Kafka, MySQL, Elasticsearch, and Kibana with Flink SQL to analyze e-commerce user behavior in real time. All exercises in this article are performed in the Flink SQL CLI, and the entire process uses plain SQL text, without a single line of Java or Scala code and without installing an IDE. The final result of this demo is shown in the following figure:
+
+<center>
+<img src="{{ site.baseurl }}/img/blog/2020-05-03-flink-sql-demo/image1.png" width="800px" alt="Demo Overview"/>
+</center>
+<br>
+
+# Preparation
+
+Prepare a Linux or macOS computer with Docker and Java 8 installed. A Java environment is required because we will install and run the Flink cluster in the host environment, not in a Docker container.
+
+## Use Docker Compose to Start Demo Environment
+
+The components required in this demo (except for Flink) are all managed in containers, so we will use `docker-compose` to start them. First, download the `docker-compose.yml` file that defines the demo environment, for example by running the following commands:
+
+```
+mkdir flink-demo; cd flink-demo;
+wget https://raw.githubusercontent.com/wuchong/flink-sql-demo/master/docker-compose.yml
+```
+
+The Docker Compose environment consists of the following containers:
+
+- **MySQL:** MySQL 5.7 and a `category` table in the database. The `category` table will be joined with data in Kafka to enrich the real-time data.
+- **Kafka:** It is mainly used as a data source. The DataGen component automatically writes data into a Kafka topic.
+- **Zookeeper:** This component is required by Kafka.
+- **Elasticsearch:** It is mainly used as a data sink.
+- **Kibana:** It's used to visualize the data in Elasticsearch.
+- **DataGen:** It is the data generator. After the container is started, user behavior data is automatically generated and sent to the Kafka topic. By default, 2000 data entries are generated each second for about 1.5 hours. You can modify DataGen's `speedup` parameter in `docker-compose.yml` to adjust the generation rate (the change takes effect after Docker Compose is restarted).
+
+**Important:** Before starting the containers, we recommend configuring Docker so that sufficient resources are available and the environment does not become unresponsive. We suggest allocating 3-4 GB of memory and 3-4 CPU cores to Docker.
+
+To start all containers, run the following command in the directory that contains the `docker-compose.yml` file.
+
+```
+docker-compose up -d
+```
+
+This command automatically starts all the containers defined in the Docker Compose configuration in detached mode. Run `docker ps` to check whether all the containers listed above are running properly. You can also visit [http://localhost:5601/](http://localhost:5601/) to see if Kibana is running normally.
+
+Don’t forget to run the following command to stop all containers after you finish the tutorial:
+
+```
+docker-compose down
+```
+
+## Download and Install Flink Cluster
+
+We recommend manually downloading and installing Flink on your host system instead of starting Flink through Docker, because you’ll get a more intuitive understanding of the components, dependencies, and scripts of Flink.
+
+1. Download and decompress [Apache Flink 1.10.0](https://www.apache.org/dist/flink/flink-1.10.0/flink-1.10.0-bin-scala_2.11.tgz) into the `flink-1.10.0` directory.
+2. Go to the `flink-1.10.0` directory by running `cd flink-1.10.0`.
+3. Run the following command to download the JAR dependencies into the `lib/` directory.
+
+    ```
+    # Download the format, connector, and JDBC driver JARs into lib/, stopping at the first failure
+    wget -P ./lib/ https://repo1.maven.org/maven2/org/apache/flink/flink-json/1.10.0/flink-json-1.10.0.jar && \
+    wget -P ./lib/ https://repo1.maven.org/maven2/org/apache/flink/flink-sql-connector-kafka_2.11/1.10.0/flink-sql-connector-kafka_2.11-1.10.0.jar && \
+    wget -P ./lib/ https://repo1.maven.org/maven2/org/apache/flink/flink-sql-connector-elasticsearch6_2.11/1.10.0/flink-sql-connector-elasticsearch6_2.11-1.10.0.jar && \
+    wget -P ./lib/ https://repo1.maven.org/maven2/org/apache/flink/flink-jdbc_2.11/1.10.0/flink-jdbc_2.11-1.10.0.jar && \
+    wget -P ./lib/ https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.48/mysql-connector-java-5.1.48.jar
+    ```
+
+4. In `conf/flink-conf.yaml`, set `taskmanager.numberOfTaskSlots` to `10`, since during this demo we will be launching multiple jobs.
+5. Run `./bin/start-cluster.sh` to start the cluster. Check if Flink is up by accessing the Flink Web UI at [http://localhost:8081](http://localhost:8081). The number of available slots should be 10.

Review comment:
       ```suggestion
   5. Run `./bin/start-cluster.sh` to start the cluster. Check that Flink is up and running by accessing the Flink Web UI at [http://localhost:8081](http://localhost:8081). The number of available slots should be 10.
   ```
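
As an optional aside on steps 4 and 5, here is a minimal shell sketch of how the slot setting and the slot check could be scripted. `set_task_slots` and `available_slots` are hypothetical helper names, not part of Flink. The sketch assumes that `conf/flink-conf.yaml` stores the setting as a `taskmanager.numberOfTaskSlots: <n>` line and that the cluster's REST endpoint `/overview` on `localhost:8081` returns JSON containing a `slots-available` field.

```shell
# Hypothetical helper (not part of Flink): set taskmanager.numberOfTaskSlots
# in the given config file, rewriting the line if present, appending otherwise.
set_task_slots() {
  conf_file="$1"
  slots="$2"
  tmp_file="${conf_file}.tmp"
  if grep -q '^taskmanager\.numberOfTaskSlots:' "$conf_file"; then
    # Write to a temp file and move it back, which avoids the
    # GNU/BSD differences of in-place sed on Linux vs. macOS.
    sed "s/^taskmanager\.numberOfTaskSlots:.*/taskmanager.numberOfTaskSlots: ${slots}/" \
      "$conf_file" > "$tmp_file" && mv "$tmp_file" "$conf_file"
  else
    echo "taskmanager.numberOfTaskSlots: ${slots}" >> "$conf_file"
  fi
}

# Hypothetical helper: read JSON on stdin and print the value of the
# "slots-available" field (assumed to be an integer).
available_slots() {
  sed -n 's/.*"slots-available":[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Usage, from the flink-1.10.0 directory (assumed paths and port):
#   set_task_slots conf/flink-conf.yaml 10
#   ./bin/start-cluster.sh
#   curl -s http://localhost:8081/overview | available_slots
```

The helpers only touch the one configuration key and only read the one JSON field, so the manual instructions in the post stay the reference; the script is a convenience for repeated runs.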



