Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/08/24 02:50:04 UTC

[GitHub] [flink] dianfu commented on a change in pull request #13206: [FLINK-18948][python] Add end to end test for Python DataStream API.

dianfu commented on a change in pull request #13206:
URL: https://github.com/apache/flink/pull/13206#discussion_r475312734



##########
File path: flink-end-to-end-tests/test-scripts/test_pyflink_datastream.sh
##########
@@ -0,0 +1,177 @@
+#!/usr/bin/env bash
+################################################################################
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+set -Eeuo pipefail
+
+KAFKA_VERSION="2.2.0"
+CONFLUENT_VERSION="5.0.0"
+CONFLUENT_MAJOR_VERSION="5.0"
+KAFKA_SQL_VERSION="universal"
+SQL_JARS_DIR=$END_TO_END_DIR/flink-sql-client-test/target/sql-jars
+KAFKA_SQL_JAR=$(find "$SQL_JARS_DIR" | grep "kafka_" )
+
+function create_data_stream_kafka_source_1 {
+    topicName="test-data-stream-source"

Review comment:
    ```suggestion
    topicName="test-python-data-stream-source"
    ```

##########
File path: flink-end-to-end-tests/flink-python-test/python/datastream/isolated_functions.py
##########
@@ -0,0 +1,40 @@
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+from pyflink.datastream.functions import CoMapFunction
+
+
+def calc_str_len(value):

Review comment:
       What about removing the prefix **calc_**? It's a term from SQL.

##########
File path: flink-end-to-end-tests/run-nightly-tests.sh
##########
@@ -217,6 +217,7 @@ run_test "Shaded Hadoop S3A with credentials provider end-to-end test" "$END_TO_
 
 if [[ `uname -i` != 'aarch64' ]]; then
     run_test "PyFlink end-to-end test" "$END_TO_END_DIR/test-scripts/test_pyflink.sh" "skip_check_exceptions"

Review comment:
    ```suggestion
    run_test "PyFlink Table end-to-end test" "$END_TO_END_DIR/test-scripts/test_pyflink_table.sh" "skip_check_exceptions"
    ```

##########
File path: flink-end-to-end-tests/test-scripts/test_pyflink_datastream.sh
##########
@@ -0,0 +1,177 @@
+#!/usr/bin/env bash
+################################################################################
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+set -Eeuo pipefail
+
+KAFKA_VERSION="2.2.0"
+CONFLUENT_VERSION="5.0.0"
+CONFLUENT_MAJOR_VERSION="5.0"
+KAFKA_SQL_VERSION="universal"
+SQL_JARS_DIR=$END_TO_END_DIR/flink-sql-client-test/target/sql-jars
+KAFKA_SQL_JAR=$(find "$SQL_JARS_DIR" | grep "kafka_" )
+
+function create_data_stream_kafka_source_1 {

Review comment:
    ```suggestion
    function create_data_stream_kafka_source {
    ```

##########
File path: flink-end-to-end-tests/flink-python-test/python/datastream/data_stream_job.py
##########
@@ -0,0 +1,59 @@
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+from pyflink.common.serialization import JsonRowSerializationSchema, \
+    JsonRowDeserializationSchema
+from pyflink.common.typeinfo import Types
+from pyflink.datastream import StreamExecutionEnvironment
+from pyflink.datastream.connectors import FlinkKafkaProducer, FlinkKafkaConsumer
+
+from isolated_functions import m_flat_map, calc_add_one
+
+
+def my_data_stream_job():

Review comment:
       rename to `python_data_stream_example`?

##########
File path: flink-end-to-end-tests/flink-python-test/python/datastream/isolated_functions.py
##########
@@ -0,0 +1,40 @@
+################################################################################

Review comment:
       What about removing the prefix **isolated_**? It's a little confusing.

##########
File path: flink-end-to-end-tests/flink-python-test/python/datastream/isolated_functions.py
##########
@@ -0,0 +1,40 @@
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+from pyflink.datastream.functions import CoMapFunction
+
+
+def calc_str_len(value):
+    return value[0], len(value[0]), value[1]
+
+
+def calc_add_one(value):
+    return value[0], value[1]+1, value[1]

Review comment:
    ```suggestion
    return value[0], value[1] + 1, value[1]
    ```
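For context, the spacing fix matches PEP 8's rule of putting a single space around binary operators. The two helpers from the patch, with that style applied (names kept as in the PR):

```python
def calc_str_len(value):
    # value is a (str, int) pair; insert the string's length in the middle.
    return value[0], len(value[0]), value[1]


def calc_add_one(value):
    # Increment the count, keeping the original value as the last element.
    return value[0], value[1] + 1, value[1]
```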

##########
File path: flink-end-to-end-tests/flink-python-test/python/datastream/isolated_functions.py
##########
@@ -0,0 +1,40 @@
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+from pyflink.datastream.functions import CoMapFunction
+
+
+def calc_str_len(value):
+    return value[0], len(value[0]), value[1]
+
+
+def calc_add_one(value):

Review comment:
       ditto

##########
File path: flink-end-to-end-tests/test-scripts/test_pyflink_datastream.sh
##########
@@ -0,0 +1,177 @@
+#!/usr/bin/env bash
+################################################################################
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+set -Eeuo pipefail
+
+KAFKA_VERSION="2.2.0"
+CONFLUENT_VERSION="5.0.0"
+CONFLUENT_MAJOR_VERSION="5.0"
+KAFKA_SQL_VERSION="universal"
+SQL_JARS_DIR=$END_TO_END_DIR/flink-sql-client-test/target/sql-jars
+KAFKA_SQL_JAR=$(find "$SQL_JARS_DIR" | grep "kafka_" )
+
+function create_data_stream_kafka_source_1 {
+    topicName="test-data-stream-source"
+
+    echo "Sending messages to Kafka..."
+
+    send_messages_to_kafka '{"f0": "a", "f1": 1}' $topicName
+    send_messages_to_kafka '{"f0": "ab", "f1": 2}' $topicName
+    send_messages_to_kafka '{"f0": "abc", "f1": 3}' $topicName
+    send_messages_to_kafka '{"f0": "abcd", "f1": 4}' $topicName
+    send_messages_to_kafka '{"f0": "abcde", "f1": 5}' $topicName
+}
+
+function sort_msg {
+    arr=()
+    while read line
+    do
+        value=$line
+        arr+=("$value")
+    done <<< "$1"
+    IFS=$'\n' sorted=($(sort <<< "${arr[*]}")); unset IFS
+    echo "${sorted[*]}"
+}
+
+function test_clean_up {
+    stop_kafka_cluster
+}
+
+source "$(dirname "$0")"/common.sh
+source "$(dirname "$0")"/kafka_sql_common.sh \
+  $KAFKA_VERSION \
+  $CONFLUENT_VERSION \
+  $CONFLUENT_MAJOR_VERSION \
+  $KAFKA_SQL_VERSION
+
+
+echo "Preparing Flink..."
+
+CURRENT_DIR=`cd "$(dirname "$0")" && pwd -P`
+source "${CURRENT_DIR}"/common.sh
+
+cp -r "${FLINK_DIR}/conf" "${TEST_DATA_DIR}/conf"
+
+echo "taskmanager.memory.task.off-heap.size: 768m" >> "${TEST_DATA_DIR}/conf/flink-conf.yaml"
+echo "taskmanager.memory.process.size: 3172m" >> "${TEST_DATA_DIR}/conf/flink-conf.yaml"
+echo "taskmanager.numberOfTaskSlots: 5" >> "${TEST_DATA_DIR}/conf/flink-conf.yaml"
+
+export FLINK_CONF_DIR="${TEST_DATA_DIR}/conf"
+
+FLINK_PYTHON_DIR=`cd "${CURRENT_DIR}/../../flink-python" && pwd -P`
+
+CONDA_HOME="${FLINK_PYTHON_DIR}/dev/.conda"
+
+"${FLINK_PYTHON_DIR}/dev/lint-python.sh" -s basic
+
+PYTHON_EXEC="${CONDA_HOME}/bin/python"
+
+source "${CONDA_HOME}/bin/activate"
+
+cd "${FLINK_PYTHON_DIR}"
+
+rm -rf dist
+
+python setup.py sdist
+
+pip install dist/*
+
+cd dev
+
+conda install -y -q zip=3.0
+
+rm -rf .conda/pkgs
+
+zip -q -r "${TEST_DATA_DIR}/venv.zip" .conda
+
+deactivate
+
+cd "${CURRENT_DIR}"
+
+start_cluster
+
+on_exit test_clean_up
+
+# prepare Kafka
+echo "Preparing Kafka..."
+
+setup_kafka_dist
+
+start_kafka_cluster
+
+create_kafka_topic 1 1 test-data-stream-sink

Review comment:
       move this into **create_data_stream_kafka_source_1**?
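A rough sketch of what that refactor might look like, with all Kafka topic setup in one place. Note this is illustrative only: `create_kafka_topic` and `send_messages_to_kafka` come from the real test-script helpers and are stubbed here so the sketch runs on its own.

```shell
#!/usr/bin/env bash
# Stubs standing in for the kafka test-script helpers.
function create_kafka_topic { echo "created topic $3"; }
function send_messages_to_kafka { echo "sent $1 to $2"; }

function create_data_stream_kafka_source {
    topicName="test-python-data-stream-source"

    # Create both source and sink topics up front so all Kafka setup
    # lives in this one function.
    create_kafka_topic 1 1 $topicName
    create_kafka_topic 1 1 test-data-stream-sink

    echo "Sending messages to Kafka..."
    send_messages_to_kafka '{"f0": "a", "f1": 1}' $topicName
}

create_data_stream_kafka_source
```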

##########
File path: flink-end-to-end-tests/test-scripts/test_pyflink_datastream.sh
##########
@@ -0,0 +1,177 @@
+#!/usr/bin/env bash
+################################################################################
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+set -Eeuo pipefail
+
+KAFKA_VERSION="2.2.0"
+CONFLUENT_VERSION="5.0.0"
+CONFLUENT_MAJOR_VERSION="5.0"
+KAFKA_SQL_VERSION="universal"
+SQL_JARS_DIR=$END_TO_END_DIR/flink-sql-client-test/target/sql-jars
+KAFKA_SQL_JAR=$(find "$SQL_JARS_DIR" | grep "kafka_" )
+
+function create_data_stream_kafka_source_1 {
+    topicName="test-data-stream-source"
+
+    echo "Sending messages to Kafka..."
+
+    send_messages_to_kafka '{"f0": "a", "f1": 1}' $topicName
+    send_messages_to_kafka '{"f0": "ab", "f1": 2}' $topicName
+    send_messages_to_kafka '{"f0": "abc", "f1": 3}' $topicName
+    send_messages_to_kafka '{"f0": "abcd", "f1": 4}' $topicName
+    send_messages_to_kafka '{"f0": "abcde", "f1": 5}' $topicName
+}
+
+function sort_msg {
+    arr=()
+    while read line
+    do
+        value=$line
+        arr+=("$value")
+    done <<< "$1"
+    IFS=$'\n' sorted=($(sort <<< "${arr[*]}")); unset IFS
+    echo "${sorted[*]}"
+}
+
+function test_clean_up {
+    stop_kafka_cluster
+}
+
+source "$(dirname "$0")"/common.sh
+source "$(dirname "$0")"/kafka_sql_common.sh \
+  $KAFKA_VERSION \
+  $CONFLUENT_VERSION \
+  $CONFLUENT_MAJOR_VERSION \
+  $KAFKA_SQL_VERSION
+
+
+echo "Preparing Flink..."
+
+CURRENT_DIR=`cd "$(dirname "$0")" && pwd -P`
+source "${CURRENT_DIR}"/common.sh
+
+cp -r "${FLINK_DIR}/conf" "${TEST_DATA_DIR}/conf"
+
+echo "taskmanager.memory.task.off-heap.size: 768m" >> "${TEST_DATA_DIR}/conf/flink-conf.yaml"
+echo "taskmanager.memory.process.size: 3172m" >> "${TEST_DATA_DIR}/conf/flink-conf.yaml"
+echo "taskmanager.numberOfTaskSlots: 5" >> "${TEST_DATA_DIR}/conf/flink-conf.yaml"
+
+export FLINK_CONF_DIR="${TEST_DATA_DIR}/conf"
+
+FLINK_PYTHON_DIR=`cd "${CURRENT_DIR}/../../flink-python" && pwd -P`
+
+CONDA_HOME="${FLINK_PYTHON_DIR}/dev/.conda"
+
+"${FLINK_PYTHON_DIR}/dev/lint-python.sh" -s basic
+
+PYTHON_EXEC="${CONDA_HOME}/bin/python"
+
+source "${CONDA_HOME}/bin/activate"
+
+cd "${FLINK_PYTHON_DIR}"
+
+rm -rf dist
+
+python setup.py sdist
+
+pip install dist/*
+
+cd dev
+
+conda install -y -q zip=3.0
+
+rm -rf .conda/pkgs
+
+zip -q -r "${TEST_DATA_DIR}/venv.zip" .conda
+
+deactivate
+
+cd "${CURRENT_DIR}"
+
+start_cluster

Review comment:
       It seems that `stop_cluster` is never called?
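For reference, a minimal self-contained sketch of the pattern being suggested: register a cleanup function via `on_exit` that stops both the Kafka cluster and the Flink cluster. The behavior of `common.sh`'s `on_exit` is assumed (a single EXIT trap running accumulated functions), and the cluster-stopping helpers are stubbed here.

```shell
#!/usr/bin/env bash
# Sketch of an on_exit helper that accumulates cleanup functions and
# runs them from one EXIT trap; the real one lives in common.sh.
cleanup_fns=""
function on_exit {
    cleanup_fns="$cleanup_fns $1"
    trap 'for fn in $cleanup_fns; do $fn; done' EXIT
}

# Stubs for the real cluster helpers.
function stop_kafka_cluster { echo "stopping kafka cluster"; }
function stop_cluster { echo "stopping flink cluster"; }

# Stopping the Flink cluster alongside Kafka addresses the review comment.
function test_clean_up {
    stop_kafka_cluster
    stop_cluster
}

on_exit test_clean_up
```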




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org