You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by twalthr <gi...@git.apache.org> on 2018/04/18 09:58:32 UTC

[GitHub] flink pull request #5867: [FLINK-8686] [sql-client] Improve basic embedded S...

GitHub user twalthr opened a pull request:

    https://github.com/apache/flink/pull/5867

    [FLINK-8686] [sql-client] Improve basic embedded SQL client

    ## What is the purpose of the change
    
    This PR covers the following minor improvements:
    
    - Configure JVM heap size
    - "The input is invalid please check it again." => add allowed range
    - Add more logging instead of swallowing exceptions
    - List properties in error message about missing TS factory sorted by name
    - Add switch to show full stacktraces of exceptions (solved through logging)
    - Minor improvements
    
    
    ## Brief change log
    
    Various changes.
    
    
    ## Verifying this change
    
    Manually verified and existing tests adapted.
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): no
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
      - The serializers: no
      - The runtime per-record code paths (performance sensitive): no
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
      - The S3 file system connector: no
    
    ## Documentation
    
      - Does this pull request introduce a new feature? yes
      - If yes, how is the feature documented? JavaDocs


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/twalthr/flink FLINK-8686

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5867.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5867
    
----
commit 39064a5c6b94438d495e93663c00ec6d1b057672
Author: Timo Walther <tw...@...>
Date:   2018-04-16T11:40:53Z

    [FLINK-8686] [sql-client] Fix invalid cached session properties

commit dbc8bead807c603f23c2708de33e30b4815cb3bd
Author: Timo Walther <tw...@...>
Date:   2018-04-16T12:02:18Z

    [FLINK-8686] [sql-client] Sort given table source properties

commit 224b34167eb2e8c093acf823abdd8fdaa5bb295b
Author: Timo Walther <tw...@...>
Date:   2018-04-16T15:25:12Z

    [FLINK-8686] [sql-client] Add logging for exceptions and cluster communication

commit d575f2c01283bd18de352bdc62c9c7d3ba5f075a
Author: Timo Walther <tw...@...>
Date:   2018-04-16T15:33:00Z

    [FLINK-8686] [sql-client] Add valid page range

commit f3205c935937330241e4157905ba6c6d4b9166f4
Author: Timo Walther <tw...@...>
Date:   2018-04-16T16:02:47Z

    [FLINK-8686] [sql-client] Fix highlighting during result refresh

commit a195924ed748d25641fc4c7073adbf3f4a42f0c8
Author: Timo Walther <tw...@...>
Date:   2018-04-17T16:25:35Z

    [FLINK-8686] [sql-client] Make JVM heap size configurable

----


---

[GitHub] flink pull request #5867: [FLINK-8686] [sql-client] Improve basic embedded S...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5867#discussion_r183729075
  
    --- Diff: flink-libraries/flink-sql-client/src/main/java/org/apache/flink/table/client/config/Environment.java ---
    @@ -91,6 +92,22 @@ public Deployment getDeployment() {
     		return deployment;
     	}
     
    +	public String explain() {
    --- End diff --
    
    I would not call this method `explain()` but maybe `showEnvironmentConfig()` or similar.
    In the context of DBMS and SQL, EXPLAIN is used to show the execution plan of a query.


---

[GitHub] flink pull request #5867: [FLINK-8686] [sql-client] Improve basic embedded S...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5867#discussion_r183746951
  
    --- Diff: flink-libraries/flink-sql-client/src/main/java/org/apache/flink/table/client/config/Environment.java ---
    @@ -91,6 +92,22 @@ public Deployment getDeployment() {
     		return deployment;
     	}
     
    +	public String explain() {
    +		final StringBuilder sb = new StringBuilder();
    +		sb.append("===================== Tables =====================\n");
    +		tables.forEach((name, table) -> {
    +			sb.append("- name: ").append(name).append("\n");
    +			final DescriptorProperties props = new DescriptorProperties(true);
    +			table.addProperties(props);
    --- End diff --
    
    Should we make a copy of `table` before modifying it?


---

[GitHub] flink pull request #5867: [FLINK-8686] [sql-client] Improve basic embedded S...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5867#discussion_r183721315
  
    --- Diff: flink-libraries/flink-sql-client/bin/sql-client.sh ---
    @@ -44,20 +44,32 @@ bin=`dirname "$target"`
     . "$bin"/config.sh
     
     if [ "$FLINK_IDENT_STRING" = "" ]; then
    -        FLINK_IDENT_STRING="$USER"
    +    FLINK_IDENT_STRING="$USER"
     fi
     
     CC_CLASSPATH=`constructFlinkClassPath`
     
    +export FLINK_ROOT_DIR
    +export FLINK_CONF_DIR
    +
     ################################################################################
    -# SQL client specific logic
    +# SQL Client CLI specific logic
     ################################################################################
     
    -log=$FLINK_LOG_DIR/flink-$FLINK_IDENT_STRING-sql-client-$HOSTNAME.log
    +log=$FLINK_LOG_DIR/flink-$FLINK_IDENT_STRING-sql-client-cli-$HOSTNAME.log
     log_setting=(-Dlog.file="$log" -Dlog4j.configuration=file:"$FLINK_CONF_DIR"/log4j-cli.properties -Dlogback.configurationFile=file:"$FLINK_CONF_DIR"/logback.xml)
     
    -export FLINK_ROOT_DIR
    -export FLINK_CONF_DIR
    +if [[ ! ${FLINK_SCC_HEAP} =~ ${IS_NUMBER} ]] || [[ "${FLINK_SCC_HEAP}" -lt "0" ]]; then
    +    echo "[ERROR] Configured SQL Client CLI JVM heap size is not a number. Please set '${KEY_SCC_MEM_SIZE}' in ${FLINK_CONF_FILE}."
    +    exit 1
    +fi
    +
    +if [ "${FLINK_SCC_HEAP}" -gt "0" ]; then
    +    export JVM_ARGS="$JVM_ARGS -Xms"$FLINK_SCC_HEAP"m -Xmx"$FLINK_SCC_HEAP"m"
    --- End diff --
    
    Yes, I think that's a valid concern. 
    Reusing the variable here, might render it unusable to define common options for TMs and JMs. 


---

[GitHub] flink issue #5867: [FLINK-8686] [sql-client] Improve basic embedded SQL clie...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the issue:

    https://github.com/apache/flink/pull/5867
  
    @fhueske @walterddr @suez1224 I added the possibility to make the JVM heap size configurable. After a short offline discussion with @StephanEwen, I'm not sure if we should add such configuration to the flink-conf.yaml or in a separate configuration file. I don't know which configuration we will need in the future outside of the SQL Client environment files. I don't know if it is worth to have a separate file just for JVM options. Or we hard code it into the sql-client.sh script that can be modified by a variable. What do you think?


---

[GitHub] flink pull request #5867: [FLINK-8686] [sql-client] Improve basic embedded S...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/5867


---

[GitHub] flink issue #5867: [FLINK-8686] [sql-client] Improve basic embedded SQL clie...

Posted by walterddr <gi...@git.apache.org>.
Github user walterddr commented on the issue:

    https://github.com/apache/flink/pull/5867
  
    Hi @twalthr, having JVM heap size configurable is definitely a great benefit. Just to clarify, this is only changing the Client JVM heap size, correct? I am assuming this is mainly for manipulating large size data when retrieving results from the query. 
    
    I think adding it to `flink-conf.yaml` should be fine. There are already specific configurations commented out for components like Kerberos, ZK, etc. which makes adding a specific conf to sql-client less disturbing. We can always break it out to another file later when extra configuration files are needed for sql-client in the future. 
    
    What do you guys think?


---

[GitHub] flink pull request #5867: [FLINK-8686] [sql-client] Improve basic embedded S...

Posted by suez1224 <gi...@git.apache.org>.
Github user suez1224 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5867#discussion_r182650740
  
    --- Diff: flink-libraries/flink-sql-client/bin/sql-client.sh ---
    @@ -44,20 +44,32 @@ bin=`dirname "$target"`
     . "$bin"/config.sh
     
     if [ "$FLINK_IDENT_STRING" = "" ]; then
    -        FLINK_IDENT_STRING="$USER"
    +    FLINK_IDENT_STRING="$USER"
     fi
     
     CC_CLASSPATH=`constructFlinkClassPath`
     
    +export FLINK_ROOT_DIR
    +export FLINK_CONF_DIR
    +
     ################################################################################
    -# SQL client specific logic
    +# SQL Client CLI specific logic
     ################################################################################
     
    -log=$FLINK_LOG_DIR/flink-$FLINK_IDENT_STRING-sql-client-$HOSTNAME.log
    +log=$FLINK_LOG_DIR/flink-$FLINK_IDENT_STRING-sql-client-cli-$HOSTNAME.log
     log_setting=(-Dlog.file="$log" -Dlog4j.configuration=file:"$FLINK_CONF_DIR"/log4j-cli.properties -Dlogback.configurationFile=file:"$FLINK_CONF_DIR"/logback.xml)
     
    -export FLINK_ROOT_DIR
    -export FLINK_CONF_DIR
    +if [[ ! ${FLINK_SCC_HEAP} =~ ${IS_NUMBER} ]] || [[ "${FLINK_SCC_HEAP}" -lt "0" ]]; then
    +    echo "[ERROR] Configured SQL Client CLI JVM heap size is not a number. Please set '${KEY_SCC_MEM_SIZE}' in ${FLINK_CONF_FILE}."
    +    exit 1
    +fi
    +
    +if [ "${FLINK_SCC_HEAP}" -gt "0" ]; then
    +    export JVM_ARGS="$JVM_ARGS -Xms"$FLINK_SCC_HEAP"m -Xmx"$FLINK_SCC_HEAP"m"
    --- End diff --
    
    Do we need to inherit $JVM_ARGS here? The JVM_ARGS from config.sh is used for the JVMs of JobManager and TaskManagers as suggested in the comment. The sql client might inherit some unwanted JVM options.


---

[GitHub] flink pull request #5867: [FLINK-8686] [sql-client] Improve basic embedded S...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5867#discussion_r184099775
  
    --- Diff: flink-dist/src/main/resources/flink-conf.yaml ---
    @@ -229,17 +229,26 @@ web.port: 8081
     
     # Directory to upload completed jobs to. Add this directory to the list of
     # monitored directories of the HistoryServer as well (see below).
    -#jobmanager.archive.fs.dir: hdfs:///completed-jobs/
    +# jobmanager.archive.fs.dir: hdfs:///completed-jobs/
     
     # The address under which the web-based HistoryServer listens.
    -#historyserver.web.address: 0.0.0.0
    +# historyserver.web.address: 0.0.0.0
     
     # The port under which the web-based HistoryServer listens.
    -#historyserver.web.port: 8082
    +# historyserver.web.port: 8082
     
     # Comma separated list of directories to monitor for completed jobs.
    -#historyserver.archive.fs.dir: hdfs:///completed-jobs/
    +# historyserver.archive.fs.dir: hdfs:///completed-jobs/
     
     # Interval in milliseconds for refreshing the monitored directories.
    -#historyserver.archive.fs.refresh-interval: 10000
    +# historyserver.archive.fs.refresh-interval: 10000
    +
    +#==============================================================================
    +# SQL Client
    +#==============================================================================
    +
    +# The SQL Client CLI can be started via bin/sql-client.sh embedded
    +
    +# The heap size for the SQL Client CLI JVM
    +# sqlclient.cli.heap.mb: 1024
    --- End diff --
    
    OK. But there's another discussion thread whether it is useful to share properties across JM/TM JVMs and client processes.


---

[GitHub] flink pull request #5867: [FLINK-8686] [sql-client] Improve basic embedded S...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5867#discussion_r183748841
  
    --- Diff: flink-dist/src/main/resources/flink-conf.yaml ---
    @@ -229,17 +229,26 @@ web.port: 8081
     
     # Directory to upload completed jobs to. Add this directory to the list of
     # monitored directories of the HistoryServer as well (see below).
    -#jobmanager.archive.fs.dir: hdfs:///completed-jobs/
    +# jobmanager.archive.fs.dir: hdfs:///completed-jobs/
     
     # The address under which the web-based HistoryServer listens.
    -#historyserver.web.address: 0.0.0.0
    +# historyserver.web.address: 0.0.0.0
     
     # The port under which the web-based HistoryServer listens.
    -#historyserver.web.port: 8082
    +# historyserver.web.port: 8082
     
     # Comma separated list of directories to monitor for completed jobs.
    -#historyserver.archive.fs.dir: hdfs:///completed-jobs/
    +# historyserver.archive.fs.dir: hdfs:///completed-jobs/
     
     # Interval in milliseconds for refreshing the monitored directories.
    -#historyserver.archive.fs.refresh-interval: 10000
    +# historyserver.archive.fs.refresh-interval: 10000
    +
    +#==============================================================================
    +# SQL Client
    +#==============================================================================
    +
    +# The SQL Client CLI can be started via bin/sql-client.sh embedded
    +
    +# The heap size for the SQL Client CLI JVM
    +# sqlclient.cli.heap.mb: 1024
    --- End diff --
    
    Regarding the question whether to add this property of the `flink-conf.yaml` or a CLI client specific file, are there other (existing) properties in the `flink-conf.yaml` file that are used for the CLI client, besides the connection info to the cluster?


---

[GitHub] flink pull request #5867: [FLINK-8686] [sql-client] Improve basic embedded S...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5867#discussion_r184096235
  
    --- Diff: flink-dist/src/main/resources/flink-conf.yaml ---
    @@ -229,17 +229,26 @@ web.port: 8081
     
     # Directory to upload completed jobs to. Add this directory to the list of
     # monitored directories of the HistoryServer as well (see below).
    -#jobmanager.archive.fs.dir: hdfs:///completed-jobs/
    +# jobmanager.archive.fs.dir: hdfs:///completed-jobs/
     
     # The address under which the web-based HistoryServer listens.
    -#historyserver.web.address: 0.0.0.0
    +# historyserver.web.address: 0.0.0.0
     
     # The port under which the web-based HistoryServer listens.
    -#historyserver.web.port: 8082
    +# historyserver.web.port: 8082
     
     # Comma separated list of directories to monitor for completed jobs.
    -#historyserver.archive.fs.dir: hdfs:///completed-jobs/
    +# historyserver.archive.fs.dir: hdfs:///completed-jobs/
     
     # Interval in milliseconds for refreshing the monitored directories.
    -#historyserver.archive.fs.refresh-interval: 10000
    +# historyserver.archive.fs.refresh-interval: 10000
    +
    +#==============================================================================
    +# SQL Client
    +#==============================================================================
    +
    +# The SQL Client CLI can be started via bin/sql-client.sh embedded
    +
    +# The heap size for the SQL Client CLI JVM
    +# sqlclient.cli.heap.mb: 1024
    --- End diff --
    
    If the user wants to specify global JVM options for all Flink scripts, this also happens in the `flink-conf.yaml`.


---

[GitHub] flink issue #5867: [FLINK-8686] [sql-client] Improve basic embedded SQL clie...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the issue:

    https://github.com/apache/flink/pull/5867
  
    Thank you for the feedback. I will merge the first 5 commits and will open a new PR for an additional configuration file.


---

[GitHub] flink issue #5867: [FLINK-8686] [sql-client] Improve basic embedded SQL clie...

Posted by suez1224 <gi...@git.apache.org>.
Github user suez1224 commented on the issue:

    https://github.com/apache/flink/pull/5867
  
    Hi @twalthr ,
    
    IMO, it's better to seperate the configuration of SQL client from the configuration of the Flink job. From the user's perspective, many of them probably do not need to know or use SQL client, putting a SQL client section in flink-conf.yaml will just confuse the users because they might think that's part of their flink job configuration. 
    
    I suggest, as you already mentioned, we can either 1) add some variable in the sql-client.sh to allow the advanced users to modify the JVM ARGS within the script or 2) add a sql-client-conf.yaml to allow advanced user to edit it.
    
    Assuming that it is uncommon to modify the JVM options of the sql client, I think both options are fine, but I slightly prefer option 2 since sql-client.sh is not supposed to modified by the users.


---

[GitHub] flink pull request #5867: [FLINK-8686] [sql-client] Improve basic embedded S...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5867#discussion_r183743259
  
    --- Diff: flink-dist/src/main/flink-bin/bin/config.sh ---
    @@ -95,6 +95,8 @@ DEFAULT_ENV_LOG_MAX=5                               # Maximum number of old log
     DEFAULT_ENV_JAVA_OPTS=""                            # Optional JVM args
     DEFAULT_ENV_JAVA_OPTS_JM=""                         # Optional JVM args (JobManager)
     DEFAULT_ENV_JAVA_OPTS_TM=""                         # Optional JVM args (TaskManager)
    +DEFAULT_ENV_JAVA_OPTS_SCC=""                        # Optional JVM args (SQL Client CLI)
    +DEFAULT_ENV_JAVA_OPTS_SCG=""                        # Optional JVM args (SQL Client Gateway)
    --- End diff --
    
    The Gateway parameter is not used yet. I think we can added when we need it


---

[GitHub] flink pull request #5867: [FLINK-8686] [sql-client] Improve basic embedded S...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5867#discussion_r184098778
  
    --- Diff: flink-libraries/flink-sql-client/src/main/java/org/apache/flink/table/client/config/Environment.java ---
    @@ -91,6 +92,22 @@ public Deployment getDeployment() {
     		return deployment;
     	}
     
    +	public String explain() {
    +		final StringBuilder sb = new StringBuilder();
    +		sb.append("===================== Tables =====================\n");
    +		tables.forEach((name, table) -> {
    +			sb.append("- name: ").append(name).append("\n");
    +			final DescriptorProperties props = new DescriptorProperties(true);
    +			table.addProperties(props);
    --- End diff --
    
    The properties are added to `props` not `table`. Maybe the name is a bit misleading.


---

[GitHub] flink pull request #5867: [FLINK-8686] [sql-client] Improve basic embedded S...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5867#discussion_r184098177
  
    --- Diff: flink-libraries/flink-sql-client/src/main/java/org/apache/flink/table/client/config/Environment.java ---
    @@ -91,6 +92,22 @@ public Deployment getDeployment() {
     		return deployment;
     	}
     
    +	public String explain() {
    --- End diff --
    
    I will just use the `toString` method instead.


---