You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by chenghao-intel <gi...@git.apache.org> on 2014/07/31 10:49:33 UTC

[GitHub] spark pull request: [SPARK-2767] [SQL] Fix Bug of no output in cli...

GitHub user chenghao-intel opened a pull request:

    https://github.com/apache/spark/pull/1686

    [SPARK-2767] [SQL] Fix Bug of no output in cli if exception thrown internally

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chenghao-intel/spark spark_sql_cli

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1686.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1686
    
----
commit 93b0382460981e682272e04d8c7d5825b8877943
Author: Cheng Hao <ha...@intel.com>
Date:   2014-07-31T08:40:39Z

    Fix Bug of no output in cli if exception thrown internally

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2767] [SQL] SparkSQL CLI doens't output...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1686#issuecomment-50892217
  
    QA tests have started for PR 1686. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17674/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2767] [SQL] SparkSQL CLI doens't output...

Posted by scwf <gi...@git.apache.org>.

Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/1686#issuecomment-50840620
  
    @liancheng @chenghao-intel , actually jdbc not use the driver, and jdbc not crash when give a illegal sql.
    i test it use beeline like this
    0: jdbc:hive2://localhost:10000> show asfa;
    Error: org.apache.spark.sql.hive.HiveQl$ParseException: Failed to parse: show asfa (state=,code=0)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2767] [SQL] SparkSQL CLI doens't output...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/1686#issuecomment-50845495
  
    Discussed with @chenghao-intel offline. In Shark, `SharkServer` does rely on `SharkDriver`, but `SharkServer2` doesn't. In the case of Spark SQL, firstly we don't have a service equivalent to `SharkServer`, and `HiveThriftServer2`, which is equivalent to `SharkServer2` doesn't depend on `SparkSQLDriver`. And `SparkSQLCLIService` doesn't crash in front of bad queries, as @scwf has suggested.
    
    Just added a comment related to error reporting. Otherwise LGTM. This one is indeed cleaner than #1661. Thanks for working on this @chenghao-intel !


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2767] [SQL] SparkSQL CLI doens't output...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/1686#issuecomment-50843218
  
    Thanks you @scwf, that's really informative.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2767] [SQL] SparkSQL CLI doens't output...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/1686#issuecomment-50803116
  
    Hey @chenghao-intel, first of all, this PR is definitely cleaner then #1661. But what does sharing `SparkSQLDriver` between `SparkSQLCLIDriver` and `SparkSQLCLIService` mean? I'm not super familiar with Shark code yet, but I didn't see the related code in Shark. Did I miss something here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2767] [SQL] SparkSQL CLI doens't output...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/1686#issuecomment-50733944
  
    This will also fix the bug https://github.com/apache/spark/pull/1661.
    @scwf @liancheng I think it's better fix the exception in SparkSQLDriver, not in the CLI, which probably benefit the ThriftService.
    
    BTW: @liancheng, can you also check if an invalid query will make the SparkSQLCLIService crashed? I feel most likely from the source code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2767] [SQL] SparkSQL CLI doens't output...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/1686#issuecomment-50891834
  
    Thank you very much @liancheng , I've updated the code, can you check it again?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2767] [SQL] SparkSQL CLI doens't output...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1686#issuecomment-50733913
  
    QA tests have started for PR 1686. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17573/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2767] [SQL] SparkSQL CLI doens't output...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1686#issuecomment-50901349
  
    QA results for PR 1686:<br>- This patch PASSES unit tests.<br>- This patch merges cleanly<br>- This patch adds no public classes<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17674/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2767] [SQL] SparkSQL CLI doens't output...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1686#issuecomment-50740375
  
    QA results for PR 1686:<br>- This patch PASSES unit tests.<br>- This patch merges cleanly<br>- This patch adds no public classes<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17573/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2767] [SQL] SparkSQL CLI doens't output...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/1686#issuecomment-50837487
  
    @liancheng , sorry for the confusing. From the architecture point of view (see http://www.slideshare.net/Hadoop_Summit/spark-and-shark page 22 & page 23), `CLI` and `JDBC Service` have lots of common functions which is done within the `Driver`. So, I think it's better to fix the #1661 in `Driver` rather than in `CLI`, currently it doesn't matter, but we need to test if JDBCService has the same bug. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2767] [SQL] SparkSQL CLI doens't output...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1686#issuecomment-50920145
  
    Thanks! I've merged this to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2767] [SQL] SparkSQL CLI doens't output...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/1686


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2767] [SQL] SparkSQL CLI doens't output...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1686#discussion_r15681232
  
    --- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala ---
    @@ -53,10 +53,9 @@ private[hive] class SparkSQLDriver(val context: HiveContext = SparkSQLEnv.hiveCo
       }
     
       override def run(command: String): CommandProcessorResponse = {
    -    val execution = context.executePlan(context.hql(command).logicalPlan)
    -
         // TODO unify the error code
         try {
    +      val execution = context.executePlan(context.hql(command).logicalPlan)
    --- End diff --
    
    (Actually I'd like to add the comment to the `catch` clause of this `try` block, but GitHub doesn't allow me to.)
    
    One annoying issue here is that the exception throw is only recorded to the logger, not reported to the console. Thus, when using `bin/spark-sql`, users can only see the following unintuitive exception:
    
    ```
    $ ./bin/spark-sql
    spark-sql> foo;
    NoViableAltException(26@[])
            at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:902)
            at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190)
            at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:161)
            ...
    ```
    
    Unless they run `spark-sql --hiveconf hive.root.logger=INFO,console` (similar to `shark-withinfo` in Shark):
    
    ```
    $ ./bin/spark-sql --hiveconf hive.root.logger=INFO,console
    spark-sql> foo;
    spark-sql> foo;
    14/08/01 11:17:03 INFO parse.ParseDriver: Parsing command: foo
    NoViableAltException(26@[])
            at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:902)
            at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190)
            at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:161)
            ...
    14/08/01 11:17:04 ERROR thriftserver.SparkSQLDriver: Failed in [foo]
    org.apache.spark.sql.hive.HiveQl$ParseException: Failed to parse: foo
            at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:214)
            at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:76)
            at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:79)
            ...
    ```
    
    To fix this, we can print the exception (if any) to console after checking response code of the `CommandProcessorResponse` object in [`SparkSQLCLIDriver`](https://github.com/apache/spark/blob/8f51491ea78d8e88fc664c2eac3b4ac14226d98f/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala#L291).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---