Posted to issues@livy.apache.org by "Saisai Shao (JIRA)" <ji...@apache.org> on 2018/04/18 02:31:00 UTC

[jira] [Resolved] (LIVY-457) PySpark `sqlContext.sparkSession` incorrect on Spark 2.x

     [ https://issues.apache.org/jira/browse/LIVY-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saisai Shao resolved LIVY-457.
------------------------------
       Resolution: Fixed
    Fix Version/s: 0.6.0
                   0.5.1

Issue resolved by pull request 86
[https://github.com/apache/incubator-livy/pull/86]

> PySpark `sqlContext.sparkSession` incorrect on Spark 2.x
> --------------------------------------------------------
>
>                 Key: LIVY-457
>                 URL: https://issues.apache.org/jira/browse/LIVY-457
>             Project: Livy
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>         Environment: RHEL6, Spark 2.1.2.1
>            Reporter: Dan Fike
>            Priority: Major
>             Fix For: 0.5.1, 0.6.0
>
>
> It looks like the {{SQLContext}} we create in {{PySpark}} sessions isn't constructed correctly. Compare how the behavior has changed between Livy 0.4.0 and what is currently on {{master}} (0.6.0).
> Livy 0.4.0
> {code}
> $ curl --silent -X POST --data '{"kind": "pyspark"}' -H "Content-Type: application/json" localhost:8998/sessions | python -m json.tool
> $ curl --silent localhost:8998/sessions/1/statements -X POST -H 'Content-Type: application/json' -d '{"code":"sqlContext.sparkSession"}' | python -m json.tool
> $ curl --silent localhost:8998/sessions/1/statements/0 | python -m json.tool
> {
>     "id": 0,
>     "state": "available",
>     "output": {
>         "status": "ok",
>         "execution_count": 0,
>         "data": {
>             "text/plain": "<pyspark.sql.session.SparkSession object at 0x15a26d0>"
>         }
>     },
>     "progress": 1.0
> }
> {code}
> Livy 0.6.0
> {code}
> $ curl --silent -X POST --data '{"kind": "pyspark"}' -H "Content-Type: application/json" localhost:8998/sessions | python -m json.tool
> $ curl --silent localhost:8998/sessions/0/statements -X POST -H 'Content-Type: application/json' -d '{"code":"sqlContext.sparkSession"}' | python -m json.tool
> $ curl --silent localhost:8998/sessions/0/statements/0 | python -m json.tool
> {
>     "id": 0,
>     "code": "sqlContext.sparkSession",
>     "state": "available",
>     "output": {
>         "status": "ok",
>         "execution_count": 0,
>         "data": {
>             "text/plain": "JavaObject id=o4"
>         }
>     },
>     "progress": 1.0
> }
> $ curl --silent localhost:8998/sessions/0/statements -X POST -H 'Content-Type: application/json' -d '{"code":"sqlContext.sparkSession.toString()"}' | python -m json.tool
> $ curl --silent localhost:8998/sessions/0/statements/1 | python -m json.tool
> {
>     "id": 1,
>     "code": "sqlContext.sparkSession.toString()",
>     "state": "available",
>     "output": {
>         "status": "ok",
>         "execution_count": 1,
>         "data": {
>             "text/plain": "'org.apache.spark.sql.hive.HiveContext@200334d0'"
>         }
>     },
>     "progress": 1.0
> }
> {code}
> Notice how the value of {{sqlContext.sparkSession}} went from a {{pyspark.sql.session.SparkSession}} to a {{org.apache.spark.sql.hive.HiveContext}}?
> I suspect this is caused by the change at https://github.com/apache/incubator-livy/commit/c1aafeb6cb87f2bd7f4cb7cf538822b59fb34a9c#diff-c58e3946d3530f54014129c268988e01R563, which passes {{jsqlc}} as the second positional parameter to {{SQLContext}}, whereas the diff at https://github.com/apache/spark/commit/89addd40abdacd65cc03ac8aa5f9cf3dd4a4c19b#diff-74ba016ef40c1cb268e14aee817d71bdR50 suggests it should be the _third_ positional parameter. Passed second, the JVM context is bound to {{sparkSession}} rather than {{jsqlContext}}, which matches the output above.
> I'd wager the fix is simply to explicitly pass that parameter as a keyword argument instead.
> {code}
> sqlc = SQLContext(sc, jsqlContext=jsqlc)
> {code}
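
The positional-versus-keyword mix-up described in the report can be reproduced without a Spark installation. The following is only a minimal sketch: FakeSQLContext is a hypothetical stand-in that copies the Spark 2.x parameter order (sparkContext, sparkSession, jsqlContext) and simply stores its arguments, and the string values are placeholders for the real SparkContext and JVM objects. It is not Livy's or Spark's actual code.

```python
# Hypothetical stand-in for pyspark.sql.SQLContext on Spark 2.x.
# Only the parameter order mirrors Spark; the body just stores the arguments.
class FakeSQLContext:
    def __init__(self, sparkContext, sparkSession=None, jsqlContext=None):
        self.sparkContext = sparkContext
        self.sparkSession = sparkSession
        self.jsqlContext = jsqlContext


# Placeholder values standing in for the real SparkContext and JVM HiveContext.
sc = "python-SparkContext"
jsqlc = "jvm-HiveContext"

# Second positional argument: the JVM context is bound to sparkSession.
positional = FakeSQLContext(sc, jsqlc)

# Keyword argument, as proposed in the report: the JVM context is bound
# to jsqlContext and sparkSession keeps its default.
keyword = FakeSQLContext(sc, jsqlContext=jsqlc)

print("positional:", positional.sparkSession, positional.jsqlContext)
print("keyword:   ", keyword.sparkSession, keyword.jsqlContext)
```

In the positional call the sparkSession attribute holds the JVM object, which is consistent with the "JavaObject id=o4" output shown for Livy 0.6.0. Passing the value by keyword keeps the call correct even if the parameter order differs between Spark versions.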



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)