Posted to issues@spark.apache.org by "Shixiong Zhu (JIRA)" <ji...@apache.org> on 2016/06/16 20:19:05 UTC

[jira] [Resolved] (SPARK-15981) Fix bug in python DataStreamReader

     [ https://issues.apache.org/jira/browse/SPARK-15981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu resolved SPARK-15981.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

> Fix bug in python DataStreamReader
> ----------------------------------
>
>                 Key: SPARK-15981
>                 URL: https://issues.apache.org/jira/browse/SPARK-15981
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL, Streaming
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>            Priority: Blocker
>             Fix For: 2.0.0
>
>
> A bug in the Python DataStreamReader API made it unusable. A single path was converted to an array before the Java DataStreamReader method was called, and that method accepts only a string, so the call failed with the following error:
> {code}
> File "/Users/tdas/Projects/Spark/spark/python/pyspark/sql/readwriter.py", line 947, in pyspark.sql.readwriter.DataStreamReader.json
> Failed example:
>     json_sdf = spark.readStream.json(os.path.join(tempfile.mkdtemp(), 'data'),                 schema = sdf_schema)
> Exception raised:
>     Traceback (most recent call last):
>       File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py", line 1253, in __run
>         compileflags, 1) in test.globs
>       File "<doctest pyspark.sql.readwriter.DataStreamReader.json[0]>", line 1, in <module>
>         json_sdf = spark.readStream.json(os.path.join(tempfile.mkdtemp(), 'data'),                 schema = sdf_schema)
>       File "/Users/tdas/Projects/Spark/spark/python/pyspark/sql/readwriter.py", line 963, in json
>         return self._df(self._jreader.json(path))
>       File "/Users/tdas/Projects/Spark/spark/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
>         answer, self.gateway_client, self.target_id, self.name)
>       File "/Users/tdas/Projects/Spark/spark/python/pyspark/sql/utils.py", line 63, in deco
>         return f(*a, **kw)
>       File "/Users/tdas/Projects/Spark/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 316, in get_return_value
>         format(target_id, ".", name, value))
>     Py4JError: An error occurred while calling o121.json. Trace:
>     py4j.Py4JException: Method json([class java.util.ArrayList]) does not exist
>     	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
>     	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
>     	at py4j.Gateway.invoke(Gateway.java:272)
>     	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
>     	at py4j.commands.CallCommand.execute(CallCommand.java:79)
>     	at py4j.GatewayConnection.run(GatewayConnection.java:211)
>     	at java.lang.Thread.run(Thread.java:744)
> {code}
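
The failure mode described above can be reproduced without a Spark installation. The sketch below is only an illustration: FakeJavaReader, json_buggy, and json_fixed are hypothetical names, the stub stands in for the JVM-side reader reached through Py4J, and the guard in json_fixed is an assumption about the general shape of a fix, not a copy of the actual patch.

```python
class FakeJavaReader(object):
    """Hypothetical stand-in for the JVM-side reader: json() accepts one string."""

    def json(self, path):
        if not isinstance(path, str):
            # Mimics Py4J failing to resolve an overload for the argument type.
            raise TypeError(
                "Method json([%s]) does not exist" % type(path).__name__)
        return "stream:" + path


def json_buggy(jreader, path):
    # Buggy behaviour: a single path is wrapped in a list before the call,
    # so the string-only method on the Java side can no longer be matched.
    if isinstance(path, str):
        path = [path]
    return jreader.json(path)


def json_fixed(jreader, path):
    # Assumed shape of a fix: pass a single string through unchanged and
    # reject anything else on the Python side with a clear message.
    if not isinstance(path, str):
        raise TypeError("path can be only a single string")
    return jreader.json(path)


if __name__ == "__main__":
    reader = FakeJavaReader()
    print(json_fixed(reader, "/tmp/data"))
    try:
        json_buggy(reader, "/tmp/data")
    except TypeError as exc:
        print("buggy call failed: %s" % exc)
```

Rejecting non-string input in Python, rather than forwarding it, keeps the error close to the caller instead of surfacing as a Py4J reflection failure.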



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
