Posted to issues@spark.apache.org by "Hurshal Patel (JIRA)" <ji...@apache.org> on 2015/12/16 00:36:46 UTC
[jira] [Updated] (SPARK-12348) PySpark _inferSchema crashes with incorrect exception on an empty RDD

     [ https://issues.apache.org/jira/browse/SPARK-12348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hurshal Patel updated SPARK-12348:
----------------------------------
    Description: 
{code}
>>> rdd = sc.emptyRDD()
>>> df = sqlContext.createDataFrame(rdd)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/memsql/spark/python/pyspark/sql/context.py", line 404, in createDataFrame
    rdd, schema = self._createFromRDD(data, schema, samplingRatio)
  File "/home/memsql/spark/python/pyspark/sql/context.py", line 285, in _createFromRDD
    struct = self._inferSchema(rdd, samplingRatio)
  File "/home/memsql/spark/python/pyspark/sql/context.py", line 229, in _inferSchema
    first = rdd.first()
  File "/home/memsql/spark/python/pyspark/rdd.py", line 1320, in first
    raise ValueError("RDD is empty")
ValueError: RDD is empty
{code}
This throws "RDD is empty" from rdd.first() instead of the intended message "The first row in RDD is empty, can not infer schema" from sqlContext._inferSchema, because rdd.first() raises before _inferSchema reaches its own check on the first row.
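One way to make the intended message reachable is to translate the error from first() inside the schema-inference step. The Python 3 sketch below is hypothetical: FakeRDD, first_row_for_inference and INFER_MSG are illustrative names, not PySpark APIs. FakeRDD only mimics the first() behaviour shown in the traceback, and this is not necessarily how Spark resolved the issue.

```python
# Hypothetical stand-in for a PySpark RDD: only first() is modelled, and it
# raises the same ValueError("RDD is empty") seen in the traceback above.
class FakeRDD:
    def __init__(self, rows):
        self._rows = list(rows)

    def first(self):
        if not self._rows:
            raise ValueError("RDD is empty")
        return self._rows[0]


# The message _inferSchema is meant to produce (copied from the report).
INFER_MSG = "The first row in RDD is empty, can not infer schema"


def first_row_for_inference(rdd):
    """Return the row used for schema inference (hypothetical helper).

    Translates the generic "RDD is empty" error from first() into the
    schema-inference message, and keeps the check for an empty first row.
    """
    try:
        first = rdd.first()
    except ValueError:
        # Empty RDD: surface the schema-specific message instead and
        # suppress the chained "RDD is empty" context.
        raise ValueError(INFER_MSG) from None
    if not first:
        # Non-empty RDD whose first row is itself empty, e.g. () or {}.
        raise ValueError(INFER_MSG)
    return first
```

Catching ValueError around first() is the smallest change, but it could also swallow an unrelated ValueError; calling RDD.isEmpty() before first() avoids that, at the cost of possibly running an extra job.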


> PySpark _inferSchema crashes with incorrect exception on an empty RDD
> ---------------------------------------------------------------------
>
>                 Key: SPARK-12348
>                 URL: https://issues.apache.org/jira/browse/SPARK-12348
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.5.0
>            Reporter: Hurshal Patel
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
