Posted to dev@spark.apache.org by Chang Ya-Hsuan <su...@gmail.com> on 2015/12/08 10:25:53 UTC

Failed to generate predicate Error when using dropna

spark version: spark-1.5.2-bin-hadoop2.6
python version: 2.7.9
os: ubuntu 14.04

Code to reproduce the error:

# write.py

import pyspark

sc = pyspark.SparkContext()
sqlc = pyspark.SQLContext(sc)
df = sqlc.range(10)                            # single column: id
df1 = df.withColumn('a', df['id'] * 2)         # add column a = id * 2
df1.write.partitionBy('id').parquet('./data')  # partition output on id
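For context, partitionBy('id') writes Hive-style directory partitions: the id values end up encoded in directory names rather than stored inside the parquet files. A minimal pure-Python sketch of that encoding (no Spark required; the path layout shown matches Spark's convention, the helper names are hypothetical):

```python
# partitionBy('id') produces one directory per distinct value,
# e.g. data/id=3/part-00000.parquet. The partition column's values
# live in the directory names, not in the parquet files themselves.

def partition_path(base, column, value):
    """Build a Hive-style partition path for one value."""
    return f"{base}/{column}={value}/part-00000.parquet"

def parse_partition(path):
    """Recover (column, value) from the partition directory name."""
    segment = path.split("/")[1]            # e.g. "id=3"
    column, _, value = segment.partition("=")
    return column, value

p = partition_path("data", "id", 3)
print(p)                    # data/id=3/part-00000.parquet
print(parse_partition(p))   # ('id', '3')
```

On read, Spark's partition discovery reconstructs the id column from these directory names, which is why a partitioned read path differs from a plain one.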


# read.py

import pyspark

sc = pyspark.SparkContext()
sqlc = pyspark.SQLContext(sc)
df2 = sqlc.read.parquet('./data')  # id is rebuilt as a partition column
df2.dropna().count()               # triggers the predicate error
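What dropna().count() is meant to compute: keep only rows where no column is null, then count them (Spark's default is how='any'). A pure-Python sketch of those semantics, using hypothetical rows in place of the DataFrame:

```python
# Rows standing in for df2; one row has a null in column a.
rows = [
    {"id": 0, "a": 0},
    {"id": 1, "a": None},   # dropped: contains a null
    {"id": 2, "a": 4},
]

# dropna(how='any'): keep rows with no null values, then count.
kept = [r for r in rows if all(v is not None for v in r.values())]
print(len(kept))  # 2
```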


$ spark-submit write.py
$ spark-submit read.py

# error message

15/12/08 17:20:34 ERROR Filter: Failed to generate predicate, fallback to
interpreted org.apache.spark.sql.catalyst.errors.package$TreeNodeException:
Binding attribute, tree: a#0L
...
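For readers unfamiliar with the message: "binding attribute" means resolving a column reference (here a#0L) against the child operator's output schema so the generated predicate can read it by position; when the reference cannot be found, Catalyst throws a TreeNodeException and Filter falls back to interpreted evaluation. A rough pure-Python analogy (hypothetical names, not Spark's actual implementation):

```python
# A filter predicate references a column; "binding" replaces that
# reference with the column's ordinal in the child's output schema.
# If the schema lacks the column, binding fails -- analogous to the
# TreeNodeException "Binding attribute, tree: a#0L" above.

def bind_attribute(name, output_schema):
    """Return the ordinal of `name` in the schema, or fail loudly."""
    try:
        return output_schema.index(name)
    except ValueError:
        raise LookupError(f"Binding attribute, tree: {name}")

print(bind_attribute("a", ["id", "a"]))   # 1: bound to ordinal 1

try:
    bind_attribute("a", ["id"])           # 'a' missing from the output
except LookupError as e:
    print(e)                              # Binding attribute, tree: a
```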

If the data is written without partitionBy, the error does not happen.
Any suggestions?
Thanks!

-- 
-- 張雅軒

Re: Failed to generate predicate Error when using dropna

Posted by Chang Ya-Hsuan <su...@gmail.com>.
https://issues.apache.org/jira/browse/SPARK-12231

This is my first time creating a JIRA ticket.
Is this ticket proper?
Thanks

On Tue, Dec 8, 2015 at 9:59 PM, Reynold Xin <rx...@databricks.com> wrote:

> Can you create a JIRA ticket for this? Thanks.


-- 
-- 張雅軒

Re: Failed to generate predicate Error when using dropna

Posted by Reynold Xin <rx...@databricks.com>.
Can you create a JIRA ticket for this? Thanks.

