Posted to dev@zeppelin.apache.org by "Felix Cheung (JIRA)" <ji...@apache.org> on 2015/07/25 21:06:04 UTC
[jira] [Created] (ZEPPELIN-185) z.show does not work on DataFrame in pyspark
Felix Cheung created ZEPPELIN-185:
-------------------------------------
Summary: z.show does not work on DataFrame in pyspark
Key: ZEPPELIN-185
URL: https://issues.apache.org/jira/browse/ZEPPELIN-185
Project: Zeppelin
Issue Type: Bug
Components: Core, Interpreters
Affects Versions: 0.6.0
Reporter: Felix Cheung
Assignee: Felix Cheung
I’ve tested this out and found these issues. Firstly, regarding the SQLContext.createDataFrame documentation at
http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=createdataframe#pyspark.sql.SQLContext.createDataFrame
the code should be changed to the following; otherwise it does not work in the pyspark CLI:
# sc and sqlContext are predefined by the pyspark shell and by Zeppelin
from pyspark.sql import Row
rdd = sc.parallelize(["1","2","3"])
Data = Row('first')
df = sqlContext.createDataFrame(rdd.map(lambda d: Data(d)))
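One plausible reason the Row wrapper matters: createDataFrame infers column names from the records it samples, and a bare string such as "1" carries no field names. The sketch below is a hypothetical, simplified model of that rule, not PySpark's implementation; `infer_column_names` is an illustrative name, and `collections.namedtuple` stands in for `pyspark.sql.Row` so the code runs without Spark.

```python
from collections import namedtuple


def infer_column_names(record):
    """Hypothetical, simplified model of column-name inference for one record.

    Not PySpark's code: it only illustrates why wrapping each value in a
    named record (Row, namedtuple) lets a schema be derived.
    """
    # Named records (namedtuple here, standing in for pyspark.sql.Row)
    # carry their own field names.
    if isinstance(record, tuple) and hasattr(record, "_fields"):
        return list(record._fields)
    # Unnamed tuples/lists only give positions, so use placeholder names.
    if isinstance(record, (tuple, list)):
        return ["_%d" % i for i in range(1, len(record) + 1)]
    # A bare scalar such as "1" has no structure to take names from.
    raise TypeError("cannot infer columns from %r" % (record,))


# Mirrors the report: each string is wrapped in a one-field record named 'first'.
Data = namedtuple("Data", ["first"])
records = [Data(d) for d in ["1", "2", "3"]]
print(infer_column_names(records[0]))
```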
Secondly, z.show() doesn’t seem to work properly in Python; I get the following error: "AttributeError: 'DataFrame' object has no attribute '_get_object_id'"
# Python/PySpark: doesn’t work
from pyspark.sql import Row
rdd = sc.parallelize(["1","2","3"])
Data = Row('first')
df = sqlContext.createDataFrame(rdd.map(lambda d: Data(d)))
print(df)
print(df.collect())
z.show(df)
AttributeError: 'DataFrame' object has no attribute '_get_object_id'
# Scala: this works
val a = sc.parallelize(List("1", "2", "3"))
val df = a.toDF()
z.show(df)
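The AttributeError suggests the Python DataFrame wrapper is handed straight to the JVM-side z.show. Py4J, which PySpark uses to talk to the JVM, passes a non-primitive argument by calling its `_get_object_id()` method; a PySpark DataFrame is a plain Python object that keeps its Java counterpart in the private `_jdf` attribute, so it has no such method. In Scala, `df` is already a JVM object, which would explain why the same call works there. The self-contained sketch below models that rule with hypothetical stand-ins (`FakeJavaObject`, `FakePyDataFrame`, `encode_argument`); it is a simplification, not Py4J's code.

```python
class FakeJavaObject:
    """Hypothetical stand-in for a Py4J JavaObject (a handle to a JVM object)."""

    def __init__(self, target_id):
        self._target_id = target_id

    def _get_object_id(self):
        return self._target_id


class FakePyDataFrame:
    """Hypothetical stand-in for pyspark.sql.DataFrame, a Python wrapper that
    keeps the JVM DataFrame in its _jdf attribute."""

    def __init__(self, jdf):
        self._jdf = jdf


def encode_argument(arg):
    """Simplified model of how a Py4J-style bridge encodes one call argument:
    primitives go by value, everything else must be a JVM reference."""
    if arg is None or isinstance(arg, (bool, int, float, str)):
        return "value:%r" % (arg,)
    # Assumed to be a JVM handle; a plain Python wrapper has no
    # _get_object_id, so this line raises AttributeError for it.
    return "ref:" + arg._get_object_id()


jdf = FakeJavaObject("o42")
df = FakePyDataFrame(jdf)
print(encode_argument(df._jdf))  # the unwrapped Java object is accepted
try:
    encode_argument(df)  # the Python wrapper is not
except AttributeError as err:
    print(err)
```

Unwrapping via `_jdf` is the key step; since `_jdf` is a private PySpark attribute, relying on it outside Zeppelin's own integration code may break across Spark versions.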
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Re: [jira] [Created] (ZEPPELIN-185) z.show does not work on DataFrame in pyspark
Posted by IT CTO <go...@gmail.com>.
Hi,
I tested this one and it works for me.
Why is the JIRA bug still open?
Eran
On Mon, Aug 10, 2015 at 7:02 PM IT CTO <go...@gmail.com> wrote:
> Great, I did not know. I will test it tomorrow.
> Eran
Re: [jira] [Created] (ZEPPELIN-185) z.show does not work on DataFrame in pyspark
Posted by IT CTO <go...@gmail.com>.
Great, I did not know. I will test it tomorrow.
Eran
On Mon, Aug 10, 2015, 18:48 Felix Cheung <felixcheung_m@hotmail.com> wrote:
> Could you elaborate? Are you referring to working around this issue? The
> fix for this has been merged.
RE: [jira] [Created] (ZEPPELIN-185) z.show does not work on DataFrame in pyspark
Posted by Felix Cheung <fe...@hotmail.com>.
Could you elaborate? Are you referring to working around this issue? The fix for this has been merged.
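The merged patch itself is not quoted in this thread. As a hedged illustration of one way a Python-side z.show could handle DataFrames, the wrapper below checks for a DataFrame and passes its underlying Java object (`_jdf`) across the bridge instead of the Python wrapper. All class and method names here (`PyZeppelinContextSketch`, `FakeJvmContext`, `show_dataframe`, `FakeDataFrame`) are hypothetical; in a real notebook `dataframe_type` would presumably be `pyspark.sql.DataFrame`.

```python
class FakeJvmContext:
    """Hypothetical JVM-side context; it only records what it was asked to show."""

    def __init__(self):
        self.shown = []

    def show_dataframe(self, java_df):
        self.shown.append(java_df)


class PyZeppelinContextSketch:
    """Hypothetical Python-side z object that unwraps DataFrames before calling the JVM."""

    def __init__(self, jvm_context, dataframe_type):
        self._jvm = jvm_context
        # Injected so the sketch runs without Spark; normally pyspark.sql.DataFrame.
        self._dataframe_type = dataframe_type

    def show(self, obj):
        if isinstance(obj, self._dataframe_type):
            # Pass the wrapped JVM DataFrame, not the Python wrapper object.
            self._jvm.show_dataframe(obj._jdf)
        else:
            # Non-DataFrame values are simply printed.
            print(obj)


class FakeDataFrame:
    """Stand-in for pyspark.sql.DataFrame: holds its Java counterpart in _jdf."""

    def __init__(self, jdf):
        self._jdf = jdf


jvm = FakeJvmContext()
z = PyZeppelinContextSketch(jvm, FakeDataFrame)
z.show(FakeDataFrame("java-df-handle"))
print(jvm.shown)
```

Injecting the DataFrame type keeps the dispatch logic testable without a running Spark context; the design choice that matters is that only JVM handles ever cross the bridge.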
> From: goi.cto@gmail.com
> Date: Mon, 10 Aug 2015 11:48:13 +0000
> Subject: Re: [jira] [Created] (ZEPPELIN-185) z.show does not work on DataFrame in pyspark
> To: dev@zeppelin.incubator.apache.org
>
> Does anyone know how to solve this one? My users are using Python, and
> iterating through the DF each time is not useful.
> Eran
Re: [jira] [Created] (ZEPPELIN-185) z.show does not work on DataFrame in pyspark
Posted by IT CTO <go...@gmail.com>.
Does anyone know how to solve this one? My users are using Python, and
iterating through the DF each time is not useful.
Eran
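A possible interim workaround, assuming Zeppelin's display system renders paragraph output that starts with `%table` as a table (tab-separated columns, newline-separated rows, first row as header) and that the paragraph prints nothing before it: format the collected rows yourself. `to_zeppelin_table` is a hypothetical helper, not part of Zeppelin.

```python
def to_zeppelin_table(columns, rows):
    """Format rows as text for Zeppelin's %table display: a tab-separated
    header line, then one tab-separated line per row.

    Assumes no value contains a tab or newline, since those would break
    the layout.
    """
    lines = ["\t".join(str(c) for c in columns)]
    for row in rows:
        lines.append("\t".join(str(v) for v in row))
    return "%table " + "\n".join(lines)


# Without Spark, plain tuples stand in for collected Row objects.
print(to_zeppelin_table(["first"], [("1",), ("2",), ("3",)]))
```

In a %pyspark paragraph this could be called as `print(to_zeppelin_table(df.columns, df.take(1000)))`; `take` bounds how many rows are pulled to the driver, whereas `collect()` brings back the entire DataFrame.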
On Sat, Jul 25, 2015 at 10:06 PM Felix Cheung (JIRA) <ji...@apache.org>
wrote:
> Felix Cheung created ZEPPELIN-185:
> -------------------------------------
>
> Summary: z.show does not work on DataFrame in pyspark