Posted to dev@zeppelin.apache.org by "Felix Cheung (JIRA)" <ji...@apache.org> on 2015/07/25 21:06:04 UTC

[jira] [Created] (ZEPPELIN-185) z.show does not work on DataFrame in pyspark

Felix Cheung created ZEPPELIN-185:
-------------------------------------

             Summary: z.show does not work on DataFrame in pyspark
                 Key: ZEPPELIN-185
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-185
             Project: Zeppelin
          Issue Type: Bug
          Components: Core, Interpreters
    Affects Versions: 0.6.0
            Reporter: Felix Cheung
            Assignee: Felix Cheung


I’ve tested this out and found these issues. Firstly,

http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=createdataframe#pyspark.sql.SQLContext.createDataFrame
# Code should be changed to this – it does not work in pyspark CLI otherwise
from pyspark.sql import Row
rdd = sc.parallelize(["1","2","3"])
Data = Row('first')
df = sqlContext.createDataFrame(rdd.map(lambda d: Data(d)))

Secondly,
z.show() doesn’t seem to work properly in Python – I see the same error below: "AttributeError: 'DataFrame' object has no attribute '_get_object_id'"
# Python/PySpark – doesn’t work
from pyspark.sql import Row
rdd = sc.parallelize(["1","2","3"])
Data = Row('first')
df = sqlContext.createDataFrame(rdd.map(lambda d: Data(d)))
print(df)
print(df.collect())
z.show(df)
	AttributeError: 'DataFrame' object has no attribute '_get_object_id'

#Scala – this works
val a = sc.parallelize(List("1", "2", "3"))
val df = a.toDF()
z.show(df)
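
A possible reading of the AttributeError above, offered as a sketch rather than as the actual Zeppelin code: a PySpark DataFrame is a Python wrapper that keeps its JVM handle in `_jdf`, and the Py4J gateway asks each non-primitive argument for `_get_object_id()`. Passing the wrapper itself to a JVM-side method would then fail in the way reported. The classes and the `show` helper below are hypothetical stand-ins, so the snippet runs without Spark or Zeppelin.

```python
# Pure-Python stand-ins; none of these are the real Py4J or PySpark types.

class FakeJavaObject:
    """Stands in for a Py4J proxy: it can report the id of its JVM-side object."""

    def __init__(self, object_id):
        self._object_id = object_id

    def _get_object_id(self):
        return self._object_id


class FakeDataFrame:
    """Stands in for a PySpark DataFrame: a Python wrapper holding a JVM handle in _jdf."""

    def __init__(self, jdf):
        self._jdf = jdf


def to_gateway_argument(obj):
    """Mimic the gateway step that turns an argument into a JVM reference."""
    return "ref:" + obj._get_object_id()


def show(obj):
    """Hypothetical z.show-style helper that unwraps DataFrame wrappers first."""
    if isinstance(obj, FakeDataFrame):
        obj = obj._jdf
    return to_gateway_argument(obj)


df = FakeDataFrame(FakeJavaObject("o42"))

try:
    # The wrapper has no _get_object_id, so this raises AttributeError.
    to_gateway_argument(df)
except AttributeError as err:
    print("wrapper passed directly:", err)

# Unwrapping to the JVM handle first avoids the error.
print("unwrapped first:", show(df))
```

Whether the merged fix takes this exact shape is not stated in this thread; the sketch only illustrates why the Scala path, which already holds a JVM object, would not hit the same error.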



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: [jira] [Created] (ZEPPELIN-185) z.show does not work on DataFrame in pyspark

Posted by IT CTO <go...@gmail.com>.

Hi,
I tested this one and it works for me.
Why is the JIRA bug still open?
Eran

On Mon, Aug 10, 2015 at 7:02 PM IT CTO <go...@gmail.com> wrote:

> Great, I did not know. I will test it tomorrow.
> Eran

Re: [jira] [Created] (ZEPPELIN-185) z.show does not work on DataFrame in pyspark

Posted by IT CTO <go...@gmail.com>.

Great, I did not know. I will test it tomorrow.
Eran

On Mon, Aug 10, 2015 at 18:48, Felix Cheung <felixcheung_m@hotmail.com> wrote:

> Could you elaborate? Are you referring to working around this issue? The
> fix for this has been merged.

RE: [jira] [Created] (ZEPPELIN-185) z.show does not work on DataFrame in pyspark

Posted by Felix Cheung <fe...@hotmail.com>.

Could you elaborate? Are you referring to working around this issue? The fix for this has been merged.

> From: goi.cto@gmail.com
> Date: Mon, 10 Aug 2015 11:48:13 +0000
> Subject: Re: [jira] [Created] (ZEPPELIN-185) z.show does not work on DataFrame in pyspark
> To: dev@zeppelin.incubator.apache.org
> 
> Does anyone know how to solve this one? My users are using Python, and
> iterating through the DF each time is not useful.
> Eran

Re: [jira] [Created] (ZEPPELIN-185) z.show does not work on DataFrame in pyspark

Posted by IT CTO <go...@gmail.com>.

Does anyone know how to solve this one? My users are using Python, and
iterating through the DF each time is not useful.
Eran
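
For readers who need the interim workaround mentioned above, here is a minimal sketch of iterating through the rows once and printing them in Zeppelin's `%table` display format (tab-separated cells, one row per line). The helper name `to_zeppelin_table` is hypothetical and not part of Zeppelin. Plain tuples stand in for the DataFrame rows so the snippet runs without Spark.

```python
def to_zeppelin_table(columns, rows):
    """Render column names and row tuples as a Zeppelin %table string."""

    def clean(value):
        # Tabs and newlines separate cells and rows, so replace them inside values.
        return str(value).replace("\t", " ").replace("\n", " ")

    lines = ["\t".join(clean(c) for c in columns)]
    for row in rows:
        lines.append("\t".join(clean(v) for v in row))
    return "%table " + "\n".join(lines)


# Plain tuples stand in for df.columns and df.collect() in this sketch.
columns = ["first"]
rows = [("1",), ("2",), ("3",)]
print(to_zeppelin_table(columns, rows))
```

In a notebook paragraph, something like `print(to_zeppelin_table(df.columns, df.collect()))` should render as a table, assuming the result fits in driver memory. For large DataFrames, limiting the rows before collecting is probably safer.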
