Posted to users@zeppelin.apache.org by IT CTO <go...@gmail.com> on 2015/07/19 10:54:51 UTC

Print RDD as table

Hi,
I am using PySpark with Zeppelin and would like to print an RDD as a table
so that it can be rendered by the display system.
I know how to loop through the records, generate the %table string, and
print it, but I am looking for a more elegant way.
I tried z.show(MyRdd) but it failed:
... 'PipelinedRDD' object has no attribute '_get_object_id'

Any help?
Eran
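
For reference, the manual approach mentioned above (looping over the records and printing a %table string) could look roughly like the sketch below. It assumes Zeppelin's display convention of a leading %table marker, tab-separated cells, and one row per line, with the first row used as the header. The function name to_zeppelin_table, the column names, and the sample rows are made up for illustration.

```python
def to_zeppelin_table(header, rows):
    """Build a Zeppelin %table string: tab-separated cells, one row per line."""
    lines = ["\t".join(str(cell) for cell in header)]
    for row in rows:
        lines.append("\t".join(str(cell) for cell in row))
    return "%table " + "\n".join(lines)


# Hypothetical sample data standing in for records collected from an RDD.
sample = to_zeppelin_table(["word", "count"], [("spark", 3), ("zeppelin", 1)])
print(sample)
```

In a %pyspark paragraph the rows would come from something like my_rdd.take(100) (my_rdd being a placeholder), so that only a bounded number of records is pulled to the driver. Cells that contain tabs or newlines would break the layout and are not handled here.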

Re: Print RDD as table

Posted by IT CTO <go...@gmail.com>.

Hi,
I tried as suggested, but apparently sqlContext is not recognized in a pySpark
paragraph.
There is no problem accessing it in a Spark (Scala) paragraph.

When I import SQLContext and create one from sc:

from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
wordcount = (sc.textFile("some path to file"))
wcDF = sqlContext.createDataFrame(wordcount)
z.show(wcDF)

I am back to the original error.

Eran
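
One thing that may be worth checking in the snippet above: sc.textFile returns an RDD of plain strings, while createDataFrame generally expects each record to be a Row, tuple, list, or dict so that it can infer columns. Below is a minimal sketch, assuming a single column named "line" (a made-up name) and a working SQLContext passed in by the caller. Whether this also avoids the '_get_object_id' error in Zeppelin is not established in this thread.

```python
def as_record(line):
    """Wrap one text line in a 1-tuple so that it maps to a single column."""
    return (line,)


def lines_to_dataframe(sqlContext, lines_rdd, column="line"):
    """Turn an RDD of strings into a one-column DataFrame.

    sqlContext and lines_rdd are supplied by the caller, for example
    lines_rdd = sc.textFile("some path to file").
    """
    records = lines_rdd.map(as_record)
    return sqlContext.createDataFrame(records, [column])
```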

On Mon, Jul 20, 2015 at 2:24 PM Felix Cheung <fe...@hotmail.com>
wrote:

> By the way, it should work better in Python if you first convert each record
> to a Row, as in the example from the documentation (
> http://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection),
> and then use sqlContext.createDataFrame():
>
> lines = sc.textFile("examples/src/main/resources/people.txt")
> parts = lines.map(lambda l: l.split(","))
> people = parts.map(lambda p: Row(name=p[0], age=int(p[1])))
>
> # Infer the schema
> schemaPeople = sqlContext.createDataFrame(people)
> ------------------------------
> From: felixcheung_m@hotmail.com
> To: users@zeppelin.incubator.apache.org
> Subject: RE: Print RDD as table
> Date: Mon, 20 Jul 2015 04:14:36 -0700
>
>
> Just a thought, try this instead?
> wordcount = sc.textFile("some path to file")
> wcDF = wordcount.toDF()
> z.show(wcDF)
>
>
> ------------------------------
> From: goi.cto@gmail.com
> Date: Mon, 20 Jul 2015 08:54:44 +0000
> Subject: Re: Print RDD as table
> To: users@zeppelin.incubator.apache.org
>
> Here is the code. The first paragraph, in pySpark, fails; the second, in
> Scala, works.
>
> %pyspark
> #This paragraph fails
> wordcount = (sc.textFile("some path to file"))
> wcDF = wordcount.toDF() #here is where the code fails
> z.show(wcDF)
>
> btw, the same code works in scala:
>
> //This paragraph works well
> val wordcount = (sc.textFile("some path to file"))
> val wcDF = wordcount.toDF()
> z.show(wcDF)
>
>
>
> On Mon, Jul 20, 2015 at 10:34 AM <fe...@hotmail.com> wrote:
>
>  Could you post more of your code leading to that?
>
>
>
> On Sun, Jul 19, 2015 at 10:19 PM -0700, "IT CTO" <go...@gmail.com>
> wrote:
>
>  I am trying to convert the Python RDD to a DataFrame but I am getting an error:
>
>  myRDD_DF = myRDD.toDF()
>
>  error: AttributeError("'list' object has no attribute '_get_object_id'",)
>
>  From what I have read, this has something to do with the Python-to-Java
> conversion, but I am not sure.
> Any help?
>
>  On Mon, Jul 20, 2015 at 4:21 AM <fe...@hotmail.com> wrote:
>
>  You should try to convert the RDD into a DataFrame. Zeppelin can then
> display it as a table automatically
>
>
>
> On Sun, Jul 19, 2015 at 1:55 AM -0700, "IT CTO" <go...@gmail.com> wrote:
>
>  Hi,
> I am using PySpark with Zeppelin and would like to print an RDD as a
> table so that it can be rendered by the display system.
> I know how to loop through the records, generate the %table string, and
> print it, but I am looking for a more elegant way.
> I tried z.show(MyRdd) but it failed:
> ... 'PipelinedRDD' object has no attribute '_get_object_id'
>
>  Any help?
> Eran
>
>

RE: Print RDD as table

Posted by Felix Cheung <fe...@hotmail.com>.

By the way, it should work better in Python if you first convert each record to a Row, as in the example from the documentation (http://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection), and then use sqlContext.createDataFrame():

from pyspark.sql import Row

lines = sc.textFile("examples/src/main/resources/people.txt")
parts = lines.map(lambda l: l.split(","))
people = parts.map(lambda p: Row(name=p[0], age=int(p[1])))

# Infer the schema
schemaPeople = sqlContext.createDataFrame(people)
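
Putting the suggestion above together with z.show, a minimal sketch might look like this. It assumes the "name, age" line format of people.txt from the Spark examples, and that sc, sqlContext, and z are the objects Zeppelin provides in a %pyspark paragraph. The helper names parse_person and show_people are hypothetical.

```python
def parse_person(line):
    """Split a 'name, age' line into a dict with a typed age field."""
    name, age = line.split(",")
    return {"name": name.strip(), "age": int(age)}


def show_people(sc, sqlContext, z, path):
    """Read the file, convert each line to a Row, and display the result."""
    # Imported here so that parse_person can be tried without Spark installed.
    from pyspark.sql import Row

    lines = sc.textFile(path)
    people = lines.map(parse_person).map(lambda d: Row(**d))
    schema_people = sqlContext.createDataFrame(people)  # schema is inferred
    z.show(schema_people)
    return schema_people
```

Keeping the parsing in its own function makes it easy to check on a few sample lines before running it over the whole file.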

 
From: felixcheung_m@hotmail.com
To: users@zeppelin.incubator.apache.org
Subject: RE: Print RDD as table
Date: Mon, 20 Jul 2015 04:14:36 -0700




Just a thought, try this instead?
wordcount = sc.textFile("some path to file")
wcDF = wordcount.toDF()
z.show(wcDF)
 From: goi.cto@gmail.com
Date: Mon, 20 Jul 2015 08:54:44 +0000
Subject: Re: Print RDD as table
To: users@zeppelin.incubator.apache.org

Here is the code. The first paragraph, in pySpark, fails; the second, in Scala, works.

%pyspark
#This paragraph fails
wordcount = (sc.textFile("some path to file"))
wcDF = wordcount.toDF() #here is where the code fails
z.show(wcDF)

btw, the same code works in scala:

//This paragraph works well
val wordcount = (sc.textFile("some path to file"))
val wcDF = wordcount.toDF()
z.show(wcDF)


On Mon, Jul 20, 2015 at 10:34 AM <fe...@hotmail.com> wrote:

Could you post more of your code leading to that?

On Sun, Jul 19, 2015 at 10:19 PM -0700, "IT CTO" <go...@gmail.com> wrote:

I am trying to convert the Python RDD to a DataFrame but I am getting an error:

myRDD_DF = myRDD.toDF()

error: AttributeError("'list' object has no attribute '_get_object_id'",)

From what I have read, this has something to do with the Python-to-Java conversion, but I am not sure.
Any help?

On Mon, Jul 20, 2015 at 4:21 AM <fe...@hotmail.com> wrote:

You should try to convert the RDD into a DataFrame. Zeppelin can then display it as a table automatically.

On Sun, Jul 19, 2015 at 1:55 AM -0700, "IT CTO" <go...@gmail.com> wrote:

Hi,
I am using PySpark with Zeppelin and would like to print an RDD as a table so that it can be rendered by the display system.
I know how to loop through the records, generate the %table string, and print it, but I am looking for a more elegant way.
I tried z.show(MyRdd) but it failed:
... 'PipelinedRDD' object has no attribute '_get_object_id'

Any help?
Eran

RE: Print RDD as table

Posted by Felix Cheung <fe...@hotmail.com>.

Just a thought, try this instead?
wordcount = sc.textFile("some path to file")
wcDF = wordcount.toDF()
z.show(wcDF)
 From: goi.cto@gmail.com
Date: Mon, 20 Jul 2015 08:54:44 +0000
Subject: Re: Print RDD as table
To: users@zeppelin.incubator.apache.org

Here is the code. The first paragraph, in pySpark, fails; the second, in Scala, works.

%pyspark
#This paragraph fails
wordcount = (sc.textFile("some path to file"))
wcDF = wordcount.toDF() #here is where the code fails
z.show(wcDF)

btw, the same code works in scala:

//This paragraph works well
val wordcount = (sc.textFile("some path to file"))
val wcDF = wordcount.toDF()
z.show(wcDF)

On Mon, Jul 20, 2015 at 10:34 AM <fe...@hotmail.com> wrote:

Could you post more of your code leading to that?

On Sun, Jul 19, 2015 at 10:19 PM -0700, "IT CTO" 
<go...@gmail.com> wrote:

I am trying to convert the Python RDD to a DataFrame but I am getting an error:

myRDD_DF = myRDD.toDF()

error: AttributeError("'list' object has no attribute '_get_object_id'",)

From what I have read, this has something to do with the Python-to-Java conversion, but I am not sure.
Any help?

On Mon, Jul 20, 2015 at 4:21 AM <fe...@hotmail.com> wrote:

You should try to convert the RDD into a DataFrame. Zeppelin can then display it as a table automatically

On Sun, Jul 19, 2015 at 1:55 AM -0700, "IT CTO" 
<go...@gmail.com> wrote:

Hi,
I am using PySpark with Zeppelin and would like to print an RDD as a table so that it can be rendered by the display system.
I know how to loop through the records, generate the %table string, and print it, but I am looking for a more elegant way.
I tried z.show(MyRdd) but it failed:
... 'PipelinedRDD' object has no attribute '_get_object_id'

Any help?
Eran

Re: Print RDD as table

Posted by IT CTO <go...@gmail.com>.

Here is the code. The first paragraph, in pySpark, fails; the second, in
Scala, works.

%pyspark
#This paragraph fails
wordcount = (sc.textFile("some path to file"))
wcDF = wordcount.toDF() #here is where the code fails
z.show(wcDF)

btw, the same code works in scala:

//This paragraph works well
val wordcount = (sc.textFile("some path to file"))
val wcDF = wordcount.toDF()
z.show(wcDF)



On Mon, Jul 20, 2015 at 10:34 AM <fe...@hotmail.com> wrote:

>  Could you post more of your code leading to that?
>
>
>
> On Sun, Jul 19, 2015 at 10:19 PM -0700, "IT CTO" <go...@gmail.com>
> wrote:
>
>  I am trying to convert the Python RDD to a DataFrame but I am getting an error:
>
>  myRDD_DF = myRDD.toDF()
>
>  error: AttributeError("'list' object has no attribute '_get_object_id'",)
>
>  From what I have read, this has something to do with the Python-to-Java
> conversion, but I am not sure.
> Any help?
>
>  On Mon, Jul 20, 2015 at 4:21 AM <fe...@hotmail.com> wrote:
>
>  You should try to convert the RDD into a DataFrame. Zeppelin can then
> display it as a table automatically
>
>
>
> On Sun, Jul 19, 2015 at 1:55 AM -0700, "IT CTO" <go...@gmail.com> wrote:
>
>  Hi,
> I am using PySpark with Zeppelin and would like to print an RDD as a
> table so that it can be rendered by the display system.
> I know how to loop through the records, generate the %table string, and
> print it, but I am looking for a more elegant way.
> I tried z.show(MyRdd) but it failed:
> ... 'PipelinedRDD' object has no attribute '_get_object_id'
>
>  Any help?
> Eran
>
>

Re: Print RDD as table

Posted by fe...@hotmail.com.

Could you post more of your code leading to that?

On Sun, Jul 19, 2015 at 10:19 PM -0700, "IT CTO" <go...@gmail.com> wrote:
I am trying to convert the Python RDD to a DataFrame but I am getting an error:

myRDD_DF = myRDD.toDF()

error: AttributeError("'list' object has no attribute '_get_object_id'",)

From what I have read, this has something to do with the Python-to-Java
conversion, but I am not sure.
Any help?

On Mon, Jul 20, 2015 at 4:21 AM <fe...@hotmail.com> wrote:

>  You should try to convert the RDD into a DataFrame. Zeppelin can then
> display it as a table automatically
>
>
>
> On Sun, Jul 19, 2015 at 1:55 AM -0700, "IT CTO" <go...@gmail.com> wrote:
>
>  Hi,
> I am using PySpark with Zeppelin and would like to print an RDD as a
> table so that it can be rendered by the display system.
> I know how to loop through the records, generate the %table string, and
> print it, but I am looking for a more elegant way.
> I tried z.show(MyRdd) but it failed:
> ... 'PipelinedRDD' object has no attribute '_get_object_id'
>
>  Any help?
> Eran
>

Re: Print RDD as table

Posted by IT CTO <go...@gmail.com>.

I am trying to convert the Python RDD to a DataFrame but I am getting an error:

myRDD_DF = myRDD.toDF()

error: AttributeError("'list' object has no attribute '_get_object_id'",)

From what I have read, this has something to do with the Python-to-Java
conversion, but I am not sure.
Any help?

On Mon, Jul 20, 2015 at 4:21 AM <fe...@hotmail.com> wrote:

>  You should try to convert the RDD into a DataFrame. Zeppelin can then
> display it as a table automatically
>
>
>
> On Sun, Jul 19, 2015 at 1:55 AM -0700, "IT CTO" <go...@gmail.com> wrote:
>
>  Hi,
> I am using PySpark with Zeppelin and would like to print an RDD as a
> table so that it can be rendered by the display system.
> I know how to loop through the records, generate the %table string, and
> print it, but I am looking for a more elegant way.
> I tried z.show(MyRdd) but it failed:
> ... 'PipelinedRDD' object has no attribute '_get_object_id'
>
>  Any help?
> Eran
>

Re: Print RDD as table

Posted by fe...@hotmail.com.

You should try to convert the RDD into a DataFrame. Zeppelin can then display it as a table automatically





On Sun, Jul 19, 2015 at 1:55 AM -0700, "IT CTO" <go...@gmail.com> wrote:
Hi,
I am using PySpark with Zeppelin and would like to print an RDD as a table
so that it can be rendered by the display system.
I know how to loop through the records, generate the %table string, and
print it, but I am looking for a more elegant way.
I tried z.show(MyRdd) but it failed:
... 'PipelinedRDD' object has no attribute '_get_object_id'

Any help?
Eran