Posted to user@spark.apache.org by dimitris plakas <di...@gmail.com> on 2018/09/06 22:11:07 UTC

Error in show()

Hello everyone, I am new to PySpark and I am facing an issue. Let me
explain exactly what the problem is.

I have a dataframe and I apply a map() function to it:

dataframe2 = dataframe1.rdd.map(custom_function)
dataframe = sqlContext.createDataFrame(dataframe2)

When I call dataframe.show(30, True) it shows the result, but when I use
dataframe.show(60, True) I get an error. The error is in the attachment
Pyspark_Error.txt.

Could you please explain what this error means and how to get past it?
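
To make the setup concrete, a minimal self-contained version of what I am
doing looks roughly like this (custom_function and the columns here are
simplified placeholders for my real ones):

from pyspark import SparkContext
from pyspark.sql import SQLContext, Row

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# Toy input standing in for my real dataframe1.
dataframe1 = sqlContext.createDataFrame(
    [Row(id=1, value=10.0), Row(id=2, value=20.0)]
)

def custom_function(row):
    # Build a new Row from the old one (placeholder logic).
    return Row(id=row.id, double_value=row.value * 2)

dataframe2 = dataframe1.rdd.map(custom_function)  # pass the function itself
dataframe = sqlContext.createDataFrame(dataframe2)
dataframe.show(30, True)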

Re: Error in show()

Posted by Sonal Goyal <so...@gmail.com>.
It says serialization error - could there be a column value which is not
getting parsed as an int in one of rows 31-60? The relevant Python code in
serializers.py that throws the error is:

def read_int(stream):
    length = stream.read(4)
    if not length:
        raise EOFError
    return struct.unpack("!i", length)[0]
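
read_int just reads the 4-byte length prefix the Python worker sends back to
the JVM, so an empty read usually means the worker stopped mid-stream - often
because the mapped function raised on some record. One rough way to narrow it
down (a sketch reusing the names from your first message, nothing specific to
your data) is to run custom_function locally on the driver for the first 60
input rows, so the failing record shows up as an ordinary Python traceback:

# Apply the mapper on the driver, row by row, so the record it
# chokes on surfaces as a plain exception instead of a worker crash.
for i, rec in enumerate(dataframe1.rdd.take(60), start=1):
    try:
        custom_function(rec)
    except Exception as e:
        print("custom_function failed on input row", i, ":", rec)
        print(e)
        break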


Thanks,
Sonal
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>



On Fri, Sep 7, 2018 at 12:22 PM, Apostolos N. Papadopoulos <papadopo@csd.auth.gr> wrote:

> Can you isolate the row that is causing the problem? I mean, start with
> show(31) and work up to show(60).
>
> Perhaps this will help you to understand the problem.
>
> regards,
>
> Apostolos
>
>
>
> On 07/09/2018 01:11 AM, dimitris plakas wrote:
>
> Hello everyone, I am new to PySpark and I am facing an issue. Let me
> explain exactly what the problem is.
>
> I have a dataframe and I apply a map() function to it:
>
> dataframe2 = dataframe1.rdd.map(custom_function)
> dataframe = sqlContext.createDataFrame(dataframe2)
>
> When I call dataframe.show(30, True) it shows the result, but when I use
> dataframe.show(60, True) I get an error. The error is in the attachment
> Pyspark_Error.txt.
>
> Could you please explain what this error means and how to get past it?
>
>
>
>
>
> --
> Apostolos N. Papadopoulos, Associate Professor
> Department of Informatics
> Aristotle University of Thessaloniki
> Thessaloniki, GREECE
> tel: ++0030312310991918
> email: papadopo@csd.auth.gr
> twitter: @papadopoulos_ap
> web: http://datalab.csd.auth.gr/~apostol
>
>

Re: Error in show()

Posted by "Apostolos N. Papadopoulos" <pa...@csd.auth.gr>.
Can you isolate the row that is causing the problem? I mean, start with
show(31) and work up to show(60).

Perhaps this will help you to understand the problem.
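
In code the idea could look roughly like this (assuming the dataframe from
your first message; each iteration will also print the rows that still work):

# Call show() with an increasing row count until the failure appears,
# which pins down the first offending row.
for n in range(31, 61):
    try:
        dataframe.show(n, True)
    except Exception as e:
        print("show() starts failing at n =", n)
        print(e)
        break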

regards,

Apostolos



On 07/09/2018 01:11 AM, dimitris plakas wrote:
> Hello everyone, I am new to PySpark and I am facing an issue. Let me
> explain exactly what the problem is.
>
> I have a dataframe and I apply a map() function to it:
>
> dataframe2 = dataframe1.rdd.map(custom_function)
> dataframe = sqlContext.createDataFrame(dataframe2)
>
> When I call dataframe.show(30, True) it shows the result, but when I use
> dataframe.show(60, True) I get an error. The error is in the attachment
> Pyspark_Error.txt.
>
> Could you please explain what this error means and how to get past it?
>
>
>

-- 
Apostolos N. Papadopoulos, Associate Professor
Department of Informatics
Aristotle University of Thessaloniki
Thessaloniki, GREECE
tel: ++0030312310991918
email: papadopo@csd.auth.gr
twitter: @papadopoulos_ap
web: http://datalab.csd.auth.gr/~apostol


Re: Error in show()

Posted by Prakash Joshi <pr...@gmail.com>.
Please check the specific ERROR lines of the text file.
Chances are a few columns are not properly delimited in specific rows.
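
If the data comes from a delimited text file, one quick way to spot such rows
might be something like this (the path, the comma delimiter and expected_cols
are only guesses about your data; sc is the SparkContext):

# Flag lines whose field count differs from what the schema expects.
expected_cols = 5
for i, line in enumerate(sc.textFile("your_input.csv").take(60), start=1):
    fields = line.split(",")
    if len(fields) != expected_cols:
        print("line", i, "has", len(fields), "fields:", line)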

Regards
Prakash

On Fri, Sep 7, 2018, 3:41 AM dimitris plakas <di...@gmail.com> wrote:

> Hello everyone, I am new to PySpark and I am facing an issue. Let me
> explain exactly what the problem is.
>
> I have a dataframe and I apply a map() function to it:
>
> dataframe2 = dataframe1.rdd.map(custom_function)
> dataframe = sqlContext.createDataFrame(dataframe2)
>
> When I call dataframe.show(30, True) it shows the result, but when I use
> dataframe.show(60, True) I get an error. The error is in the attachment
> Pyspark_Error.txt.
>
> Could you please explain what this error means and how to get past it?
>
>