Posted to user@spark.apache.org by dimitris plakas <di...@gmail.com> on 2018/09/06 22:11:07 UTC
Error in show()
Hello everyone, I am new to PySpark and I am facing an issue. Let me
explain exactly what the problem is.
I have a dataframe and I apply a map() function to it:
dataframe2 = dataframe1.rdd.map(custom_function)
dataframe = sqlContext.createDataFrame(dataframe2)
When I call dataframe.show(30, True) it shows the result, but when I call
dataframe.show(60, True) I get an error. The error is in the attachment
Pyspark_Error.txt.
Could you please explain what this error is and how to get past it?
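A minimal plain-Python sketch of the pattern above (custom_function here is
a stand-in for the real mapper, which is not shown in the thread): if the
mapped function raises on some row, the whole job dies; a wrapper that tags
failing rows instead of raising makes the bad row easy to find.

```python
def custom_function(row):
    # Stand-in for the real mapper: expects (id, value) with an int value.
    return (row[0], int(row[1]))

def safe_map(fn):
    """Return a mapper that tags failures instead of raising."""
    def wrapped(row):
        try:
            return ("ok", fn(row))
        except Exception as e:
            return ("error", (row, str(e)))
    return wrapped

# In Spark this wrapped mapper would be passed to rdd.map(); here we apply
# it to a plain list to show the behavior.
rows = [("a", "1"), ("b", "2"), ("c", "not-an-int")]
mapped = [safe_map(custom_function)(r) for r in rows]
bad = [payload for status, payload in mapped if status == "error"]
```

With `rdd.map(safe_map(custom_function))` you could then filter for the
"error" tag and inspect exactly which rows break the job.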
Re: Error in show()
Posted by Sonal Goyal <so...@gmail.com>.
It looks like a serialization error: could there be a column value that is
not getting parsed as an int in one of rows 31-60? The relevant Python code
in serializers.py that throws the error is:

def read_int(stream):
    length = stream.read(4)
    if not length:
        raise EOFError
    return struct.unpack("!i", length)[0]
Thanks,
Sonal
Nube Technologies <http://www.nubetech.co>
<http://in.linkedin.com/in/sonalgoyal>
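To illustrate the code Sonal quotes: read_int expects 4 bytes holding a
big-endian int, and raises EOFError when the stream is already exhausted,
which is typically what happens when the Python worker dies mid-task (for
example, because the mapped function crashed on a row). A self-contained
demonstration:

```python
import io
import struct

def read_int(stream):
    # Same logic as pyspark/serializers.py: read a 4-byte big-endian int.
    length = stream.read(4)
    if not length:
        raise EOFError
    return struct.unpack("!i", length)[0]

# A well-formed stream yields the int back.
ok = read_int(io.BytesIO(struct.pack("!i", 60)))

# An exhausted stream (nothing was written before the worker died)
# raises EOFError, which surfaces as the serialization error in show().
try:
    read_int(io.BytesIO(b""))
    raised = False
except EOFError:
    raised = True
```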
Re: Error in show()
Posted by "Apostolos N. Papadopoulos" <pa...@csd.auth.gr>.
Can you isolate the row that is causing the problem? I mean, start with
show(31) and work up to show(60).
Perhaps this will help you understand the problem.
regards,
Apostolos
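Rather than stepping linearly from show(31) to show(60), the same idea can
be done as a binary search over the row count. A sketch in plain Python,
with the df.show(n) call replaced by a stand-in predicate (in practice
works(n) would call df.show(n) inside a try/except):

```python
def first_failing(lo, hi, works):
    """Smallest n in [lo, hi] for which works(n) is False, assuming
    failures are monotone: once show(n) fails, larger n also fail."""
    while lo < hi:
        mid = (lo + hi) // 2
        if works(mid):
            lo = mid + 1
        else:
            hi = mid
    return lo

# Stand-in for "df.show(n) succeeds": suppose row 42 is the bad one,
# so show(n) works exactly when n < 42.
bad_row = 42
result = first_failing(31, 60, lambda n: n < bad_row)
```

This finds the first bad row in about five show() calls instead of thirty.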
--
Apostolos N. Papadopoulos, Associate Professor
Department of Informatics
Aristotle University of Thessaloniki
Thessaloniki, GREECE
tel: ++0030312310991918
email: papadopo@csd.auth.gr
twitter: @papadopoulos_ap
web: http://datalab.csd.auth.gr/~apostol
Re: Error in show()
Posted by Prakash Joshi <pr...@gmail.com>.
Please check the specific ERROR lines of the text file.
Chances are a few columns are not properly delimited in specific rows.
Regards
Prakash
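Prakash's suggestion can be checked before the data ever reaches Spark: scan
the raw lines for rows whose field count differs from the header's. A sketch
(the comma delimiter and the sample data are assumptions; the real file is
not shown in the thread):

```python
def misdelimited_rows(lines, delimiter=","):
    """Return (line_number, line) for every row whose field count
    differs from the header row's field count."""
    expected = len(lines[0].split(delimiter))
    return [
        (i, line)
        for i, line in enumerate(lines[1:], start=2)
        if len(line.split(delimiter)) != expected
    ]

sample = [
    "id,value",
    "1,10",
    "2,20,extra",   # one field too many: badly delimited
    "3,30",
]
bad = misdelimited_rows(sample)
```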