Posted to issues@spark.apache.org by "Lakshmi Praveena (JIRA)" <ji...@apache.org> on 2019/01/23 07:14:00 UTC

[jira] [Created] (SPARK-26699) Dataset column discrepancies between Parquet

Lakshmi Praveena created SPARK-26699:
----------------------------------------

             Summary: Dataset column discrepancies between Parquet 
                 Key: SPARK-26699
                 URL: https://issues.apache.org/jira/browse/SPARK-26699
             Project: Spark
          Issue Type: Bug
          Components: Input/Output
    Affects Versions: 2.3.2
            Reporter: Lakshmi Praveena


Hi,

When I run my job in local mode with the same Parquet input files, the output is:

locations
--------------------
[[[true, [[, phys...
[[[true, [[, phys...
[[[true, [[, phys...
 null
[[[true, [[, phys...
[[[true, [[, phys...
[[[true, [[, phys...
[[[true, [[, phys...
[[[true, [[, phys...
[[[true, [[, phys...

 

But when I run the same code with the same input Parquet files in YARN cluster mode, my output is as below:

--------------------
 locations
--------------------
[*WrappedArray*([tr...
[*WrappedArray*([tr...
[WrappedArray([tr...
 null
[WrappedArray([tr...
[WrappedArray([tr...
[WrappedArray([tr...
[WrappedArray([tr...
[WrappedArray([tr...
[WrappedArray([tr...

It's appending WrappedArray :(

I am using Apache Spark 2.3.2, and the EMR version of the cluster is 5.19.0. What could be the reason for the discrepancies in the output of certain table columns?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
