Posted to issues@spark.apache.org by "Lakshmi Praveena (JIRA)" <ji...@apache.org> on 2019/01/23 07:14:00 UTC
[jira] [Created] (SPARK-26699) Dataset column discrepancies between Parquet
Lakshmi Praveena created SPARK-26699:
----------------------------------------
Summary: Dataset column discrepancies between Parquet
Key: SPARK-26699
URL: https://issues.apache.org/jira/browse/SPARK-26699
Project: Spark
Issue Type: Bug
Components: Input/Output
Affects Versions: 2.3.2
Reporter: Lakshmi Praveena
Hi,
When I run my job in local mode on a set of Parquet input files, the output is:
locations
--------------------
[[[true, [[, phys...
[[[true, [[, phys...
[[[true, [[, phys...
null
[[[true, [[, phys...
[[[true, [[, phys...
[[[true, [[, phys...
[[[true, [[, phys...
[[[true, [[, phys...
[[[true, [[, phys...
But when I run the same code base on the same input Parquet files in YARN cluster mode, the output is as follows:
--------------------
locations
--------------------
[WrappedArray([tr...
[WrappedArray([tr...
[WrappedArray([tr...
null
[WrappedArray([tr...
[WrappedArray([tr...
[WrappedArray([tr...
[WrappedArray([tr...
[WrappedArray([tr...
[WrappedArray([tr...
In cluster mode, the array values are wrapped in WrappedArray(...) :(
I am using Apache Spark 2.3.2, and the EMR version on the cluster is 5.19.0. What could be the reason for the discrepancy in the output of certain table columns?
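For reference, the WrappedArray(...) prefix is what Scala 2.11/2.12 prints when an array-backed Seq is rendered with its default toString, whereas the local-mode output above renders the same arrays with plain brackets. The following is a minimal plain-Scala sketch of those two renderings, offered as an illustration and not as a diagnosis of this issue: it does not use Spark or read Parquet, and the object and method names (WrappedArrayDemo, defaultRendering, bracketRendering) are hypothetical. It assumes Scala 2.11 or 2.12 (Spark 2.3.x is built against Scala 2.11); Scala 2.13 renamed the wrapper, so the prefix differs there.

```scala
// Plain Scala, no Spark dependency. Assumes Scala 2.11 or 2.12, where an
// Array used as a Seq is wrapped in scala.collection.mutable.WrappedArray.
// (Scala 2.13 uses ArraySeq instead, so the toString prefix differs.)
object WrappedArrayDemo {

  // Default rendering: Seq.toString, which prints the collection's class
  // prefix, e.g. "WrappedArray(true, false)" for an array-backed Seq.
  def defaultRendering(values: Seq[Any]): String = values.toString

  // Bracket rendering: an explicit mkString, matching the "[...]" style
  // of the local-mode output.
  def bracketRendering(values: Seq[Any]): String =
    values.mkString("[", ", ", "]")

  def main(args: Array[String]): Unit = {
    val flags: Array[Boolean] = Array(true, false)
    // The implicit Array-to-Seq conversion wraps the array without copying
    // it (on Scala 2.11/2.12).
    val asSeq: Seq[Boolean] = flags
    println(defaultRendering(asSeq)) // WrappedArray(true, false) on 2.11/2.12
    println(bracketRendering(asSeq)) // [true, false]
  }
}
```

Only the string formatting differs between the two calls; the underlying values are identical, which matches the symptom above, where the data is the same and only its printed form changes.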
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)