Posted to issues@spark.apache.org by "Yuming Wang (JIRA)" <ji...@apache.org> on 2018/12/04 02:59:00 UTC

[jira] [Commented] (SPARK-26149) Read UTF8String from Parquet/ORC may be incorrect

    [ https://issues.apache.org/jira/browse/SPARK-26149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708119#comment-16708119 ] 

Yuming Wang commented on SPARK-26149:
-------------------------------------

This is not a Spark bug, but a Hive bug.

!image-2018-12-04-10-55-49-369.png!
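For reference, a minimal JVM-side sketch (outside Spark, using hypothetical byte values) of why a byte-wise comparison and {{java.lang.String.equals}} can disagree when the column bytes are not valid UTF-8: decoding invalid bytes to a {{String}} replaces each of them with U+FFFD, so two different byte sequences can decode to equal strings, while a binary comparison like Spark's {{UTF8String}} still sees them as different:

```scala
import java.nio.charset.StandardCharsets

// Two DIFFERENT byte sequences, each ending in an invalid UTF-8 byte.
val b1 = Array[Byte]('a'.toByte, 0xFF.toByte)
val b2 = Array[Byte]('a'.toByte, 0xFE.toByte)

// Byte-wise comparison (as UTF8String does): the arrays differ.
println(java.util.Arrays.equals(b1, b2)) // false

// Decoding to java.lang.String replaces each invalid byte with U+FFFD,
// so both decode to "a\uFFFD" and compare equal.
val s1 = new String(b1, StandardCharsets.UTF_8)
val s2 = new String(b2, StandardCharsets.UTF_8)
println(s1.equals(s2)) // true
```

This matches the reproduction above: the collected {{Row}} compares equal via {{getString}} (decoded strings), while the SQL expression {{s1 = s2}} compares the raw bytes.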

> Read UTF8String from Parquet/ORC may be incorrect
> -------------------------------------------------
>
>                 Key: SPARK-26149
>                 URL: https://issues.apache.org/jira/browse/SPARK-26149
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0
>            Reporter: Yuming Wang
>            Priority: Major
>         Attachments: SPARK-26149.snappy.parquet, image-2018-12-04-10-55-49-369.png
>
>
> How to reproduce:
> {code:scala}
> scala> spark.read.parquet("/Users/yumwang/SPARK-26149/SPARK-26149.snappy.parquet").selectExpr("s1 = s2").show
> +---------+
> |(s1 = s2)|
> +---------+
> |    false|
> +---------+
> scala> val first = spark.read.parquet("/Users/yumwang/SPARK-26149/SPARK-26149.snappy.parquet").collect().head
> first: org.apache.spark.sql.Row = [a0750c1f13f0k5��F8j���b�Ro'4da96,a0750c1f13f0k5��F8j���b�Ro'4da96]
> scala> println(first.getString(0).equals(first.getString(1)))
> true
> {code}
> {code:sql}
> hive> CREATE TABLE `tb1` (`s1` STRING, `s2` STRING)
>     > stored as parquet
>     > location "/Users/yumwang/SPARK-26149";
> OK
> Time taken: 0.224 seconds
> hive> select s1 = s2 from tb1;
> OK
> true
> Time taken: 0.167 seconds, Fetched: 1 row(s)
> {code}
> As you can see, only Spark's UTF8String comparison returns {{false}}; both {{java.lang.String.equals}} and the Hive query return {{true}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org