Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2016/03/02 01:01:18 UTC

[jira] [Commented] (SPARK-13574) Improve parquet dictionary decoding for strings

    [ https://issues.apache.org/jira/browse/SPARK-13574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174656#comment-15174656 ] 

Apache Spark commented on SPARK-13574:
--------------------------------------

User 'nongli' has created a pull request for this issue:
https://github.com/apache/spark/pull/11454

> Improve parquet dictionary decoding for strings
> -----------------------------------------------
>
>                 Key: SPARK-13574
>                 URL: https://issues.apache.org/jira/browse/SPARK-13574
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Nong Li
>            Priority: Minor
>
> Currently, the parquet reader copies the dictionary value for each data value. This is wasteful for string columns, because decoding expands the dictionary into a separate copy of a string for every row that references it. We should instead have the data values point to the same backing memory.
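
The difference between the two decoding strategies can be sketched as follows. This is a minimal illustration, not Spark's implementation (Spark's vectorized Parquet reader is written in Java). It assumes the dictionary's strings are concatenated into one backing buffer, and all function names here are hypothetical. Copying decode materializes new bytes for every data value. Reference-based decode stores only an (offset, length) pair per value, and reading a value takes a zero-copy `memoryview` slice of the shared buffer.

```python
# Hypothetical sketch contrasting per-value copying with reference-based
# dictionary decoding for a string column. Names are illustrative only.
from typing import List, Tuple


def build_dictionary(values: List[bytes]) -> Tuple[bytes, List[int], List[int]]:
    """Concatenate dictionary strings into one backing buffer, recording offsets and lengths."""
    offsets: List[int] = []
    lengths: List[int] = []
    buf = bytearray()
    for v in values:
        offsets.append(len(buf))
        lengths.append(len(v))
        buf.extend(v)
    return bytes(buf), offsets, lengths


def decode_copying(ids: List[int], backing: bytes,
                   offsets: List[int], lengths: List[int]) -> List[bytes]:
    # Slicing a bytes object creates a new copy for every data value,
    # so a string repeated N times is stored N times.
    return [backing[offsets[i]:offsets[i] + lengths[i]] for i in ids]


def decode_by_reference(ids: List[int], offsets: List[int],
                        lengths: List[int]) -> List[Tuple[int, int]]:
    # Each data value is only an (offset, length) pair into the shared buffer.
    return [(offsets[i], lengths[i]) for i in ids]


def materialize(backing: bytes, ref: Tuple[int, int]) -> memoryview:
    # A memoryview slice reads the backing buffer without copying it.
    off, n = ref
    return memoryview(backing)[off:off + n]


if __name__ == "__main__":
    backing, offsets, lengths = build_dictionary([b"apple", b"banana"])
    ids = [0, 1, 1, 0, 1]  # dictionary-encoded data values
    copied = decode_copying(ids, backing, offsets, lengths)
    refs = decode_by_reference(ids, offsets, lengths)
    # Both strategies yield the same logical column.
    print([bytes(materialize(backing, r)) for r in refs] == copied)
```

With reference-based decoding, the dictionary strings exist once in `backing` no matter how many rows use them; only the small per-row references grow with the column length.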



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)