Posted to issues@hawq.apache.org by "Radar Lei (JIRA)" <ji...@apache.org> on 2017/09/04 07:15:02 UTC

[jira] [Assigned] (HAWQ-1303) Load each partition as separate table for heterogenous tables in HCatalog

     [ https://issues.apache.org/jira/browse/HAWQ-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radar Lei reassigned HAWQ-1303:
-------------------------------

    Assignee: Oleksandr Diachenko  (was: Radar Lei)

> Load each partition as separate table for heterogenous tables in HCatalog
> -------------------------------------------------------------------------
>
>                 Key: HAWQ-1303
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1303
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: Hcatalog, PXF
>            Reporter: Oleksandr Diachenko
>            Assignee: Oleksandr Diachenko
>
> Changes introduced in HAWQ-1228 made HAWQ use the optimal profile/format for Hive tables. But there is a limitation: when HAWQ loads Hive tables into memory, it loads each one as a single table, even if the table has multiple partitions with different output formats (GPDBWritable, TEXT). In that case it currently uses the GPDBWritable format for the whole table. The idea is to load each set of partitions that share an output format as a separate table, so that the optimal output format, and not only the optimal profile, can be used.
> Example: 
> We have a Hive table with four partitions in the following formats: Text, RC, ORC, and Sequence file.
> Currently, HAWQ loads it into memory with the GPDBWritable format.
> The GPDBWritable format is optimal for the HiveORC and Hive profiles but not for the HiveText and HiveRC profiles.
> With the proposed changes, HAWQ should load two tables, one with the TEXT format and one with the GPDBWritable format, and use the following pairs to read partitions: HiveText/TEXT, HiveRC/TEXT, HiveORC/GPDBWritable, Hive/GPDBWritable.
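
The grouping described in the example can be sketched roughly as follows. This is only an illustration in Python, not HAWQ or PXF code: the function names, the partition names, and the storage-format-to-profile lookup are hypothetical, and it assumes that a Sequence file partition is read with the generic Hive profile. Only the profile/output-format pairs come from the issue description.

```python
from collections import defaultdict

# Profile -> output format pairs, as listed in the issue description.
PROFILE_OUTPUT_FORMAT = {
    "HiveText": "TEXT",
    "HiveRC": "TEXT",
    "HiveORC": "GPDBWritable",
    "Hive": "GPDBWritable",
}

# Hypothetical lookup from a partition's storage format to a profile.
# Anything not listed here falls back to the generic "Hive" profile (assumption).
STORAGE_TO_PROFILE = {
    "Text": "HiveText",
    "RC": "HiveRC",
    "ORC": "HiveORC",
}


def profile_for(storage_format):
    """Pick a profile for one partition (hypothetical helper)."""
    return STORAGE_TO_PROFILE.get(storage_format, "Hive")


def group_partitions_by_output_format(partitions):
    """Group (partition_name, storage_format) pairs by output format.

    Each resulting group corresponds to one table that would be loaded,
    holding (partition_name, profile) pairs.
    """
    groups = defaultdict(list)
    for name, storage_format in partitions:
        profile = profile_for(storage_format)
        groups[PROFILE_OUTPUT_FORMAT[profile]].append((name, profile))
    return dict(groups)


# The four-partition table from the example; partition names are made up.
partitions = [
    ("p1", "Text"),
    ("p2", "RC"),
    ("p3", "ORC"),
    ("p4", "SequenceFile"),
]
tables = group_partitions_by_output_format(partitions)
```

Under these assumptions the four partitions fall into two groups, one per output format, which matches the two tables the proposal expects.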



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)