Posted to issues@hawq.apache.org by "Lili Ma (JIRA)" <ji...@apache.org> on 2017/02/28 09:43:45 UTC

[jira] [Commented] (HAWQ-1366) HAWQ should throw error if finding dictionary encoding type for Parquet

    [ https://issues.apache.org/jira/browse/HAWQ-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887672#comment-15887672 ] 

Lili Ma commented on HAWQ-1366:
-------------------------------

Hive stores the title column using dictionary encoding. Since HAWQ doesn't support this, the output is a little weird: the title column comes back empty.
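
For readers unfamiliar with dictionary storage, here is a minimal sketch of the idea in Python, using the title values from the issue description. The function name dictionary_encode is hypothetical, and this is not Parquet's actual on-disk layout (Parquet additionally packs the indices); it only shows that the strings live in a separate dictionary while the data itself holds small integers.

```python
def dictionary_encode(values):
    """Return (dictionary, indices), with distinct values kept in first-seen order."""
    dictionary = []   # distinct values, the "dictionary page" in this sketch
    positions = {}    # value -> its index in the dictionary
    indices = []      # what the data page would hold instead of the strings
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


titles = ["Sales", "Engineer", "Director", "Engineer", "Sales"]
dictionary, indices = dictionary_encode(titles)
```

A reader that does not understand the dictionary only sees the indices and cannot recover the strings, which would be consistent with the empty title column in the HAWQ output.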

In the short term, HAWQ should throw an error for this case. In the long term, HAWQ should support reading and writing Parquet 2.0 data.
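
The short-term check could look roughly like the sketch below. It is written in Python for brevity and is not HAWQ code: check_column_chunk and UnsupportedParquetFeature are hypothetical names, and the caller is assumed to have already read the encoding and page-type ids from the file metadata. The numeric ids follow the Encoding and PageType enums in the Parquet format's parquet.thrift.

```python
# Numeric ids below follow the Encoding and PageType enums in parquet.thrift.
ENCODING_PLAIN_DICTIONARY = 2
ENCODING_RLE_DICTIONARY = 8
PAGE_TYPE_DICTIONARY_PAGE = 2

UNSUPPORTED_ENCODINGS = {ENCODING_PLAIN_DICTIONARY, ENCODING_RLE_DICTIONARY}
UNSUPPORTED_PAGE_TYPES = {PAGE_TYPE_DICTIONARY_PAGE}


class UnsupportedParquetFeature(Exception):
    """Hypothetical error type, used only in this sketch."""


def check_column_chunk(column_name, encodings, page_types):
    """Raise if a column chunk uses a dictionary encoding or a dictionary page."""
    bad_encodings = sorted(set(encodings) & UNSUPPORTED_ENCODINGS)
    bad_pages = sorted(set(page_types) & UNSUPPORTED_PAGE_TYPES)
    if bad_encodings or bad_pages:
        raise UnsupportedParquetFeature(
            "column %r: unsupported encodings %s, page types %s"
            % (column_name, bad_encodings, bad_pages)
        )
```

Failing at this point, rather than returning empty values, would make the problem visible as soon as a query touches the affected column.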


> HAWQ should throw error if finding dictionary encoding type for Parquet
> -----------------------------------------------------------------------
>
>                 Key: HAWQ-1366
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1366
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Storage
>            Reporter: Lili Ma
>            Assignee: Ed Espino
>             Fix For: 2.2.0.0-incubating
>
>
> Since HAWQ is based on Parquet format version 1.0, which does not support dictionary pages, and hawq register may register Parquet format version 2.0 data into HAWQ, we should throw an error when we find an unsupported page type for a column.
> Steps to reproduce:
> 1. In Hive, create a table and insert 5 records:
> {code}
> hive> create table tt (i int,
>     >   fname varchar(100),
>     >   title varchar(100),
>     >   salary double
>     > )
>     > STORED AS PARQUET;
> OK
> Time taken: 0.029 seconds
> hive> insert into tt values (5,    'OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW',    'Sales',    80282.54),
>     > (7,    'UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE',    'Engineer',    10206.65),
>     > (4,    'PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ',    'Director',    63691.23),
>     > (9,    'CTDCDYRURBZMBLNWHQNOQCYFFVULOP',    'Engineer',    63867.44),
>     > (10,    'WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK',    'Sales',    97720.08);
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
> Query ID = malili_20170228173956_f370414c-ddc8-4e6d-99e9-7c1fa1f678d1
> Total jobs = 3
> Launching Job 1 out of 3
> Number of reduce tasks is set to 0 since there's no reduce operator
> Job running in-process (local Hadoop)
> 2017-02-28 17:39:58,713 Stage-1 map = 100%,  reduce = 0%
> Ended Job = job_local2046305831_0004
> Stage-4 is selected by condition resolver.
> Stage-3 is filtered out by condition resolver.
> Stage-5 is filtered out by condition resolver.
> Moving data to directory hdfs://127.0.0.1:8020/user/hive/warehouse/tt/.hive-staging_hive_2017-02-28_17-39-56_806_3518057455919651199-1/-ext-10000
> Loading data to table default.tt
> MapReduce Jobs Launched:
> Stage-Stage-1:  HDFS Read: 3945 HDFS Write: 4226 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> OK
> Time taken: 1.975 seconds
> hive> select * from tt;
> OK
> 5	OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW	Sales	80282.54
> 7	UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE	Engineer	10206.65
> 4	PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ	Director	63691.23
> 9	CTDCDYRURBZMBLNWHQNOQCYFFVULOP	Engineer	63867.44
> 10	WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK	Sales	97720.08
> Time taken: 0.056 seconds, Fetched: 5 row(s)
> {code}
> 2. Create the table in HAWQ:
> {code}
> CREATE TABLE public.tt
> (i int,
>   fname varchar(100),
>   title varchar(100),
>   salary float8)
> WITH (appendonly=true,orientation=parquet);
> {code}
> 3. Run hawq register:
> {code}
> malilis-MacBook-Pro:Hawq_register malili$ hawq register -d postgres -f hdfs://localhost:8020/user/hive/warehouse/tt tt
> 20170228:17:40:25:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-try to connect database localhost:5432 postgres
> 20170228:17:40:33:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-New file(s) to be registered: ['hdfs://localhost:8020/user/hive/warehouse/tt/000000_0']
> hdfscmd: "hadoop fs -mv hdfs://localhost:8020/user/hive/warehouse/tt/000000_0 hdfs://localhost:8020/hawq_default/16385/16387/49281/1"
> 20170228:17:40:41:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-Hawq Register Succeed.
> {code}
> 4. Select from the table in HAWQ (the title column comes back empty):
> {code}
> postgres=# select * from tt;
>  i  |             fname              | title |  salary
> ----+--------------------------------+-------+----------
>   5 | OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW |       | 80282.54
>   7 | UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE |       | 10206.65
>   4 | PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ |       | 63691.23
>   9 | CTDCDYRURBZMBLNWHQNOQCYFFVULOP |       | 63867.44
>  10 | WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK |       | 97720.08
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)