Posted to issues@hawq.apache.org by "Lili Ma (JIRA)" <ji...@apache.org> on 2017/02/28 09:42:45 UTC

[jira] [Created] (HAWQ-1366) HAWQ should throw error if finding dictionary encoding type for Parquet

Lili Ma created HAWQ-1366:
-----------------------------

             Summary: HAWQ should throw error if finding dictionary encoding type for Parquet
                 Key: HAWQ-1366
                 URL: https://issues.apache.org/jira/browse/HAWQ-1366
             Project: Apache HAWQ
          Issue Type: Bug
          Components: Storage
            Reporter: Lili Ma
            Assignee: Ed Espino
             Fix For: 2.2.0.0-incubating


HAWQ is based on Parquet format version 1.0, which does not support dictionary pages. Because hawq register may register Parquet format version 2.0 data into HAWQ, HAWQ should throw an error when it finds an unsupported page type for a column.
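
The requested behavior could be sketched roughly as below. This is only an illustration in Python, not HAWQ's actual reader code: the PageType numbering follows the Parquet Thrift definition, while check_page_type, UnsupportedPageError, and the assumption that only DATA_PAGE is readable are hypothetical names and choices made for the example.

```python
from enum import IntEnum


class PageType(IntEnum):
    """Page types as numbered in the Parquet Thrift definition."""
    DATA_PAGE = 0
    INDEX_PAGE = 1
    DICTIONARY_PAGE = 2
    DATA_PAGE_V2 = 3


# Assumption for this sketch: the reader can only decode version 1 data pages.
SUPPORTED_PAGE_TYPES = frozenset({PageType.DATA_PAGE})


class UnsupportedPageError(Exception):
    """Hypothetical error raised instead of silently returning empty values."""


def check_page_type(column_name, page_type_id):
    """Return the page type, or raise if the reader cannot decode it."""
    try:
        page_type = PageType(page_type_id)
    except ValueError:
        # The id is not one of the four page types defined above.
        raise UnsupportedPageError(
            f"column {column_name!r}: unknown page type id {page_type_id}"
        ) from None
    if page_type not in SUPPORTED_PAGE_TYPES:
        raise UnsupportedPageError(
            f"column {column_name!r}: unsupported page type {page_type.name}"
        )
    return page_type
```

With a check like this on each page header, a dictionary page in a registered file would produce an error that names the column, rather than empty values in the query result.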

Reproduce Steps:
1. In Hive, create a table and insert 5 records:
{code}
hive> create table tt (i int,
    >   fname varchar(100),
    >   title varchar(100),
    >   salary double
    > )
    > STORED AS PARQUET;
OK
Time taken: 0.029 seconds
hive> insert into tt values (5,    'OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW',    'Sales',    80282.54),
    > (7,    'UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE',    'Engineer',    10206.65),
    > (4,    'PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ',    'Director',    63691.23),
    > (9,    'CTDCDYRURBZMBLNWHQNOQCYFFVULOP',    'Engineer',    63867.44),
    > (10,    'WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK',    'Sales',    97720.08);
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = malili_20170228173956_f370414c-ddc8-4e6d-99e9-7c1fa1f678d1
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2017-02-28 17:39:58,713 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local2046305831_0004
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://127.0.0.1:8020/user/hive/warehouse/tt/.hive-staging_hive_2017-02-28_17-39-56_806_3518057455919651199-1/-ext-10000
Loading data to table default.tt
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 3945 HDFS Write: 4226 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 1.975 seconds
hive> select * from tt;
OK
5	OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW	Sales	80282.54
7	UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE	Engineer	10206.65
4	PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ	Director	63691.23
9	CTDCDYRURBZMBLNWHQNOQCYFFVULOP	Engineer	63867.44
10	WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK	Sales	97720.08
Time taken: 0.056 seconds, Fetched: 5 row(s)
{code}
2. Create the table in HAWQ:
{code}
CREATE TABLE public.tt
(i int,
  fname varchar(100),
  title varchar(100),
  salary float8)
WITH (appendonly=true,orientation=parquet);
{code}
3. Run hawq register:
{code}
malilis-MacBook-Pro:Hawq_register malili$ hawq register -d postgres -f hdfs://localhost:8020/user/hive/warehouse/tt tt
20170228:17:40:25:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-try to connect database localhost:5432 postgres
20170228:17:40:33:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-New file(s) to be registered: ['hdfs://localhost:8020/user/hive/warehouse/tt/000000_0']
hdfscmd: "hadoop fs -mv hdfs://localhost:8020/user/hive/warehouse/tt/000000_0 hdfs://localhost:8020/hawq_default/16385/16387/49281/1"
20170228:17:40:41:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-Hawq Register Succeed.
{code}
4. Select from HAWQ. The title column comes back empty and no error is raised:
{code}
postgres=# select * from tt;
 i  |             fname              | title |  salary
----+--------------------------------+-------+----------
  5 | OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW |       | 80282.54
  7 | UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE |       | 10206.65
  4 | PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ |       | 63691.23
  9 | CTDCDYRURBZMBLNWHQNOQCYFFVULOP |       | 63867.44
 10 | WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK |       | 97720.08
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)