Posted to user@pig.apache.org by Araceli Henley <3a...@gmail.com> on 2013/12/07 00:45:39 UTC

Does Pig support HCatalogStorer table with buckets

Hi


:::::::::

QUESTION:

:::::::::

Can anyone confirm whether HCatStorer works with a Hive table that was
declared with buckets?


:::::::::

DETAILS:

:::::::::


I have a table in Hive that was created with buckets, but when I try to
store data into it with HCatStorer, it fails with the following error:


Store into a partition with bucket definition from Pig/Mapreduce is not
supported.


I have a table declaration in hive:


......
   PARTITIONED BY(dtStr STRING)
   CLUSTERED BY(sessionid) SORTED BY(timestr) INTO 32 BUCKETS
   ROW FORMAT DELIMITED
           FIELDS TERMINATED BY '1'
           COLLECTION ITEMS TERMINATED BY '2'
           MAP KEYS TERMINATED BY '3'
   STORED AS ORC;


From Pig, I store the data with HCatStorer:


STORE sessnz_all INTO '$DB.allPocData' USING org.apache.hcatalog.pig.HCatStorer();



Details at logfile:
/home/araceli/src/bigdata/projects/cisco_webanalytics_poc/src/server/pig/scripts/pig_1386373152479.log

[araceli@greenhost03 scripts]$ pig -version
Apache Pig version 0.11.2-mapr (rexported)
compiled Aug 27 2013, 13:50:32

[araceli@greenhost03 scripts]$ hive -version
Logging initialized using configuration in
jar:file:/opt/mapr/hive/hive-0.11/lib/hive-common-0.11-mapr.jar!/hive-log4j.properties
Hive history


Re: Does Pig support HCatalogStorer table with buckets

Posted by Alan Gates <ga...@hortonworks.com>.
No. HCat explicitly checks whether a table is bucketed and, if so, disables storing to it to avoid writing to the table in a destructive way.

Alan.
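
A minimal sketch of the kind of up-front guard described above, not the actual HCatalog source. The class `BucketGuardSketch`, the `TableInfo` type, and the `checkStorable` method are hypothetical names chosen for illustration; only the error text is taken from the message reported in this thread. A plausible reason for refusing the write is that a storer which does not hash rows into the declared buckets would leave files that no longer match the table's bucketing metadata.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class BucketGuardSketch {

    /** Hypothetical stand-in for the table metadata a storer would consult. */
    static final class TableInfo {
        final String name;
        final List<String> bucketColumns;

        TableInfo(String name, List<String> bucketColumns) {
            this.name = name;
            this.bucketColumns = bucketColumns;
        }

        /** A table counts as bucketed if it declares at least one CLUSTERED BY column. */
        boolean isBucketed() {
            return bucketColumns != null && !bucketColumns.isEmpty();
        }
    }

    /** Rejects the store before any data is written if the target table is bucketed. */
    static void checkStorable(TableInfo table) {
        if (table.isBucketed()) {
            throw new UnsupportedOperationException(
                "Store into a partition with bucket definition from Pig/Mapreduce"
                    + " is not supported: " + table.name);
        }
    }

    public static void main(String[] args) {
        // An unbucketed table passes the guard silently.
        checkStorable(new TableInfo("plain_table", Collections.<String>emptyList()));

        // A table declared with CLUSTERED BY(sessionid) is refused.
        try {
            checkStorable(new TableInfo("allPocData", Arrays.asList("sessionid")));
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The guard runs before any output is produced, so a rejected store leaves the existing table contents untouched.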


