Posted to dev@pig.apache.org by "Gerrit Jansen van Vuuren (JIRA)" <ji...@apache.org> on 2010/08/03 12:54:16 UTC

[jira] Updated: (PIG-1526) HiveColumnarLoader Partitioning Support

     [ https://issues.apache.org/jira/browse/PIG-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gerrit Jansen van Vuuren updated PIG-1526:
------------------------------------------

    Attachment: PIG-1526-2.patch

The previous patch did not use the UDFContext signature, which caused the partition keys and expression to be overwritten when the loader was used for more than one table. That is fixed now.
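
To illustrate why keying loader state by signature matters, here is a minimal sketch in plain Java. It does not use Pig's UDFContext; the class SignatureScopedState, the method propertiesFor, the property name "partition.keys" and the signature strings are all hypothetical stand-ins. The idea it shows is only this: if two uses of the same loader class share one property bag, the second overwrites the first, whereas one bag per (class, signature) pair keeps them apart.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for a shared per-job property store.
// This is NOT Pig's UDFContext; it only mimics the keying idea.
class SignatureScopedState {
    private static final Map<String, Properties> STORE = new HashMap<>();

    // One Properties object per (loader class, signature) pair, so two
    // LOAD statements using the same loader class get separate state.
    static Properties propertiesFor(String loaderClass, String signature) {
        return STORE.computeIfAbsent(loaderClass + "#" + signature,
                                     key -> new Properties());
    }

    public static void main(String[] args) {
        // Signature values are made up for the example.
        Properties tableA = propertiesFor("HiveColumnarLoader", "sig-tableA");
        Properties tableB = propertiesFor("HiveColumnarLoader", "sig-tableB");

        tableA.setProperty("partition.keys", "year,month,day");
        tableB.setProperty("partition.keys", "region");

        // Each table keeps its own keys; nothing is overwritten.
        System.out.println(tableA.getProperty("partition.keys"));
        System.out.println(tableB.getProperty("partition.keys"));
    }
}
```

Without the signature in the key, both calls would return the same Properties object and the second setProperty would replace the first table's partition keys.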
Also added hidden-file filtering to PathPartitionHelper: files or directories whose names start with "_" are now ignored.
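
The hidden-file rule can be sketched roughly as follows. This is not the code from the patch; the class HiddenPathFilter and its methods are hypothetical, and the sketch works on plain path strings rather than Hadoop Path objects. It treats a path as hidden if any of its components starts with "_".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the patch: drops paths that contain
// a file or directory whose name starts with "_".
class HiddenPathFilter {

    // True if any component of the path starts with "_".
    static boolean isHidden(String path) {
        for (String part : path.split("/")) {
            if (part.startsWith("_")) {
                return true;
            }
        }
        return false;
    }

    // Returns only the paths that are not hidden, in their original order.
    static List<String> visible(List<String> paths) {
        List<String> result = new ArrayList<>();
        for (String path : paths) {
            if (!isHidden(path)) {
                result.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Example paths are invented for illustration.
        List<String> paths = Arrays.asList(
            "/mytable/year=2010/part-00000",
            "/mytable/_logs/history",
            "/mytable/year=2010/_SUCCESS");
        System.out.println(visible(paths));
    }
}
```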


> HiveColumnarLoader Partitioning Support
> ---------------------------------------
>
>                 Key: PIG-1526
>                 URL: https://issues.apache.org/jira/browse/PIG-1526
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.8.0
>            Reporter: Gerrit Jansen van Vuuren
>            Assignee: Gerrit Jansen van Vuuren
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: PIG-1526-2.patch, PIG-1526.patch
>
>
> I've made a lot of improvements to the HiveColumnarLoader:
> -> Added support for LoadMetadata and data path partitioning
> -> Improved and simplified column loading
> Data Path Partitioning:
> Hive stores partitions as folders like /mytable/partition1=[value]/partition2=[value]. That is, the table mytable contains 2 partitions [partition1, partition2].
> The HiveColumnarLoader will scan the input path /mytable and add the columns partition1 and partition2 to the Pig schema.
> These columns can then be used in filtering. 
> For example: We've got year, month, day, hour partitions in our data uploads.
> So a table might look like mytable/year=2010/month=02/day=01.
> Loading with the HiveColumnarLoader allows our Pig scripts to filter by date using the standard Pig FILTER operator.
> I've added 2 classes for this:
> -> PathPartitioner
> -> PathPartitionHelper
> These classes are not Hive dependent and could be used by any other loader that wants to support partitioning; they also help with implementing the LoadMetadata interface.
> For this reason I thought it best to put them into the package org.apache.pig.piggybank.storage.partition.
> In the future it would be nice to have PigStorage also use these 2 classes to provide automatic path partitioning support.
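
The data path partitioning described in the quoted issue can be sketched in plain Java as below. This is an illustration under stated assumptions, not the PathPartitioner or PathPartitionHelper code from the patch: the class PartitionPathSketch and its methods are hypothetical, and paths are handled as plain strings. It extracts key=value components from a path such as mytable/year=2010/month=02/day=01 and checks them against wanted values, which is the kind of test a filter on the partition columns reduces to.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of path-based partition discovery; not the patch code.
class PartitionPathSketch {

    // Collects every "key=value" path component, preserving path order.
    static Map<String, String> parsePartitions(String path) {
        Map<String, String> partitions = new LinkedHashMap<>();
        for (String part : path.split("/")) {
            int eq = part.indexOf('=');
            if (eq > 0) { // require a non-empty key before '='
                partitions.put(part.substring(0, eq), part.substring(eq + 1));
            }
        }
        return partitions;
    }

    // True if the path's partition values equal every wanted key/value pair.
    static boolean matches(Map<String, String> partitions,
                           Map<String, String> wanted) {
        for (Map.Entry<String, String> entry : wanted.entrySet()) {
            if (!entry.getValue().equals(partitions.get(entry.getKey()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> partitions =
            parsePartitions("mytable/year=2010/month=02/day=01");
        System.out.println(partitions); // {year=2010, month=02, day=01}

        Map<String, String> wanted = new LinkedHashMap<>();
        wanted.put("year", "2010");
        wanted.put("month", "02");
        System.out.println(matches(partitions, wanted)); // true
    }
}
```

Partition values are compared as strings here, so "02" and "2" are different; a real loader would need to decide how partition columns are typed.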
