Posted to dev@datafu.apache.org by Matthew Hayes <ma...@gmail.com> on 2014/02/12 18:20:44 UTC

Re: Mapping output of Hourglass jobs to Hive tables

The jobs have getOutputSchemaName() and getOutputSchemaNamespace() methods
that can be overridden.  By default these strings are derived from the job's
class name and its package.  To change them, extend a job class such as
PartitionCollapsingIncrementalJob and override both methods.  I just filed
DATAFU-32 to make it easier to override the defaults.
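
A minimal sketch of that override pattern, in case it helps.  To keep it
compilable without DataFu on the classpath, it uses a stand-in base class
(StandInIncrementalJob) rather than the real
PartitionCollapsingIncrementalJob.  The stand-in's default naming logic
(class name plus "Output", and the class's package) and the public
visibility of the two methods are assumptions based on the description
above, not a copy of the DataFu source.  PageViewCountsJob, DefaultNamedJob
and the "com.example.hive" namespace are hypothetical names used only for
illustration.

```java
// Demo entry point: prints the default and the overridden schema names.
class SchemaNameDemo {
    public static void main(String[] args) {
        StandInIncrementalJob byDefault = new DefaultNamedJob();
        StandInIncrementalJob custom = new PageViewCountsJob();
        System.out.println(byDefault.getOutputSchemaName());
        System.out.println(custom.getOutputSchemaNamespace() + "." + custom.getOutputSchemaName());
    }
}

// Stand-in for the Hourglass job base class (assumption: NOT the real DataFu class).
// It only mimics the default behaviour described above: names derived from the class.
abstract class StandInIncrementalJob {
    public String getOutputSchemaName() {
        // Default: simple class name with an "Output" suffix.
        return getClass().getSimpleName() + "Output";
    }

    public String getOutputSchemaNamespace() {
        // Default: the package part of the fully qualified class name ("" if none).
        String fullName = getClass().getName();
        int lastDot = fullName.lastIndexOf('.');
        return lastDot < 0 ? "" : fullName.substring(0, lastDot);
    }
}

// A job that keeps the defaults (hypothetical name).
class DefaultNamedJob extends StandInIncrementalJob {
}

// A job that overrides both methods to choose its own schema name and namespace.
class PageViewCountsJob extends StandInIncrementalJob {
    @Override
    public String getOutputSchemaName() {
        return "PageViewCounts"; // hypothetical schema name
    }

    @Override
    public String getOutputSchemaNamespace() {
        return "com.example.hive"; // hypothetical namespace
    }
}
```

With the real library, the subclass would extend
PartitionCollapsingIncrementalJob instead of the stand-in, and the two
overrides would look the same apart from whatever visibility the real
methods declare.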

Regarding your other question about the key: when you construct the Hive
table, can you not simply ignore the key?


On Wed, Feb 12, 2014 at 2:06 AM, Abhishek Gayakwad <a....@gmail.com> wrote:

> Hello,
>
> After running a partition collapsing or preserving job, the generated
> container file has the schema
> PartitionPreservingIncrementalJobOutput/PartitionCollapsingIncrementalJobOutput,
> which in turn contains key and value record types. When I create Hive
> tables from this data, they have two columns, key and value, both of struct
> type. This hurts readability and is not what I want. I want to store
> only the value object in the output file. Is there any way I can get rid of
> the Partition*JobOutput schema and avoid writing keys as well?
>
> Thanks
> Abhishek