Posted to hcatalog-commits@incubator.apache.org by "Ranjit Mathew (JIRA)" <ji...@apache.org> on 2011/06/01 03:37:47 UTC

[jira] [Created] (HCATALOG-36) Support Writing Out to Multiple Tables in HCatOutputFormat

Support Writing Out to Multiple Tables in HCatOutputFormat
----------------------------------------------------------

                 Key: HCATALOG-36
                 URL: https://issues.apache.org/jira/browse/HCATALOG-36
             Project: HCatalog
          Issue Type: Improvement
            Reporter: Ranjit Mathew


HCatOutputFormat does not support writing out to multiple tables (or to multiple partitions, for that matter).
Add this support to HCatalog.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HCATALOG-36) Support Writing Out to Multiple Tables in HCatOutputFormat

Posted by "Ranjit Mathew (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HCATALOG-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042593#comment-13042593 ] 

Ranjit Mathew commented on HCATALOG-36:
---------------------------------------

Yes, writing to multiple partitions is a sorely needed feature that we're looking forward to in HCatalog 0.2.

We process several logs in a single MapReduce job and write the processed data out to various
tables managed by HCatalog. There are multiple tables even for a single log type: this separates
the "core" data required in most queries from the "peripheral" data required only in some
queries, and it keeps the schema for each table relatively simple.


[jira] [Commented] (HCATALOG-36) Support Writing Out to Multiple Tables in HCatOutputFormat

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HCATALOG-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042812#comment-13042812 ] 

Alan Gates commented on HCATALOG-36:
------------------------------------

What I am not clear about is this: if you are writing one output stream to multiple tables, they must all have the same schema.  In that case, why not model them as different partitions of the same table?


[jira] [Commented] (HCATALOG-36) Support Writing Out to Multiple Tables in HCatOutputFormat

Posted by "Ranjit Mathew (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HCATALOG-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044804#comment-13044804 ] 

Ranjit Mathew commented on HCATALOG-36:
---------------------------------------

We are not writing one output stream to multiple tables. We're writing to different tables
depending on where the processed data should go. For example, in a map() method
we'll have:

    TableBasedHCatRecordFormat rec1 = new TableBasedHCatRecordFormat("table1", 2);
    rec1.set(0, foo);
    rec1.set(1, bar);
    ctx.write(null, rec1);
[...]
    TableBasedHCatRecordFormat rec2 = new TableBasedHCatRecordFormat("table2", 2);
    rec2.set(0, snafu);
    rec2.set(1, wombat);
    ctx.write(null, rec2);
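
To make the routing idea above concrete, here is a minimal, self-contained sketch in plain Java. It is not the code from the attachment and uses no HCatalog or Hadoop API; TaggedRecord and TableRouter are hypothetical names invented for illustration. It assumes only what the snippet above shows: each record carries the name of its destination table, and the writer dispatches on that name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a record that names its destination table. */
final class TaggedRecord {
    private final String table;
    private final Object[] fields;

    TaggedRecord(String table, int size) {
        this.table = table;
        this.fields = new Object[size];
    }

    void set(int index, Object value) { fields[index] = value; }
    Object get(int index) { return fields[index]; }
    String table() { return table; }
}

/** Hypothetical writer that groups records by the table each one names. */
final class TableRouter {
    // Insertion-ordered so tables are listed in the order first written to.
    private final Map<String, List<TaggedRecord>> byTable = new LinkedHashMap<>();

    /** Routes the record to the bucket for its own table, creating it if needed. */
    void write(TaggedRecord rec) {
        byTable.computeIfAbsent(rec.table(), t -> new ArrayList<>()).add(rec);
    }

    /** Returns the records written so far for one table (empty if none). */
    List<TaggedRecord> recordsFor(String table) {
        return byTable.getOrDefault(table, Collections.<TaggedRecord>emptyList());
    }

    Set<String> tables() { return byTable.keySet(); }
}

class MultiTableDemo {
    public static void main(String[] args) {
        TableRouter router = new TableRouter();

        TaggedRecord rec1 = new TaggedRecord("table1", 2);
        rec1.set(0, "foo");
        rec1.set(1, "bar");
        router.write(rec1);

        TaggedRecord rec2 = new TaggedRecord("table2", 2);
        rec2.set(0, "snafu");
        rec2.set(1, "wombat");
        router.write(rec2);

        // Each table receives only the records addressed to it.
        for (String table : router.tables()) {
            System.out.println(table + ": " + router.recordsFor(table).size() + " record(s)");
        }
    }
}
```

Because the destination is chosen per record, the two tables are free to have different schemas; a real implementation would presumably hold one underlying writer per table rather than an in-memory list.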



[jira] [Updated] (HCATALOG-36) Support Writing Out to Multiple Tables in HCatOutputFormat

Posted by "Ranjit Mathew (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HCATALOG-36?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ranjit Mathew updated HCATALOG-36:
----------------------------------

    Attachment: multihcat.tgz

A set of classes in the "org.apache.hcatalog.mapreduce"
package, written by Sreekanth Ramakrishnan, that can
serve as a starting point for this support.


[jira] [Commented] (HCATALOG-36) Support Writing Out to Multiple Tables in HCatOutputFormat

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HCATALOG-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042235#comment-13042235 ] 

Alan Gates commented on HCATALOG-36:
------------------------------------

Writing to multiple partitions is slated for 0.2; see https://cwiki.apache.org/confluence/display/HCATALOG/HCatalogJournal

I'd like to understand the use case for writing to multiple tables.  You want to be able to split the output stream to different tables as it's being written?  How would the split be determined?  Can you give me an example of a job that would want to do this?  Also, this should already work in Pig, using Pig's split operator and HCatStorer.  Would extending MultipleOutputFormat do what you want?
