Posted to notifications@accumulo.apache.org by "Pradeep Gollakota (JIRA)" <ji...@apache.org> on 2013/06/05 17:42:20 UTC

[jira] [Commented] (ACCUMULO-391) Multi-table Accumulo input format

    [ https://issues.apache.org/jira/browse/ACCUMULO-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676043#comment-13676043 ] 

Pradeep Gollakota commented on ACCUMULO-391:
--------------------------------------------

This would be a great addition.

We have just started using Pig with Accumulo at my company. The first thing we noticed is that in many situations where we join data from one Accumulo table with data from another, we first have to dump both tables to HDFS (perhaps using PigStorage), load the data back, and then join it. This is because the scan information is stored in the job configuration, which every input in a job shares. So when Pig uses the MultiInputFormat to scan both tables in the same job, only one table actually gets exported from Accumulo.
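To make the failure mode concrete, here is a small Python model of it. It assumes, as described above, that each input writes its scan settings into the one shared job configuration under fixed keys; the key names and helper functions are hypothetical and are not Accumulo's or Pig's real API. Configuring two tables through a single-table scheme leaves only the last one, while namespacing the settings per table keeps both.

```python
# Hypothetical model of a job configuration: one flat string map shared by
# every input in the job. Key names are illustrative, not Accumulo's real ones.

def set_single_table_input(conf: dict, table: str, ranges: list) -> None:
    """Mimics a single-table input format: a second call overwrites the first."""
    conf["accumulo.input.table"] = table
    conf["accumulo.input.ranges"] = ",".join(ranges)


def add_multi_table_input(conf: dict, table: str, ranges: list) -> None:
    """Mimics a multi-table input format: settings are namespaced per table."""
    existing = conf.get("accumulo.input.tables", "")
    names = existing.split(",") if existing else []
    if table not in names:
        names.append(table)
    conf["accumulo.input.tables"] = ",".join(names)
    conf[f"accumulo.input.{table}.ranges"] = ",".join(ranges)


def tables_scanned(conf: dict) -> list:
    """Returns the tables a job built from this configuration would read."""
    if "accumulo.input.tables" in conf:
        return conf["accumulo.input.tables"].split(",")
    if "accumulo.input.table" in conf:
        return [conf["accumulo.input.table"]]
    return []


single = {}
set_single_table_input(single, "users", ["a-m"])
set_single_table_input(single, "orders", ["n-z"])
print(tables_scanned(single))  # ['orders']: the "users" settings were overwritten

multi = {}
add_multi_table_input(multi, "users", ["a-m"])
add_multi_table_input(multi, "orders", ["n-z"])
print(tables_scanned(multi))   # ['users', 'orders']
```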

Once this is completed, we could use the MultiTableInputFormat instead of Accumulo(Row)InputFormat to optimize our Pig scripts.

Any thoughts on when this would be included?
                
> Multi-table Accumulo input format
> ---------------------------------
>
>                 Key: ACCUMULO-391
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-391
>             Project: Accumulo
>          Issue Type: New Feature
>            Reporter: John Vines
>            Assignee: William Slacum
>            Priority: Minor
>              Labels: mapreduce,
>         Attachments: multi-table-if.patch, new-multitable-if.patch
>
>
> Just realized we have no MR input format that supports multiple tables. I could see it either making the table the mapper's key and the Key/Value pair a tuple value, or making a Table/Key tuple the key and keeping Value as the value.
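The two record shapes proposed in the issue description can be sketched as follows. `Key` and `Value` here are simplified, hypothetical stand-ins for Accumulo's classes, and the two shaping functions are illustrative only; they are not the API of the attached patches.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


# Hypothetical stand-ins for Accumulo's Key and Value; only the fields
# needed for this illustration are modeled.
@dataclass(frozen=True, order=True)
class Key:
    row: str
    family: str = ""
    qualifier: str = ""


@dataclass(frozen=True)
class Value:
    data: bytes


Entry = Tuple[Key, Value]


def shape_table_as_key(table: str, entries: Iterable[Entry]) -> List[Tuple[str, Entry]]:
    """Option 1: mapper key = table name, mapper value = (Key, Value) tuple."""
    return [(table, (k, v)) for k, v in entries]


def shape_table_key_tuple(table: str, entries: Iterable[Entry]) -> List[Tuple[Tuple[str, Key], Value]]:
    """Option 2: mapper key = (table, Key) tuple, mapper value = Value."""
    return [((table, k), v) for k, v in entries]


users = [(Key("u1", "info", "name"), Value(b"alice"))]
orders = [(Key("u1", "order", "o7"), Value(b"42.00"))]

opt1 = shape_table_as_key("users", users) + shape_table_as_key("orders", orders)
opt2 = shape_table_key_tuple("users", users) + shape_table_key_tuple("orders", orders)

print(opt1[1][0])                        # orders
print(sorted(k for k, _ in opt2)[0][0])  # orders (composite keys sort by table first)
```

With option 2 the mapper key alone identifies which table a record came from, so a mapper or partitioner can route on it directly; option 1 keeps the original Key/Value pair together but moves the Accumulo Key into the value.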

--
This message is automatically generated by JIRA.