Posted to dev@accumulo.apache.org by "Billie Rinaldi (JIRA)" <ji...@apache.org> on 2012/07/11 18:06:35 UTC

[jira] [Commented] (ACCUMULO-391) Multi-table Accumulo input format

    [ https://issues.apache.org/jira/browse/ACCUMULO-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411672#comment-13411672 ] 

Billie Rinaldi commented on ACCUMULO-391:
-----------------------------------------

I think we could probably expand the existing InputFormatBase to cover the multi-table case.  This would require making columns, ranges, and iterators per-table.  Columns and iterators are only accessed on a per-table basis, so the table could be encoded in the property key and the value could be left the same, e.g. conf.set(ITERATORS + "." + new String(Base64.encodeBase64(tableName.getBytes())), iterators).  (Although I think in the case of iterators we should get rid of the separate iterators and iterator options properties and just have one combined property.  I'd also like to see more standardization in the encodings we're using for property values.)  The ranges are pulled from the configuration all at once, so we should leave them under the RANGES property key and have either a hierarchical structure in the value, or a flat structure where the table name is included with each range.  I would suggest new methods to replace the existing ones of the same names:

{noformat}
void setInputInfo(Configuration conf, String user, byte[] passwd, Authorizations auths)
void setRanges(Configuration conf, Text tableName, Collection<Range> ranges)
void fetchColumns(Configuration conf, Text tableName, Collection<Pair<Text,Text>> columnFamilyColumnQualifierPairs)
void addIterator(Configuration conf, Text tableName, IteratorSetting cfg)
TabletLocator getTabletLocator(Configuration conf, String tableName)
Map<Text,List<Range>> getRanges(Configuration conf)
Set<Pair<Text,Text>> getFetchedColumns(Configuration conf, String tableName)
List<IteratorSetting> getIterators(Configuration conf, String tableName)
{noformat}
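
As a rough illustration of the two storage ideas above (the table name encoded in the property key for iterators and columns, and a flat RANGES value that carries the table name with each range), here is a minimal sketch. It is not the Accumulo implementation: a plain Map<String,String> stands in for the Hadoop Configuration, ranges are treated as already-serialized strings, java.util.Base64 is used instead of commons-codec, and the class name, key prefixes, and the ':' and ',' separators are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; a Map stands in for org.apache.hadoop.conf.Configuration.
final class PerTableConfigSketch {
    // Assumed key prefixes, not the real InputFormatBase constants.
    static final String ITERATORS = "sketch.iterators";
    static final String RANGES = "sketch.ranges";

    static String b64(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    static String unb64(String s) {
        return new String(Base64.getDecoder().decode(s), StandardCharsets.UTF_8);
    }

    // Per-table key: the table name is Base64-encoded and appended to the prefix.
    static String tableKey(String prefix, String tableName) {
        return prefix + "." + b64(tableName);
    }

    // Flat structure: every entry is b64(table) + ":" + b64(range), entries joined by ",".
    // Neither separator occurs in the standard Base64 alphabet, so parsing is unambiguous.
    static void addRange(Map<String, String> conf, String tableName, String serializedRange) {
        String entry = b64(tableName) + ":" + b64(serializedRange);
        conf.merge(RANGES, entry, (existing, added) -> existing + "," + added);
    }

    // Reads all ranges at once and groups them by table, mirroring the
    // proposed Map<Text,List<Range>> getRanges(Configuration conf).
    static Map<String, List<String>> getRanges(Map<String, String> conf) {
        Map<String, List<String>> byTable = new HashMap<>();
        String value = conf.get(RANGES);
        if (value == null) {
            return byTable;
        }
        for (String entry : value.split(",")) {
            String[] parts = entry.split(":", 2);
            byTable.computeIfAbsent(unb64(parts[0]), k -> new ArrayList<>()).add(unb64(parts[1]));
        }
        return byTable;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(tableKey(ITERATORS, "table1"), "serialized-iterators-for-table1");
        addRange(conf, "table1", "rowA-rowM");
        addRange(conf, "table2", "rowN-rowZ");
        System.out.println(getRanges(conf));
    }
}
```

The flat layout keeps everything under one property, which matches how the ranges are read in a single pass; a hierarchical value would work as well but needs a nested encoding.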

To provide backwards compatibility, we could also keep the old setInputInfo/setRanges/fetchColumns/addIterator methods and have a concept of a default table specified in setInputInfo that will be the table used whenever a table isn't specified for setRanges/fetchColumns/addIterator.
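
The default-table fallback could look something like the sketch below. Again this is only an assumption-laden illustration: a Map<String,String> replaces the Hadoop Configuration, the columns are passed as an already-encoded string, setDefaultTable stands in for the table argument of the old setInputInfo, and the DEFAULT_TABLE and COLUMNS key names are invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; a Map stands in for org.apache.hadoop.conf.Configuration.
final class DefaultTableSketch {
    // Assumed property keys, not the real InputFormatBase constants.
    static final String DEFAULT_TABLE = "sketch.defaultTable";
    static final String COLUMNS = "sketch.columns";

    private static String columnsKey(String tableName) {
        byte[] raw = tableName.getBytes(StandardCharsets.UTF_8);
        return COLUMNS + "." + Base64.getEncoder().encodeToString(raw);
    }

    // Stand-in for the table argument of the old setInputInfo.
    static void setDefaultTable(Map<String, String> conf, String tableName) {
        conf.put(DEFAULT_TABLE, tableName);
    }

    // New-style method: the table is given explicitly.
    static void fetchColumns(Map<String, String> conf, String tableName, String encodedColumns) {
        conf.put(columnsKey(tableName), encodedColumns);
    }

    // Old-style method kept for compatibility: falls back to the default table.
    static void fetchColumns(Map<String, String> conf, String encodedColumns) {
        String tableName = conf.get(DEFAULT_TABLE);
        if (tableName == null) {
            throw new IllegalStateException("no default table has been set");
        }
        fetchColumns(conf, tableName, encodedColumns);
    }

    static String getFetchedColumns(Map<String, String> conf, String tableName) {
        return conf.get(columnsKey(tableName));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setDefaultTable(conf, "table1");
        fetchColumns(conf, "cf1:cq1");            // old call site, goes to table1
        fetchColumns(conf, "table2", "cf2:cq2");  // new call site, explicit table
        System.out.println(getFetchedColumns(conf, "table1"));
        System.out.println(getFetchedColumns(conf, "table2"));
    }
}
```

With this shape, existing single-table jobs keep calling the old overloads unchanged, while multi-table jobs use the overloads that take a table name.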
                
> Multi-table Accumulo input format
> ---------------------------------
>
>                 Key: ACCUMULO-391
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-391
>             Project: Accumulo
>          Issue Type: New Feature
>    Affects Versions: 1.5.0-SNAPSHOT
>            Reporter: John Vines
>            Priority: Minor
>              Labels: mapreduce
>         Attachments: multi-table-if.patch
>
>
> Just realized we have no MapReduce input format that supports multiple tables. I would see it making the table the mapper's key and making the Key/Value pair a tuple for the value, or alternatively making the Table/Key pair the key tuple and keeping the Value as the value.
