Posted to user@accumulo.apache.org by David Medinets <da...@gmail.com> on 2012/04/17 14:49:45 UTC
Querying Accumulo From Inside Mapper
I am reading from a text file of linked IDs but I want to store the
lookup values inside Accumulo.
RDB FOO
------
FOO_ID <-- this is the autoincrement key
ALT_ID <-- this is the natural key
NAME
AGE
RDB BAR
------
BAR_ID <-- this is the autoincrement key
TAG <-- zero or more per person
RDB LINK
------
FOO_ID
BAR_ID
* RDB is a relational database table.
Inside Accumulo, I want to use the ALT_ID as the row id because there
is other data that uses it which will also be stored in the row. I
will process the FOO text file first to result in:
FOO
-------
ALT_ID NAME XXX
ALT_ID AGE XXX
FOO_ID ALT_ID XXXX
Can I write to two Accumulo tables using one mapper? If I can, then I
can store the FOO_ID/ALT_ID record in a separate table.
Processing the BAR text file provides:
BAR
------
BAR_ID TAG XXXX
Then when I process the LINK table, I can query the FOO table to find
the ALT_ID. And query the BAR table to find the tag. Then combine the
information for the mutation:
FOO
------
ALT_ID TAG XXX
Is there a best practice to query from inside a mapper?
At the end of the work, I can delete the ALT_ID column (or table).
I know that this work is trivial using SQL, but <sigh> that's not an option.
Re: Querying Accumulo From Inside Mapper
Posted by Billie J Rinaldi <bi...@ugov.gov>.
On Tuesday, April 17, 2012 8:49:45 AM, "David Medinets" <da...@gmail.com> wrote:
> I am reading from a text file of linked IDs but I want to store the
> lookup values inside Accumulo.
>
> RDB FOO
> ------
> FOO_ID <-- this is the autoincrement key
> ALT_ID <-- this is the natural key
> NAME
> AGE
>
> RDB BAR
> ------
> BAR_ID <-- this is the autoincrement key
> TAG <-- zero or more per person
>
> RDB LINK
> ------
> FOO_ID
> BAR_ID
>
> * RDB is a relational database table.
>
> Inside Accumulo, I want to use the ALT_ID as the row id because there
> is other data that uses it which will also be stored in the row. I
> will process the FOO text file first to result in:
>
> FOO
> -------
> ALT_ID NAME XXX
> ALT_ID AGE XXX
> FOO_ID ALT_ID XXXX
>
> Can I write to two Accumulo tables using one mapper? If I can, then I
> can store the FOO_ID/ALT_ID record in a separate table.
Yes. The AccumuloOutputFormat is parameterized by <Text,Mutation> where the Text is the table name.
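To make the two-table idea concrete, here is a minimal plain-Java sketch of the routing step only. It deliberately uses no Accumulo or Hadoop classes: Cell is a hypothetical stand-in for a Mutation, the table names FOO and FOO_LOOKUP are made up for illustration, and the input is assumed to be one comma-separated FOO_ID,ALT_ID,NAME,AGE record per line. In a real mapper each Cell would presumably become a Mutation written with new Text(cell.table) as the output key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one cell destined for a named table.
// In a real job this would be a Mutation keyed by a Text table name.
final class Cell {
    final String table;
    final String row;
    final String column;
    final String value;

    Cell(String table, String row, String column, String value) {
        this.table = table;
        this.row = row;
        this.column = column;
        this.value = value;
    }

    @Override
    public String toString() {
        return table + ": " + row + " " + column + " " + value;
    }
}

final class FooRouter {
    // Assumed table names, not taken from the thread.
    static final String DATA_TABLE = "FOO";
    static final String LOOKUP_TABLE = "FOO_LOOKUP";

    // Splits one FOO line (assumed format "FOO_ID,ALT_ID,NAME,AGE")
    // into cells for the data table and the FOO_ID -> ALT_ID lookup table.
    static List<Cell> route(String line) {
        String[] f = line.split(",", -1);
        if (f.length != 4) {
            throw new IllegalArgumentException("expected 4 fields: " + line);
        }
        String fooId = f[0].trim();
        String altId = f[1].trim();

        List<Cell> out = new ArrayList<>();
        out.add(new Cell(DATA_TABLE, altId, "NAME", f[2].trim()));
        out.add(new Cell(DATA_TABLE, altId, "AGE", f[3].trim()));
        out.add(new Cell(LOOKUP_TABLE, fooId, "ALT_ID", altId));
        return out;
    }

    public static void main(String[] args) {
        for (Cell c : route("17,A-100,Alice,34")) {
            System.out.println(c);
        }
    }
}
```

Keeping the parsing and routing separate from the job wiring means it can be checked without a running cluster; the mapper itself then only converts each Cell and emits it.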
> Processing the BAR text file provides:
>
> BAR
> ------
> BAR_ID TAG XXXX
>
> Then when I process the LINK table, I can query the FOO table to find
> the ALT_ID. And query the BAR table to find the tag. Then combine the
> information for the mutation:
>
> FOO
> ------
> ALT_ID TAG XXX
>
> Is there a best practice to query from inside a mapper?
Just make sure to do the Accumulo setup in the Mapper setup method. You'll probably want to look at the InputFormatBase to see how it passes the configuration information.
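As a rough illustration of the "set up once, look up per record" pattern, here is a plain-Java sketch with no Accumulo or Hadoop dependency. LinkResolver and its two lookup functions are hypothetical; in a real job they would presumably be backed by scanners over the lookup tables, created in the Mapper setup method, and the resolver would be called from map. The LINK input is assumed to be one comma-separated FOO_ID,BAR_ID pair per line.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class LinkResolver {
    private final Function<String, String> fooLookup; // FOO_ID -> ALT_ID
    private final Function<String, String> barLookup; // BAR_ID -> TAG
    private final Map<String, String> fooCache = new HashMap<>();
    private final Map<String, String> barCache = new HashMap<>();

    // Construct once per task (the analogue of Mapper setup),
    // not once per input record.
    LinkResolver(Function<String, String> fooLookup,
                 Function<String, String> barLookup) {
        this.fooLookup = fooLookup;
        this.barLookup = barLookup;
    }

    // Resolves one LINK line (assumed format "FOO_ID,BAR_ID") to
    // {ALT_ID, "TAG", tag}, or returns null if either lookup misses.
    String[] resolve(String line) {
        String[] f = line.split(",", -1);
        if (f.length != 2) {
            throw new IllegalArgumentException("expected 2 fields: " + line);
        }
        String altId = cached(fooCache, fooLookup, f[0].trim());
        String tag = cached(barCache, barLookup, f[1].trim());
        if (altId == null || tag == null) {
            return null;
        }
        return new String[] {altId, "TAG", tag};
    }

    // Caches hits and misses so repeated IDs cost only one lookup each.
    private static String cached(Map<String, String> cache,
                                 Function<String, String> lookup,
                                 String key) {
        if (cache.containsKey(key)) {
            return cache.get(key);
        }
        String value = lookup.apply(key);
        cache.put(key, value);
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> foo = new HashMap<>();
        foo.put("17", "A-100");
        Map<String, String> bar = new HashMap<>();
        bar.put("5", "parent");

        LinkResolver resolver = new LinkResolver(foo::get, bar::get);
        String[] cell = resolver.resolve("17,5");
        System.out.println(cell[0] + " " + cell[1] + " " + cell[2]);
    }
}
```

The in-memory cache is a design choice rather than a requirement: it only helps if the same FOO_ID or BAR_ID shows up repeatedly within one task, and it should be bounded if the ID space is large.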
Billie
> At the end of the work, I can delete the ALT_ID column (or table).
>
> I know that this work is trivial using SQL, but <sigh> that's not an
> option.