Posted to user@accumulo.apache.org by David Medinets <da...@gmail.com> on 2012/04/17 14:49:45 UTC

Querying Accumulo From Inside Mapper

I am reading from a text file of linked IDs but I want to store the
lookup values inside Accumulo.

RDB FOO
------
FOO_ID <-- this is the autoincrement key
ALT_ID  <-- this is the natural key
NAME
AGE

RDB BAR
------
BAR_ID <-- this is the autoincrement key
TAG       <-- zero or more per person

RDB LINK
------
FOO_ID
BAR_ID

* RDB denotes a relational database table.

Inside Accumulo, I want to use the ALT_ID as the row ID because other
data keyed by it will also be stored in that row. I will process the
FOO text file first, producing:

FOO
-------
ALT_ID  NAME   XXX
ALT_ID  AGE     XXX
FOO_ID ALT_ID  XXXX

Can I write to two Accumulo tables using one mapper? If I can, then I
can store the FOO_ID/ALT_ID record in a separate table.

Processing the BAR text file provides:

BAR
------
BAR_ID  TAG  XXXX

Then, when I process the LINK table, I can query the FOO table to find
the ALT_ID and the BAR table to find the tag, and combine the
information into the mutation:

FOO
------
ALT_ID TAG XXX

Is there a best practice to query from inside a mapper?

At the end of the work, I can delete the ALT_ID column (or table).

I know that this work is trivial using SQL, but <sigh> that's not an option.

Re: Querying Accumulo From Inside Mapper

Posted by Billie J Rinaldi <bi...@ugov.gov>.
On Tuesday, April 17, 2012 8:49:45 AM, "David Medinets" <da...@gmail.com> wrote:
> I am reading from a text file of linked IDs but I want to store the
> lookup values inside Accumulo.
> 
> RDB FOO
> ------
> FOO_ID <-- this is the autoincrement key
> ALT_ID <-- this is the natural key
> NAME
> AGE
> 
> RDB BAR
> ------
> BAR_ID <-- this is the autoincrement key
> TAG <-- zero or more person
> 
> RDB LINK
> ------
> FOO_ID
> BAR_ID
> 
> * RDB is relational database table.
> 
> Inside Accumulo, I want to use the ALT_ID as the row id because there
> is other data that uses it which will also be stored in the row. I
> will process the FOO text file first to result in:
> 
> FOO
> -------
> ALT_ID NAME XXX
> ALT_ID AGE XXX
> FOO_ID ALT_ID XXXX
> 
> Can I write to two Accumulo tables using one mapper? If I can, then I
> can store the FOO_ID/ALT_ID record in a separate table.

Yes.  The AccumuloOutputFormat is parameterized by <Text,Mutation> where the Text is the table name.
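
To make that concrete, here is a minimal plain-Java sketch of the keyed-output idea. It deliberately uses no Accumulo or Hadoop classes, so it runs on its own: KeyedWrite stands in for one <Text,Mutation> pair, and the table names "foo" and "foo_ids", along with every class and method name, are made up for illustration. In a real mapper each pair would be handed to context.write(new Text(tableName), mutation) rather than collected in a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model only: no Accumulo or Hadoop classes are used here.
final class TwoTableSketch {

    // Stand-in for one <Text,Mutation> pair: the target table plus one cell.
    static final class KeyedWrite {
        final String table, row, column, value;

        KeyedWrite(String table, String row, String column, String value) {
            this.table = table;
            this.row = row;
            this.column = column;
            this.value = value;
        }
    }

    // Models map() for one FOO record: three writes aimed at two tables.
    static List<KeyedWrite> mapFooRecord(String fooId, String altId, String name, String age) {
        List<KeyedWrite> out = new ArrayList<>();
        out.add(new KeyedWrite("foo", altId, "NAME", name));
        out.add(new KeyedWrite("foo", altId, "AGE", age));
        // Hypothetical separate lookup table: FOO_ID -> ALT_ID.
        out.add(new KeyedWrite("foo_ids", fooId, "ALT_ID", altId));
        return out;
    }

    // Models the output side: group the writes by the table name they carry.
    static Map<String, List<KeyedWrite>> routeByTable(List<KeyedWrite> writes) {
        Map<String, List<KeyedWrite>> tables = new TreeMap<>();
        for (KeyedWrite w : writes) {
            tables.computeIfAbsent(w.table, t -> new ArrayList<>()).add(w);
        }
        return tables;
    }

    public static void main(String[] args) {
        Map<String, List<KeyedWrite>> tables =
            routeByTable(mapFooRecord("17", "A-100", "Alice", "42"));
        for (Map.Entry<String, List<KeyedWrite>> e : tables.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size() + " write(s)");
        }
    }
}
```

The point of the model is that the table name travels with each write, so a single mapper can feed as many tables as it likes.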

> Processing the BAR text file provides:
> 
> BAR
> ------
> BAR_ID TAG XXXX
> 
> Then when I process the LINK table, I can query the FOO table to find
> the ALT_ID. And query the BAR table to find the tag. Then combine the
> information for the mutation:
> 
> FOO
> ------
> ALT_ID TAG XXX
> 
> Is there a best practice to query from inside a mapper?

Just make sure to do the Accumulo setup in the Mapper setup method.  You'll probably want to look at the InputFormatBase to see how it passes the configuration information.
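
As a rough illustration of that advice applied to the LINK step, the plain-Java sketch below opens its lookups once in setup() and reuses them for every record in map(). It models the pattern only and uses no Accumulo or Hadoop classes: the Lookup interface, the in-memory maps, and all class and method names are hypothetical stand-ins. In a real Mapper, setup(Context) would build the Accumulo connection and scanners from the job configuration, and each find() would be a scan against the FOO or BAR lookup data.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model only: no Accumulo or Hadoop classes are used here.
final class LinkMapperSketch {

    // Hypothetical stand-in for a scanner over one table.
    interface Lookup {
        String find(String rowId);
    }

    int setupCalls = 0; // how many times the costly setup has run

    private Lookup fooIdToAltId;
    private Lookup barIdToTag;

    // Runs once per task: this is where the expensive connections belong.
    void setup(Map<String, String> fooIds, Map<String, String> barTags) {
        setupCalls++;
        fooIdToAltId = fooIds::get;
        barIdToTag = barTags::get;
    }

    // Runs once per LINK record. Returns {row, column, value},
    // or null when either lookup finds nothing.
    String[] map(String fooId, String barId) {
        String altId = fooIdToAltId.find(fooId);
        String tag = barIdToTag.find(barId);
        if (altId == null || tag == null) {
            return null;
        }
        return new String[] { altId, "TAG", tag };
    }

    public static void main(String[] args) {
        Map<String, String> fooIds = new HashMap<>();
        fooIds.put("17", "A-100");
        Map<String, String> barTags = new HashMap<>();
        barTags.put("9", "blue");

        LinkMapperSketch mapper = new LinkMapperSketch();
        mapper.setup(fooIds, barTags);
        String[] cell = mapper.map("17", "9");
        System.out.println(String.join(" ", cell));
    }
}
```

However many LINK records pass through map(), setup() has run only once, which is the reason for keeping connection code out of map().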

Billie

