Posted to user@accumulo.apache.org by Kurt Christensen <ho...@hoodel.com> on 2013/05/04 16:15:08 UTC

Re: joining accumulo tables with mapreduce

How about three scanners, one for each table? Advance whichever one is on
the lowest row (in sort order), and combine the entries when all three match.
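A minimal sketch of that merge join in plain Java, under stated assumptions: each table has already been read into a list sorted by row ID with at most one entry per row (a real Accumulo scan returns many key/value entries per row, so those would first need grouping by row), and Java String ordering stands in for Accumulo's byte-wise row ordering, which agrees with it for ASCII row IDs. The class and method names are hypothetical, not Accumulo API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: a three-way merge join over row-sorted inputs.
final class ThreeWayMergeJoin {

    /** One joined row: the shared row ID plus the value taken from each table. */
    record Joined(String row, String a, String b, String c) {}

    /**
     * Inner-joins three lists, each sorted by row ID with at most one entry per
     * row (an assumption; a real scan returns many entries per row).
     */
    static List<Joined> join(List<Map.Entry<String, String>> a,
                             List<Map.Entry<String, String>> b,
                             List<Map.Entry<String, String>> c) {
        List<Joined> out = new ArrayList<>();
        int i = 0, j = 0, k = 0;
        // Stop once any input is exhausted: no further row can match all three.
        while (i < a.size() && j < b.size() && k < c.size()) {
            String ra = a.get(i).getKey();
            String rb = b.get(j).getKey();
            String rc = c.get(k).getKey();
            if (ra.equals(rb) && rb.equals(rc)) {
                // All three cursors are on the same row: combine and advance all.
                out.add(new Joined(ra, a.get(i).getValue(), b.get(j).getValue(), c.get(k).getValue()));
                i++;
                j++;
                k++;
            } else {
                // Advance only the cursor(s) sitting on the smallest row ID.
                String min = min(ra, min(rb, rc));
                if (ra.equals(min)) i++;
                if (rb.equals(min)) j++;
                if (rc.equals(min)) k++;
            }
        }
        return out;
    }

    private static String min(String x, String y) {
        return x.compareTo(y) <= 0 ? x : y;
    }

    public static void main(String[] args) {
        List<Joined> rows = join(
                List.of(Map.entry("r1", "a1"), Map.entry("r2", "a2")),
                List.of(Map.entry("r2", "b2"), Map.entry("r3", "b3")),
                List.of(Map.entry("r2", "c2")));
        rows.forEach(System.out::println);
    }
}
```

With real scanners the three lists would become three iterators; because each table is consumed once in row order, the join does sequential reads rather than per-row lookups.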


On 4/17/13 4:43 PM, Aji Janis wrote:
> Keith,
>
>  You hit on the problem I purposely didn't ask about:
> - AccumuloInputFormat doesn't support multiple tables at this point, and
> - I can't run three mappers in parallel on different tables and
>   combine/send their output to a reducer (that I know of).
>
> If all three tables had the same row ID (e.g. rowA exists in tables 1, 2
> and 3), then I could write the row from each table, with a different
> family/qualifier/value, to a new table. That means three mappers run
> sequentially, and the end result is a join... this is the best I have
> come up with so far. If the row IDs differ across the three tables, I
> would have to reformat (normalize) the row IDs from all three tables
> before writing the fourth/final table.
>
> Is calling a scanner on the other two tables from within a mapper 
> (that takes the first table as the input) bad? Any clues on how that 
> could be done in mapreduce?
>
>
> On Wed, Apr 17, 2013 at 10:59 AM, Keith Turner <keith@deenlo.com> wrote:
>
>     If I am understanding you correctly, you are proposing that for each
>     row, a mapper looks that row up in two other tables? This would
>     result in a lot of little round-trip RPC calls and random disk
>     accesses.
>
>     I think a better solution would be to read all three tables into your
>     mappers, and do the join in the reduce.  This solution will avoid all
>     of the little RPC calls and do lots of sequential I/O instead of
>     random accesses.  Between the map and reduce, you could track which
>     table each row came from.  Any filtering could be done in the mapper
>     or by iterators.  Unfortunately Accumulo does not have the needed
>     input format for this out of the box.  There is a ticket,
>     ACCUMULO-391.
>
>
>
>     On Tue, Apr 16, 2013 at 5:28 PM, Aji Janis <aji1705@gmail.com> wrote:
>     > Hello,
>     >
>     > I am interested in learning what the best solutions/practices might
>     > be for joining 3 Accumulo tables by running a MapReduce job, and in
>     > getting feedback on best practices and such. Here's pseudocode of
>     > what I want to accomplish:
>     >
>     >
>     > AccumuloInputFormat accepts tableA
>     > Global variable <table_list> has table names: tableB, tableC
>     >
>     > In a mapper, for example, you would do something like this:
>     >
>     > for each row in TableA
>     >  if (row.family == "abc" && row.qualifier == "xyz") value = getValue()
>     >  if (foundvalue) {
>     >
>     >   for each table in table_list
>     >     scan table with (this rowid && family = "def")
>     >     for each entry found in scan
>     >       write to final_table (rowid, value_as_family, tablename_as_qualifier,
>     >                             entry_as_value_string)
>     >
>     > }//end if foundvalue
>     >
>     > }//end for loop
>     >
>     >
>     > This is a simple version of what I want to do. In my non-MapReduce
>     > Java code I would do this by using a different scanner for each table
>     > in the list. A couple of questions:
>     >
>     >
>     > - How bad/good is performance when using scanners within mappers?
>     > - If I get one mapper per range in tableA, do I reset the scanners?
>     >   How? Or would I set up a scanner in the mapper's setup()? I have no
>     >   clue how this will play out, so I'm thinking out loud here.
>     > - Any optimization suggestions? Or examples of creating join
>     >   tables/indexes out there that I can refer to?
>     >
>     >
>     > Thank you for all suggestions.
>
>
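
Keith's suggestion (read all three tables in the map phase, tag each entry with its source table, and do the join in the reducer) can be simulated in memory to show the data flow. This is a sketch, not Hadoop or Accumulo code: a TreeMap stands in for the shuffle's sort-and-group step, each table is modeled as a hypothetical map from row ID to a single value, and the reducer performs an inner join by keeping only rows that arrived from every table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of a reduce-side join; not Hadoop or Accumulo API code.
final class ReduceSideJoin {

    /** A value tagged with the table it came from, as a mapper would emit it. */
    record Tagged(String table, String value) {}

    /**
     * "Map" every table and "shuffle": emit (rowId, tagged value) for each entry
     * and group by row ID. The TreeMap stands in for Hadoop's sort-and-group step.
     * Each table is modeled as rowId -> single value (a simplifying assumption).
     */
    static Map<String, List<Tagged>> mapAndShuffle(Map<String, Map<String, String>> tables) {
        Map<String, List<Tagged>> grouped = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> table : tables.entrySet()) {
            for (Map.Entry<String, String> row : table.getValue().entrySet()) {
                grouped.computeIfAbsent(row.getKey(), r -> new ArrayList<>())
                       .add(new Tagged(table.getKey(), row.getValue()));
            }
        }
        return grouped;
    }

    /** "Reduce": keep rows that arrived from every table (an inner join). */
    static Map<String, Map<String, String>> reduce(Map<String, List<Tagged>> grouped, int tableCount) {
        Map<String, Map<String, String>> joined = new TreeMap<>();
        for (Map.Entry<String, List<Tagged>> group : grouped.entrySet()) {
            // The tag tells the reducer which table each value came from.
            Map<String, String> byTable = new HashMap<>();
            for (Tagged t : group.getValue()) {
                byTable.put(t.table(), t.value());
            }
            if (byTable.size() == tableCount) {
                joined.put(group.getKey(), byTable);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> tables = Map.of(
                "tableA", Map.of("r1", "a1", "r2", "a2"),
                "tableB", Map.of("r2", "b2", "r3", "b3"),
                "tableC", Map.of("r1", "c1", "r2", "c2"));
        System.out.println(reduce(mapAndShuffle(tables), tables.size()));
    }
}
```

In an actual job the tag would travel inside the mapper's output value; how it is encoded is a design choice left open here.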

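Aji's sequential alternative (one map-only pass per table, each writing into a fourth table after normalizing the row IDs) relies on Accumulo keeping keys sorted, so a row's entries from all three sources end up adjacent. A rough in-memory sketch follows, with assumptions flagged: the Cell record is a simplified key (row, column family, column qualifier only; Accumulo keys also carry visibility and timestamp), using the source table's name as the column family is one hypothetical way to keep the tables' entries apart, and the normalization function is a stand-in for whatever mapping your row IDs actually need.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

// Simulation of writing three tables into one combined, sorted table.
final class SequentialCombine {

    /** Simplified key: Accumulo keys also carry visibility and timestamp. */
    record Cell(String row, String family, String qualifier) {}

    /** Sort by row, then column family, then column qualifier. */
    static final Comparator<Cell> ORDER = Comparator.comparing(Cell::row)
            .thenComparing(Cell::family)
            .thenComparing(Cell::qualifier);

    /**
     * One pass (one map-only job in the real setup): copy every entry of a source
     * table, modeled as rowId -> (qualifier -> value), into the combined table.
     * The row ID is normalized, and the source table's name becomes the column
     * family (a hypothetical choice) so entries from different tables don't collide.
     */
    static void copyPass(String tableName, Map<String, Map<String, String>> source,
                         UnaryOperator<String> normalizeRow, TreeMap<Cell, String> combined) {
        for (Map.Entry<String, Map<String, String>> row : source.entrySet()) {
            String newRow = normalizeRow.apply(row.getKey());
            for (Map.Entry<String, String> col : row.getValue().entrySet()) {
                combined.put(new Cell(newRow, tableName, col.getKey()), col.getValue());
            }
        }
    }

    public static void main(String[] args) {
        TreeMap<Cell, String> combined = new TreeMap<>(ORDER);
        // Three sequential passes; lower-casing is a stand-in normalization.
        copyPass("tableA", Map.of("rowA", Map.of("xyz", "1")), r -> r.toLowerCase(), combined);
        copyPass("tableB", Map.of("ROWA", Map.of("def", "2")), r -> r.toLowerCase(), combined);
        copyPass("tableC", Map.of("RowA", Map.of("def", "3")), r -> r.toLowerCase(), combined);
        // Sorted order places all "rowa" cells next to each other.
        combined.forEach((cell, value) -> System.out.println(cell + " -> " + value));
    }
}
```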
-- 

Kurt Christensen