
Posted to common-user@hadoop.apache.org by ailinykh <ai...@gmail.com> on 2010/07/29 17:10:56 UTC

How to build multiple inverted indexes?

Hello, everybody!
I have a bunch of records. Each record has a key and two fields, A and B: R(k, A, B).
I want to build two inverted indexes, one per field. As output I expect two files:
IndexA = (A1 - [k1, k2, k3, ...]), (A2 - [k1, k2, k4, ...]) ...
IndexB = (B1 - [k1, k2, k3, ...]), (B2 - [k1, k2, k4, ...]) ...
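To make the expected output concrete, here is a minimal in-memory sketch in plain Java (no Hadoop) that builds both indexes while reading each record only once. The Rec and Indexes types and the sample values are hypothetical, chosen only to mirror R(k, A, B).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory illustration of the two inverted indexes (not Hadoop code). */
public class InvertedIndexes {
    // Hypothetical record type mirroring R(k, A, B) from the question.
    public record Rec(String k, String a, String b) {}

    // Holder for the two indexes: field value -> list of record keys.
    public record Indexes(Map<String, List<String>> indexA,
                          Map<String, List<String>> indexB) {}

    /** Builds IndexA and IndexB in a single pass over the records. */
    public static Indexes build(List<Rec> records) {
        Map<String, List<String>> indexA = new TreeMap<>();
        Map<String, List<String>> indexB = new TreeMap<>();
        for (Rec r : records) {
            // Each record contributes one entry to each index.
            indexA.computeIfAbsent(r.a(), v -> new ArrayList<>()).add(r.k());
            indexB.computeIfAbsent(r.b(), v -> new ArrayList<>()).add(r.k());
        }
        return new Indexes(indexA, indexB);
    }

    public static void main(String[] args) {
        Indexes idx = build(List.of(
            new Rec("k1", "A1", "B1"),
            new Rec("k2", "A1", "B2"),
            new Rec("k3", "A2", "B1")));
        System.out.println("IndexA = " + idx.indexA()); // IndexA = {A1=[k1, k2], A2=[k3]}
        System.out.println("IndexB = " + idx.indexB()); // IndexB = {B1=[k1, k3], B2=[k2]}
    }
}
```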

The Hadoop mapper only lets me write out one key/value pair. Of course, I could run
the job twice, but I don't want to read the same record twice. What is the
best way to handle this situation?

Thank you,
  Andrey
-- 
View this message in context: http://old.nabble.com/How-to-build-multiple-inverted-indexes--tp29297476p29297476.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.

Re: How to build multiple inverted indexes?

Posted by Rahul Jain <rj...@gmail.com>.

Hadoop does not prevent you from writing key/value pairs multiple times in
the same map call, if that is your roadblock.

You can call collector.collect() multiple times, with the same or distinct
key/value pairs, within a single map call.
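A minimal plain-Java simulation of this idea (it does not use the Hadoop API): a single map call emits two pairs per record, one per index, with the key tagged by the index it belongs to. The "A\t"/"B\t" tag prefix and all class and method names are assumptions for illustration. In a real job, the emit callback would be OutputCollector.collect() (old org.apache.hadoop.mapred API) or Context.write() (new org.apache.hadoop.mapreduce API), and the tag could be used to route the reducer output into two files, for example with Hadoop's MultipleOutputs class (check its exact usage against your Hadoop version).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

/** Plain-Java simulation (not Hadoop code) of one map pass emitting two pairs per record. */
public class TaggedIndexJob {
    // Hypothetical record type mirroring R(k, A, B).
    public record Rec(String k, String a, String b) {}

    // Map phase: called once per record; "emit" stands in for collect()/write()
    // and is invoked twice, once per index.
    static void map(Rec r, BiConsumer<String, String> emit) {
        emit.accept("A\t" + r.a(), r.k()); // entry for IndexA
        emit.accept("B\t" + r.b(), r.k()); // entry for IndexB
    }

    /** Runs map, groups values by key (the shuffle), then splits groups by tag. */
    public static Map<String, Map<String, List<String>>> run(List<Rec> records) {
        // Shuffle/sort: the TreeMap groups values per tagged key.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Rec r : records) {
            map(r, (key, value) ->
                grouped.computeIfAbsent(key, x -> new ArrayList<>()).add(value));
        }
        // Reduce: send each group to the output named by its tag ("A" or "B").
        Map<String, Map<String, List<String>>> outputs = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            String[] parts = e.getKey().split("\t", 2);
            outputs.computeIfAbsent(parts[0], x -> new TreeMap<>())
                   .put(parts[1], e.getValue());
        }
        return outputs;
    }

    public static void main(String[] args) {
        var out = run(List.of(
            new Rec("k1", "A1", "B1"),
            new Rec("k2", "A1", "B2"),
            new Rec("k3", "A2", "B1")));
        System.out.println("IndexA = " + out.get("A")); // IndexA = {A1=[k1, k2], A2=[k3]}
        System.out.println("IndexB = " + out.get("B")); // IndexB = {B1=[k1, k3], B2=[k2]}
    }
}
```

In this simulation the keys inside each list keep input order; Hadoop does not guarantee the order of values within a reduce group, so a real reducer should sort the key list if order matters.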

-Rahul
