Posted to common-user@hadoop.apache.org by roger dimitri <ro...@yahoo.com> on 2008/01/25 02:00:18 UTC

MapReduce usage with Lucene Indexing

Hi,
   I am very new to Hadoop, and I have a project where I need to use Lucene to index some input given either as a huge collection of Java objects or as one huge Java object.
  I read about Hadoop's MapReduce utilities and I want to leverage that feature for the case described above.
  Can someone please tell me how I can approach this problem? All the Hadoop MapReduce examples out there show only file-based input and don't explicitly deal with data coming in as a huge Java object, so to speak.

Any help is greatly appreciated.

Thanks,
Roger

Re: MapReduce usage with Lucene Indexing

Posted by Rajagopal Natarajan <ra...@gmail.com>.
On Jan 25, 2008 6:30 AM, roger dimitri <ro...@yahoo.com> wrote:

> Hi,
>   I am very new to Hadoop, and I have a project where I need to use Lucene
> to index some input given either as a huge collection of Java objects or
> as one huge Java object.
>  I read about Hadoop's MapReduce utilities and I want to leverage that
> feature for the case described above.
>  Can someone please tell me how I can approach this problem? All the
> Hadoop MapReduce examples out there show only file-based input and don't
> explicitly deal with data coming in as a huge Java object, so to speak.


Something off the top of my head: when your input is a collection of
smaller objects, each independent of the others, you could serialize all the
objects and write them to a file, specify a RecordReader for that file, and
have the reducer deserialize each object and perform the indexing. I'll have
to look into java.io.Serializable and the Lucene API in more detail to be
able to comment further.
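
A minimal sketch of the serialize-then-read-back idea above, using only the
JDK and no Hadoop or Lucene calls. The class name SerializedRecords, the Doc
type, and the writeAll/readAll helpers are all hypothetical names made up for
illustration. In a real job the read loop would presumably live in a custom
RecordReader, and the println would be replaced by adding the document to a
Lucene index.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class SerializedRecords {

    // Hypothetical record type standing in for the "smaller objects" to index.
    static class Doc implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        final String text;

        Doc(String id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    // Write a record count, then each object as its own record.
    static void writeAll(Path file, List<Doc> docs) throws IOException {
        try (OutputStream raw = Files.newOutputStream(file);
             ObjectOutputStream out = new ObjectOutputStream(raw)) {
            out.writeInt(docs.size());
            for (Doc d : docs) {
                out.writeObject(d);
            }
        }
    }

    // Read the count, then deserialize the records one at a time.
    static List<Doc> readAll(Path file) throws IOException, ClassNotFoundException {
        List<Doc> docs = new ArrayList<>();
        try (InputStream raw = Files.newInputStream(file);
             ObjectInputStream in = new ObjectInputStream(raw)) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                docs.add((Doc) in.readObject());
            }
        }
        return docs;
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("docs", ".ser");
        writeAll(file, List.of(new Doc("1", "hello hadoop"),
                               new Doc("2", "hello lucene")));
        for (Doc d : readAll(file)) {
            // Assumption: a real job would index the document here instead.
            System.out.println(d.id + " -> " + d.text);
        }
        Files.delete(file);
    }
}
```

One caveat: a single Java serialization stream cannot be cut at arbitrary
byte offsets, so a file written this way would likely have to be processed as
one unit rather than split across many map tasks. Hadoop's SequenceFile may be
a better container if the data needs to be split.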

-- 
N. Rajagopal,
Visit me at http://www.raja-gopal.com

Re: MapReduce usage with Lucene Indexing

Posted by Bradford Stephens <br...@gmail.com>.
I'm actually going to be doing something similar, with Nutch. I just
started learning about Hadoop this week, so I'm interested in what
everyone has to say :)

On Jan 24, 2008 5:00 PM, roger dimitri <ro...@yahoo.com> wrote:
> Hi,
>    I am very new to Hadoop, and I have a project where I need to use Lucene to index some input given either as a huge collection of Java objects or as one huge Java object.
>   I read about Hadoop's MapReduce utilities and I want to leverage that feature for the case described above.
>   Can someone please tell me how I can approach this problem? All the Hadoop MapReduce examples out there show only file-based input and don't explicitly deal with data coming in as a huge Java object, so to speak.
>
> Any help is greatly appreciated.
>
> Thanks,
> Roger