Posted to solr-user@lucene.apache.org by Tom Chen <to...@gmail.com> on 2014/09/24 18:38:14 UTC

MRIT's morphline mapper doesn't co-locate with data

Hi,

The MRIT (MapReduceIndexerTool) uses NLineInputFormat for the morphline
mapper. The mapper isn't co-located with the input data that it processes.
Isn't this a performance hit?

Ideally, the morphline mapper should run on the hosts that hold the most
data blocks for the input files it processes.

Regards,
Tom
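
To make the placement idea in the message above concrete, here is a minimal sketch of "run on the hosts that hold the most data blocks". It is not MRIT code: the function name preferred_hosts, the file paths, and the node names are hypothetical, and the block layout is supplied by hand as a plain dictionary. On a real cluster the per-block host lists would have to come from HDFS (for example via FileSystem.getFileBlockLocations).

```python
from collections import Counter


def preferred_hosts(block_hosts_by_file, top_n=3):
    """Rank hosts by how many blocks of the given files they store.

    block_hosts_by_file maps a file path to a list of blocks, where each
    block is the list of hosts holding a replica of that block.
    """
    counts = Counter()
    for blocks in block_hosts_by_file.values():
        for replica_hosts in blocks:
            counts.update(replica_hosts)
    # Most blocks first; break ties by host name so the result is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [host for host, _ in ranked[:top_n]]


# Hypothetical layout for the files assigned to one mapper.
layout = {
    "/data/a.log": [["node1", "node2"], ["node1", "node3"]],
    "/data/b.log": [["node1", "node4"]],
}
print(preferred_hosts(layout, top_n=2))
```

A locality-aware input format could attach such a ranked list to each split as its location hints, which the scheduler may or may not be able to honor.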

Re: MRIT's morphline mapper doesn't co-locate with data

Posted by Tom Chen <to...@gmail.com>.
Do you have the Solr JIRA number for the new ingestion tool?

Thanks

On Wed, Sep 24, 2014 at 7:57 PM, Wolfgang Hoschek <wh...@cloudera.com>
wrote:

> Based on our measurements, Lucene indexing is so CPU-intensive that it
> wouldn’t really help much to exploit data locality on read. The
> overwhelming bottleneck remains the same. Having said that, we have an
> ingestion tool in the works that will take advantage of data locality for
> splittable files as well.
>
> Wolfgang.
>
> On Sep 24, 2014, at 9:38 AM, Tom Chen <to...@gmail.com> wrote:
>
> > Hi,
> >
> > The MRIT (MapReduceIndexerTool) uses NLineInputFormat for the morphline
> > mapper. The mapper isn't co-located with the input data that it processes.
> > Isn't this a performance hit?
> >
> > Ideally, the morphline mapper should run on the hosts that hold the most
> > data blocks for the input files it processes.
> >
> > Regards,
> > Tom
>
>

Re: MRIT's morphline mapper doesn't co-locate with data

Posted by Wolfgang Hoschek <wh...@cloudera.com>.
Based on our measurements, Lucene indexing is so CPU-intensive that it wouldn’t really help much to exploit data locality on read. The overwhelming bottleneck remains the same. Having said that, we have an ingestion tool in the works that will take advantage of data locality for splittable files as well.

Wolfgang.
