Posted to solr-user@lucene.apache.org by Tom Chen <to...@gmail.com> on 2014/09/24 18:38:14 UTC
MRIT's morphline mapper doesn't co-locate with data
Hi,
The MRIT (MapReduceIndexerTool) uses NLineInputFormat for the morphline
mapper. The mapper doesn't co-locate with the input data that it processes.
Isn't this a performance hit?
Ideally, the morphline mapper should run on the hosts that contain the most
data blocks for the input files it processes.
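To make "the hosts that contain the most data blocks" concrete, here is a minimal Python sketch of the ranking I have in mind. It is only an illustration, not anything MRIT does: the function name preferred_hosts, the node names, and the block layout are all hypothetical. In a real job the replica hosts would come from HDFS block location metadata, and the ranked hosts would be passed to the scheduler as the split's location hints.

```python
from collections import Counter


def preferred_hosts(block_locations, top_n=3):
    """Rank hosts by how many of a file's blocks they hold.

    block_locations: one list of replica host names per block
    (hypothetical input shape, chosen for this sketch).
    """
    # Count, for every host, the number of blocks it stores a replica of
    counts = Counter(host for replicas in block_locations for host in replicas)
    # Sort by block count (descending), then host name for a stable order
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [host for host, _ in ranked[:top_n]]


# Hypothetical layout: three blocks, three replicas each
blocks = [
    ["node1", "node2", "node3"],
    ["node2", "node3", "node4"],
    ["node2", "node5", "node1"],
]
print(preferred_hosts(blocks, top_n=2))
```

With this made-up layout, node2 holds a replica of every block, so it would be the first choice for running the mapper.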
Regards,
Tom
Re: MRIT's morphline mapper doesn't co-locate with data
Posted by Tom Chen <to...@gmail.com>.
Do you have the Solr JIRA number for the new ingestion tool?
Thanks
On Wed, Sep 24, 2014 at 7:57 PM, Wolfgang Hoschek <wh...@cloudera.com>
wrote:
> Based on our measurements, Lucene indexing is so CPU-intensive that it
> wouldn’t really help much to exploit data locality on read. The
> overwhelming bottleneck remains the same. Having said that, we have an
> ingestion tool in the works that will take advantage of data locality for
> splittable files as well.
>
> Wolfgang.
Re: MRIT's morphline mapper doesn't co-locate with data
Posted by Wolfgang Hoschek <wh...@cloudera.com>.
Based on our measurements, Lucene indexing is so CPU-intensive that exploiting data locality on read wouldn’t help much: indexing itself remains the overwhelming bottleneck either way. Having said that, we have an ingestion tool in the works that will take advantage of data locality for splittable files as well.
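As a rough way to reason about this, an Amdahl-style bound shows why faster reads buy little when reading is a small share of task time. The Python sketch below is illustrative only: the function name and the numbers are hypothetical and are not the measurements mentioned above.

```python
def max_speedup_from_locality(read_fraction, read_speedup):
    """Amdahl-style bound: only the read portion of a task gets faster.

    read_fraction: share of task time spent reading input (0.0 to 1.0)
    read_speedup:  how much faster local reads are than remote reads
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    if read_speedup <= 0:
        raise ValueError("read_speedup must be positive")
    # The CPU-bound share is unchanged; the read share shrinks by read_speedup
    return 1.0 / ((1.0 - read_fraction) + read_fraction / read_speedup)


# Hypothetical numbers: 5% of task time spent reading, local reads 2x faster
print(round(max_speedup_from_locality(0.05, 2.0), 3))
```

Under these assumed numbers the whole task gets less than 3% faster, even though reads themselves are twice as fast.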
Wolfgang.