Posted to solr-user@lucene.apache.org by vidya <vi...@tcs.com> on 2016/01/04 14:00:25 UTC

MapReduceIndexerTool Indexing

Hi

I have used MapReduceIndexerTool to index data in my HDFS into Solr in order
to search it. When new data is added to that path and the tool is run on it
again, does it re-index the entire data set?

Thanks in advance



--
View this message in context: http://lucene.472066.n3.nabble.com/MapReduceIndexerTool-Indexing-tp4248387.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: MapReduceIndexerTool Indexing

Posted by Erick Erickson <er...@gmail.com>.
MRIT is not designed for that scenario, so you simply can't.

What people usually do is have a process whereby, after
the initial bulk load, there is some way their system-of-record
"knows" what new docs have been added since and
indexes only those. Flume is sometimes used for this if you have
access to it.
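
As a rough illustration of the "knows what is new" idea, here is a minimal
sketch that assumes a JSON manifest file listing the paths already indexed.
The manifest and the function names are hypothetical; none of this is part
of MRIT or Solr, and the actual indexing step is left out.

```python
import json
from pathlib import Path


def load_manifest(manifest_path):
    """Return the set of paths recorded as already indexed (empty if none yet)."""
    path = Path(manifest_path)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def select_new_files(all_files, manifest_path):
    """Return, in sorted order, the files that are not yet in the manifest."""
    already_indexed = load_manifest(manifest_path)
    return sorted(set(all_files) - already_indexed)


def record_indexed(files, manifest_path):
    """Add successfully indexed files to the manifest (hypothetical bookkeeping)."""
    updated = load_manifest(manifest_path) | set(files)
    Path(manifest_path).write_text(json.dumps(sorted(updated)))
```

The intent is to call select_new_files before each run, index only what it
returns, and call record_indexed once that run has succeeded.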

Best,
Erick

On Mon, Jan 4, 2016 at 10:13 PM, vidya <vi...@tcs.com> wrote:
> Hi
>
> I would like to index only new data, not already-indexed data (delta
> indexing). How can I achieve this using MRIT?
>
> Thanks in advance

Re: MapReduceIndexerTool Indexing

Posted by vidya <vi...@tcs.com>.
Hi

I would like to index only new data, not already-indexed data (delta
indexing). How can I achieve this using MRIT?

Thanks in advance



--
View this message in context: http://lucene.472066.n3.nabble.com/MapReduceIndexerTool-Indexing-tp4248387p4248573.html

Re: MapReduceIndexerTool Indexing

Posted by Erick Erickson <er...@gmail.com>.
Yes, it does. MRIT is intended for initial bulk loads. It takes whatever
it's pointed at and indexes all of it, every time it runs.

Additionally, it does not update documents. If the same document (by
ID) is indexed twice, you'll wind up with two copies in your results.
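
One way to reduce that risk is to de-duplicate by ID before the bulk load.
The following is only a sketch: it assumes the input records can be held in
memory as dictionaries, and dedupe_by_id and the "id" field name are
hypothetical, not part of MRIT. It handles duplicates within one batch only
and does not check against documents already in the index.

```python
def dedupe_by_id(docs, id_field="id"):
    """Keep only the last document seen for each ID.

    The result is ordered by where each ID first appeared in the input.
    """
    latest = {}
    for doc in docs:
        # Re-assigning an existing key keeps its position but replaces the value.
        latest[doc[id_field]] = doc
    return list(latest.values())
```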

Best,
Erick

On Mon, Jan 4, 2016 at 5:00 AM, vidya <vi...@tcs.com> wrote:
> Hi
>
> I have used MapReduceIndexerTool to index data in my HDFS into Solr in order
> to search it. When new data is added to that path and the tool is run on it
> again, does it re-index the entire data set?
>
> Thanks in advance