Posted to dev@lucene.apache.org by Erol Akarsu <ea...@gmail.com> on 2020/02/04 23:23:01 UTC

Parallel merge of indexes

I need some help merging indexes in parallel, or otherwise much faster. I am
using the IndexMergeTool provided by Lucene, but it seems very slow. Is there a
way to speed up the process?

What I do is create 16 shards with no replication, then add a replica
for every node and every shard. In the last step, I merge the indexes. The
first two steps finish quickly, but the final merging step takes a long time.

I appreciate your help.

Erol Akarsu


Re: Parallel merge of indexes

Posted by Robert Muir <rc...@gmail.com>.
On Tue, Feb 4, 2020 at 7:37 PM Erick Erickson <er...@gmail.com>
wrote:

> Or is this really an optimize, i.e. you’re trying to merge all the segments
> on each shard down to a single segment so in the end you still have 16
> shards, each with a single segment?


The IndexMergeTool that Erol is using does a forceMerge(1) at the end:
https://s.apache.org/n1l24

Erol, does it take forever after the MergeTool prints "Full Merge..."?

It would be nice if this tool had better options and defaults.

Re: Parallel merge of indexes

Posted by Erick Erickson <er...@gmail.com>.
_Why_ are you trying to merge indexes? On the surface this doesn’t
make much sense.

You start with 16 shards. Your Zookeeper configuration will show that
each shard has 1/16 of the hash range (based on the <uniqueKey>). What
are you merging? Are you merging all the segments on each shard?
Merging the indexes from the separate shards?? If the latter, your
bookkeeping in Zookeeper will be totally messed up.
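
To make the hash-range point concrete, here is a minimal Java sketch. It assumes a SolrCloud-style router that divides the signed 32-bit hash space evenly among the shards. The class and method names (ShardHashRanges, split, shardFor) are made up for illustration and are not Solr or Lucene APIs; the real ranges are whatever the cluster state in Zookeeper records.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: splits the signed 32-bit hash space into equal contiguous slices. */
final class ShardHashRanges {

    /** One inclusive slice [min, max] of the 32-bit hash space. */
    static final class Range {
        final int min;
        final int max;

        Range(int min, int max) {
            this.min = min;
            this.max = max;
        }

        boolean contains(int hash) {
            return hash >= min && hash <= max;
        }

        @Override
        public String toString() {
            // %08x renders an int as its unsigned two's-complement hex value.
            return String.format("%08x-%08x", min, max);
        }
    }

    /** Splits [Integer.MIN_VALUE, Integer.MAX_VALUE] into numShards contiguous slices. */
    static List<Range> split(int numShards) {
        if (numShards < 1) {
            throw new IllegalArgumentException("numShards must be >= 1");
        }
        long total = 1L << 32; // number of distinct 32-bit hash values
        List<Range> ranges = new ArrayList<>();
        long start = Integer.MIN_VALUE;
        for (int i = 0; i < numShards; i++) {
            long end = Integer.MIN_VALUE + total * (i + 1) / numShards - 1;
            ranges.add(new Range((int) start, (int) end));
            start = end + 1;
        }
        return ranges;
    }

    /** Returns the index of the slice that owns the given hash. */
    static int shardFor(int hash, List<Range> ranges) {
        for (int i = 0; i < ranges.size(); i++) {
            if (ranges.get(i).contains(hash)) {
                return i;
            }
        }
        throw new IllegalStateException("ranges do not cover hash " + hash);
    }

    public static void main(String[] args) {
        List<Range> ranges = split(16);
        for (int i = 0; i < ranges.size(); i++) {
            System.out.println("shard" + (i + 1) + " " + ranges.get(i));
        }
    }
}
```

With 16 shards, the first slice prints as 80000000-8fffffff and the last as 70000000-7fffffff. Every hash falls into exactly one slice, so an index built by combining documents from several shards no longer matches any single range recorded in the cluster state.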

Or is this really an optimize, i.e. you’re trying to merge all the segments
on each shard down to a single segment so in the end you still have 16
shards, each with a single segment?

This sounds like an XY problem. You’re trying to accomplish some
end goal and asking how to do Y, without explaining the actual
problem you’re trying to solve, the X.

Perhaps if you give us some background we can suggest alternatives.

Best,
Erick

> On Feb 4, 2020, at 6:58 PM, Erol Akarsu <ea...@gmail.com> wrote:
> 
> I can give time information. I am dealing with big product records. I have 5 million products
> Indexing without replica with 16 shards : 20 minutes
> Add replicas : 5 minutes
> Index merging with IndexMergeTool  : 40 minutes



Re: Parallel merge of indexes

Posted by Erol Akarsu <ea...@gmail.com>.
Here is the timing information. I am dealing with large product records; I have
5 million products.
Indexing without replicas on 16 shards: 20 minutes
Adding replicas: 5 minutes
Index merging with IndexMergeTool: 40 minutes
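
For scale, the arithmetic on those three timings can be sketched in Java as below. The class and method names are hypothetical, and idealTotal rests on a strong assumption that nothing in this thread confirms: that the merge step could be divided perfectly across independent workers. Treat its result as an optimistic bound, not a prediction.

```java
/** Illustrative arithmetic on the reported step timings (all values in minutes). */
final class PipelineTiming {

    /** Sum of all step durations. */
    static double total(double... steps) {
        double sum = 0.0;
        for (double s : steps) {
            sum += s;
        }
        return sum;
    }

    /** Fraction of the total run time spent in one step. */
    static double share(double step, double total) {
        return step / total;
    }

    /**
     * Hypothetical best case: only the merge step is split evenly across
     * `workers`, and every other step keeps its measured duration.
     */
    static double idealTotal(double otherSteps, double merge, int workers) {
        return otherSteps + merge / workers;
    }

    public static void main(String[] args) {
        double indexing = 20, replicas = 5, merge = 40;
        double all = total(indexing, replicas, merge);
        System.out.printf("total: %.1f min%n", all);
        System.out.printf("merge share: %.1f%%%n", 100 * share(merge, all));
        System.out.printf("ideal total with 16 workers: %.1f min%n",
                idealTotal(indexing + replicas, merge, 16));
    }
}
```

The merge accounts for 40 of 65 minutes, a little over 60% of the run, which is why it is the step worth examining first.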

Erol Akarsu
