Posted to solr-user@lucene.apache.org by Jaikit Savla <ja...@yahoo.com.INVALID> on 2015/01/14 09:42:51 UTC

Load existing Lucene sharded indexes onto single Solr collection

Folks,
I have generated multiple (100) sharded Lucene indexes on Hadoop. The total indexed data (the sum of all the index-* directories) is 500GB, hence the number of shards. The shards are laid out as follows:
drwxr-x--- 2 index-66
drwxr-x--- 2 index-68
drwxr-x--- 2 index-9
........
and each index directory has the following format:
ls index-9
_4.fdt  _4.fdx  _4.fnm  _4_Lucene40_0.frq  _4_Lucene40_0.prx  _4_Lucene40_0.tim  _4_Lucene40_0.tip  _4_nrm.cfe  _4_nrm.cfs  _4.si  segments_1  segments.gen  write.lock
Now, to load this index, I am currently using the Lucene IndexMergeTool to merge all the shards into one giant index. My question is: is there a way to load the sharded indexes onto a single collection without merging them into one giant index?
Thanks,
Jaikit
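
For reference, the merge step described above can be scripted. The following is a minimal Python sketch, not the poster's actual job: it lists the index-* directories in numeric order and assembles an IndexMergeTool command line (merged index first, then the source indexes). The classpath value, the directory paths, and the helper names shard_dirs and merge_command are assumptions made for illustration.

```python
from pathlib import Path

# Class in the Lucene "misc" module; usage is
# IndexMergeTool <mergedIndex> <index1> <index2> [index3] ...
MERGE_TOOL_CLASS = "org.apache.lucene.misc.IndexMergeTool"


def shard_dirs(parent):
    """Return the index-* subdirectories of parent, ordered by numeric suffix.

    Assumes every shard directory is named index-<number>, as in the listing above.
    """
    dirs = [p for p in Path(parent).glob("index-*") if p.is_dir()]
    return sorted(dirs, key=lambda p: int(p.name.split("-", 1)[1]))


def merge_command(classpath, merged_dir, shards):
    """Build the argv list for a single IndexMergeTool run (hypothetical classpath)."""
    return ["java", "-cp", classpath, MERGE_TOOL_CLASS, str(merged_dir)] + [
        str(s) for s in shards
    ]
```

The resulting list can be passed to subprocess.run once the classpath points at the Lucene core and misc jars that match the version that wrote the shards.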

Re: Load existing Lucene sharded indexes onto single Solr collection

Posted by Jaikit Savla <ja...@yahoo.com.INVALID>.
Yes, I wanted to get rid of the merge step, but it looks like merging is not that cumbersome after all. Thanks to Mikhail and Erick for the pointers; they helped.
Jaikit

Re: Load existing Lucene sharded indexes onto single Solr collection

Posted by Erick Erickson <er...@gmail.com>.
You certainly can't do this in a single directory; there would be
zillions of name conflicts.
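
To illustrate the point about name conflicts: every shard carries files with the same names (segments_1, _4.fdt and so on), so copying them into one directory would overwrite data. Here is a small Python sketch, under the assumption that the shards sit in separate local directories, that counts how many shards share each file name. The function name name_conflicts is made up for this example.

```python
from collections import Counter
from pathlib import Path


def name_conflicts(shard_dirs):
    """Map each file name that occurs in more than one shard to its count."""
    counts = Counter(
        f.name
        for d in shard_dirs
        for f in Path(d).iterdir()
        if f.is_file()
    )
    return {name: n for name, n in counts.items() if n > 1}
```

With 100 shards written by the same job, most file names would likely show up with a count close to 100.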

I believe I saw Uwe make a comment on the Lucene list about using
MultiReaders and keeping the sub-indexes in different directories,
but that's lower-level than Solr has access to. Plus, you'd have to
control index updates _very_ carefully.

So I don't think there's anything built into Solr to work with
indexes like this, and merging is probably your only option here.

Do note that the contrib MapReduceIndexerTool will do almost all of
this for you; it includes a --go-live option. That option still
copies things around, though.

Best,
Erick


Re: Load existing Lucene sharded indexes onto single Solr collection

Posted by Jaikit Savla <ja...@yahoo.com.INVALID>.
This solution will merge the indexes as well. I want to find out whether a merge is "required" before loading indexes into Solr. If loading without a merge is possible, then I can just point solrconfig.xml to the directory where I have all the shards.
Jaikit


Re: Load existing Lucene sharded indexes onto single Solr collection

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
On Wed, Jan 14, 2015 at 11:42 AM, Jaikit Savla <
jaikit.savla@yahoo.com.invalid> wrote:

> Now to load this index, I am currently using Lucene IndexMergeTool to
> merge all the shards into one giant index. My question is, is there a way
> to load sharded index without merging into one giant index on to single
> collection?


https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-MERGEINDEXES
?
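
For readers following the link: the CoreAdmin MERGEINDEXES action takes a target core and one or more indexDir parameters. As a rough Python sketch (the host, core name, shard paths, and the helper name build_mergeindexes_url are all assumptions, not values from this thread), the request URL could be assembled like this:

```python
from urllib.parse import urlencode


def build_mergeindexes_url(solr_base, target_core, index_dirs):
    """Build a CoreAdmin mergeindexes URL with one indexDir parameter per shard."""
    params = [("action", "mergeindexes"), ("core", target_core)]
    params += [("indexDir", d) for d in index_dirs]
    # safe="/" keeps filesystem paths readable instead of encoding them as %2F
    return solr_base.rstrip("/") + "/admin/cores?" + urlencode(params, safe="/")
```

Note that this still merges the shards into the target core's index, so it replaces the IndexMergeTool step but does not avoid merging.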


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mk...@griddynamics.com>