Posted to solr-user@lucene.apache.org by Sumit Aggarwal <su...@gmail.com> on 2009/07/06 10:10:45 UTC

Index partitioning with solr multiple core feature

I am trying to implement entity-based partitioning using Solr's multiple-core
feature.
My solr.xml looks like this:
<solr sharedLib="lib" persistent="true">
  <cores adminPath="/admin/cores">
    <core default="true" instanceDir="user" name="User">
      <property name="dataDir" value="/user/data" />
    </core>
    <core default="false" instanceDir="group" name="group">
      <property name="dataDir" value="/group/data" />
    </core>
  </cores>
</solr>

Now, using http://localhost:8983/solr/User/ or
http://localhost:8983/solr/Group/, I am able to reach a separate partition for
entity-based search. The open question is entity-based indexing. I was
reading the http://wiki.apache.org/solr/IndexPartitioning document, but it does
not help much. How can I do entity-based indexing of documents?
I don't want to build the HTTP URL based on the entity for indexing. Kindly
help me with this.

Another requirement: since I have entity-based partitioning and each entity
can have a total index size of more than 10GB, I need another level of
partitioning inside each entity, for example based on the number of documents
in an index. How can I do this? Unfortunately the Solr wiki does not say much
about partitioning.
-- 
Cheers....
Sumit

Re: Index partitioning with solr multiple core feature

Posted by Sumit Aggarwal <su...@gmail.com>.
I forgot to mention that I already partition each entity across 3 different
servers, based on some unique int value.

-- 
Cheers....
Sumit
9818621804

Re: Index partitioning with solr multiple core feature

Posted by Sumit Aggarwal <su...@gmail.com>.
Shalin,
I will be searching only one entity at a time. Also, data will be
indexed only into the corresponding entity.

Thanks,
Sumit

-- 
Cheers....
Sumit
9818621804

Re: Index partitioning with solr multiple core feature

Posted by Sumit Aggarwal <su...@gmail.com>.
Shalin,
First of all, each entity's data is unrelated, so it makes sense to use Solr
cores as you suggest.

But since you say that putting every entity's index on the same box will make
them compete for CPU, does it make sense to add boxes based on the number of
entities, considering that I will also have to add replication boxes, which
amounts to a huge cost?

This is what I am thinking after your suggestion: have separate boxes for
each entity, and then inside each entity do some partitioning based on round
robin or some other strategy. With this, if I am searching any entity's data,
I only need to reach the box for that entity. Now, since I am also
partitioning inside an entity, how do I search so that I get merged results
from each partition on a single entity box? If I do this type of
partitioning, which Solr functionality should I use? Is
it http://wiki.apache.org/solr/IndexPartitioning ?

My actual concern is performance, irrespective of the implementation
design, with good scaling logic for the future.

-- 
Cheers....
Sumit

Re: Index partitioning with solr multiple core feature

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Mon, Jul 6, 2009 at 3:05 PM, Sumit Aggarwal <su...@gmail.com>wrote:

> Hi Shalin,
> Yes, I want logical separation of the indexes, and also for performance
> reasons; otherwise the index will keep growing, as I have 8 different
> entities. I am already partitioning all these entities across different
> servers, on which I will search using Solr's distributed search with
> shards, collecting merged results from 3 different servers. You mentioned
> that I won't gain much by putting all partitions on the same box. Why is
> that?


This is because each shard will compete for CPU and disk if you put them on
the same box. Logical separation and partitioning for performance are two
different things. You should partition if one Solr instance is not able to
hold the complete index or if it is not giving you the desired performance.
You can use multiple cores if the data is unrelated and you wouldn't need to
search on all of them.

In your case, the primary reason is performance, so it makes sense to put
each shard on a separate box.


> From what I have read, a Solr core is meant for different applications
> only. Searching on different entities is, in theory, also a kind of
> different application.
>
> Does Solr provide any good support for index partitioning?


No. Partitioning is not done by Solr. So you should decide your partitioning
scheme: round robin, fixed hashing, random etc. Once you have partitioned
your data, a distributed search helps you search over all the shards in one
go.
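
As a rough illustration of the two steps above (pick a partitioning scheme, then search all shards at once), here is a minimal Python sketch. It is not part of Solr: the host names in SHARDS and the helper names shard_for and distributed_query_url are made up for the example, and it assumes each shard exposes the standard /select handler. The only Solr feature relied on is the "shards" request parameter of distributed search.

```python
import zlib
from urllib.parse import urlencode

# Hypothetical shard locations for one entity (host:port/path, no scheme),
# in the form the distributed-search "shards" parameter expects.
SHARDS = [
    "box1:8983/solr/User",
    "box2:8983/solr/User",
    "box3:8983/solr/User",
]


def shard_for(doc_id: str) -> str:
    """Fixed hashing: the same document id always maps to the same shard."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return SHARDS[zlib.crc32(doc_id.encode("utf-8")) % len(SHARDS)]


def distributed_query_url(query: str) -> str:
    """Build a select URL that asks the first shard to query every shard."""
    params = urlencode({"q": query, "shards": ",".join(SHARDS)})
    return f"http://{SHARDS[0]}/select?{params}"
```

At index time the client would post each document to shard_for(doc_id); at query time it would request distributed_query_url(...) and receive the merged results. Note that with fixed hashing, changing the number of shards changes where documents map, so adding a shard generally means reindexing.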

-- 
Regards,
Shalin Shekhar Mangar.

Re: Index partitioning with solr multiple core feature

Posted by Sumit Aggarwal <su...@gmail.com>.
Hi Shalin,
Yes, I want logical separation of the indexes, and also for performance
reasons; otherwise the index will keep growing, as I have 8 different
entities. I am already partitioning all these entities across different
servers, on which I will search using Solr's distributed search with
shards, collecting merged results from 3 different servers. You mentioned
that I won't gain much by putting all partitions on the same box. Why is that?

From what I have read, a Solr core is meant for different applications
only. Searching on different entities is, in theory, also a kind of different
application.

Does Solr provide any good support for index partitioning?
Thanks,
Sumit

-- 
Cheers....
Sumit
9818621804

Re: Index partitioning with solr multiple core feature

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Mon, Jul 6, 2009 at 1:40 PM, Sumit Aggarwal <su...@gmail.com>wrote:

> I am trying to implement entity-based partitioning using Solr's
> multiple-core feature.
> My solr.xml looks like this:
> <solr sharedLib="lib" persistent="true">
> <cores adminPath="/admin/cores">
> <core default="true" instanceDir="user" name="User">
> <property name="dataDir" value="/user/data" />
> </core>
> <core default="false" instanceDir="group" name="group">
> <property name="dataDir" value="/group/data" />
> </core>
> </cores>
> </solr>
>
> Now, using http://localhost:8983/solr/User/ or
> http://localhost:8983/solr/Group/, I am able to reach a separate partition
> for entity-based search. The open question is entity-based indexing. I was
> reading the http://wiki.apache.org/solr/IndexPartitioning document, but it
> does not help much. How can I do entity-based indexing of documents?
> I don't want to build the HTTP URL based on the entity for indexing.


Why not? You know which document belongs to which "entity" so you can select
which core to post that document to.
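
A minimal sketch of that routing, in Python for illustration only. The base URL and the core names come from the solr.xml quoted above; the mapping ENTITY_TO_CORE and the function update_url_for are hypothetical names for this example, and the /update path assumes each core uses the standard update handler.

```python
# Hypothetical mapping from entity type to the core name declared in solr.xml.
# The names are copied exactly as declared there ("User" and "group").
ENTITY_TO_CORE = {"user": "User", "group": "group"}

SOLR_BASE = "http://localhost:8983/solr"  # base URL used in the thread


def update_url_for(entity: str) -> str:
    """Return the update URL of the core that holds documents of this entity."""
    try:
        core = ENTITY_TO_CORE[entity.lower()]
    except KeyError:
        raise ValueError(f"no core configured for entity {entity!r}")
    return f"{SOLR_BASE}/{core}/update"
```

The indexing client looks up the URL once per document (or per batch of documents of the same entity) and posts there, so the entity-specific URL stays an internal detail of the client.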



> Another requirement: since I have entity-based partitioning and each entity
> can have a total index size of more than 10GB, I need another level of
> partitioning inside each entity, for example based on the number of
> documents in an index. How can I do this? Unfortunately the Solr wiki does
> not say much about partitioning.
>

What are you trying to achieve by partitioning your data? Is it just for
logical separation? If it is for performance reasons, I don't think you'll
gain much by putting all partitions on the same box.

-- 
Regards,
Shalin Shekhar Mangar.