
Posted to users@solr.apache.org by Neha Gupta <ne...@uni-jena.de> on 2022/04/10 20:31:56 UTC

Regarding indexing data in different cores or same core with different entities.

Dear Solr Community,

I need your advice on the best option.

I have four tables in the DB and I want to index them in Solr.

I want to query these tables separately in Solr, as there is no relation 
between them. Could you please tell me whether I should create a separate 
core for each table, or index them all in one core with different 
entities? If the latter, how can I query Solr by entity?


Thanks and Regards
Neha Gupta

Re: Regarding indexing data in different cores or same core with different entities.

Posted by Thomas Corthals <th...@klascement.net>.

We have a similar setup where entities of different types all go in a
single core. Folding, stemming, managed synonyms … have to be the same for
all entity types. I find it easier to only have to keep one schema up to
date with business needs. Adding a new entity type to the index can usually
be done entirely in code. The logic for managing synonyms through the REST
API doesn't even have to be aware of these changes.

Our schema has 3 required fields:

   - uid is the uniqueKey (a combination of type + id) and is used for
   atomic updates and delete-by-id
   - type is a string field that is used in filter queries
   - id is the (usually autoincrement) identifier from the database and is
   what most queries retrieve

All other fields are dynamic. It's very rare that I have had to add a
dynamic field definition because most of our data is either natural
language text, verbatim strings, dates or integers that are foreign keys in
the database.
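As a rough illustration of the three required fields plus dynamic fields, here is a minimal Python sketch that turns a database row into a Solr document. The "-" separator in uid, the function name and the suffixes `_s` and `_i` are assumptions for the example (the suffixes follow the dynamic field patterns found in Solr's default configset); the actual schema described above may use different ones.

```python
# Assumed mapping from Python value types to dynamic field suffixes.
SUFFIX_BY_PYTHON_TYPE = {int: "_i", str: "_s"}


def to_solr_doc(entity_type, row):
    """Build one Solr document from a database row given as a dict."""
    doc = {
        "uid": f"{entity_type}-{row['id']}",  # uniqueKey: type + id (separator is assumed)
        "type": entity_type,                  # used in filter queries
        "id": row["id"],                      # identifier from the database
    }
    for column, value in row.items():
        if column != "id":
            # Every other column becomes a dynamic field, e.g. name -> name_s.
            doc[column + SUFFIX_BY_PYTHON_TYPE[type(value)]] = value
    return doc


doc = to_solr_doc("person", {"id": 7, "name": "neha", "department_id": 3})
print(doc)
```

Because uid embeds the type, rows from different tables that share the same database id do not overwrite each other in the index.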

Another benefit of this approach is that you can query "across tables" on a
common field and get facet counts per entity type.
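A sketch of such a cross-table query, written as Python that only builds the request parameters. `q`, `rows`, `facet` and `facet.field` are standard Solr request parameters; the field name `name_t` and the search term are hypothetical.

```python
from urllib.parse import urlencode


def cross_type_facet_params(field, term):
    """Search one common field and ask for hit counts per entity type."""
    return {
        "q": f"{field}:{term}",
        "rows": 0,              # only the facet counts are wanted here
        "facet": "true",
        "facet.field": "type",  # one count per value of the type field
    }


params = cross_type_facet_params("name_t", "cricket")
query_string = urlencode(params)  # append to .../select? on the core
print(query_string)
```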

That's what works for us. If you don't need to facet across tables, if you
want to define each field explicitly because you don't want a different
field name in Solr (to match the dynamic field wildcard), if you have
specific schema requirements per table, if you don't have to bother with
managed synonyms … you might be better off with a core (collection) per
table.

Thomas

On Mon, Apr 11, 2022 at 02:31, Walter Underwood <wunder@wunderwood.org> wrote:

> I would make four cores (collections). With a single one, the schema is a
> union of all of the tables, so a mess to manage. There will be lots of
> comments about which field belongs to which table.
>
> Make four collections with four schemas that match the four tables. You
> can load them independently and update the schemas independently.
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Apr 10, 2022, at 4:56 PM, David Hastings <hastings.recursive@gmail.com> wrote:
> >
> > The different field types for the same named field I didn’t take into
> > account.  That would throw a wrench into it if one table wanted facets on a
> > field but the other just wanted text searching on the same field name for
> > example.
> >
> > Guess without context the question becomes more difficult to answer, such
> > as is it one client for all the tables, or one each, or…. Why even use a
> > rdbms if there is no r in it in the first place?
> >
> > On Sun, Apr 10, 2022 at 7:51 PM Shawn Heisey <ap...@elyograg.org> wrote:
> >
> >> On 4/10/2022 2:31 PM, Neha Gupta wrote:
> >>> I want to query these tables differently in Solr as they don't have
> >>> any relation between them. Could you please tell whether i should
> >>> create different cores for each table or should i indexed them in one
> >>> core with different entities. If latter is the case then how i can
> >>> query Solr on basis of entity?
> >>
> >> I'm going to disagree with Saurabh Sharma here.
> >>
> >> If there truly is no relationship between the data in those tables, then
> >> I would index them as separate cores, or collections if running in cloud
> >> mode.  The configurations will be cleaner, and there is much less chance
> >> of a change in one table causing general problems for a combined core
> >> ... those effects would be limited to the core for the table that is
> >> changing.
> >>
> >> If you can use the same fields for data coming from multiple tables,
> >> there is a certain amount of space savings that can be realized by
> >> having one index instead of four, due to the way that Lucene file
> >> formats work.  For most setups, that space savings will be very small
> >> compared to the problems that you can avoid by not combining the data.
> >>
> >> The only time it makes sense to have the data from multiple database
> >> tables in the same core is when there is a definite relationship between
> >> the tables.  If you use JOIN queries on the DB server on a regular
> >> basis, and that extends to searching as well, performance in Solr will
> >> be better if Solr does not have to do a JOIN to accomplish its work.
> >> The cross-core join capability in Solr is fairly limited and is NOT what
> >> someone familiar with database joins would expect, particularly in the
> >> arena of performance.
> >>
> >> As mentioned by Dave, if you do combine the data, you will want to have
> >> at least one field indexed that can filter results as you need them
> >> filtered.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>
>

Re: Regarding indexing data in different cores or same core with different entities.

Posted by Walter Underwood <wu...@wunderwood.org>.

I would make four cores (collections). With a single one, the schema is a union of all of the tables, so a mess to manage. There will be lots of comments about which field belongs to which table.

Make four collections with four schemas that match the four tables. You can load them independently and update the schemas independently.
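To make the "load them independently" point concrete, here is a small Python sketch that maps each table to its own collection and update endpoint. The base URL and the four table and collection names are hypothetical; only the `/solr/<collection>/update` path shape is standard Solr.

```python
SOLR_BASE = "http://localhost:8983/solr"  # assumed local Solr instance

# Hypothetical table names, each with its own collection and schema.
COLLECTION_BY_TABLE = {
    "person": "person",
    "department": "department",
    "sport": "sport",
    "venue": "venue",
}


def update_url(table):
    """Return the update endpoint of the collection that holds one table."""
    return f"{SOLR_BASE}/{COLLECTION_BY_TABLE[table]}/update"


for table in COLLECTION_BY_TABLE:
    # Each table can be reloaded on its own without touching the others.
    print(table, "->", update_url(table))
```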

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

Re: Regarding indexing data in different cores or same core with different entities.

Posted by David Hastings <ha...@gmail.com>.

The different field types for the same named field I didn’t take into
account.  That would throw a wrench into it if one table wanted facets on a
field but the other just wanted text searching on the same field name for
example.

Guess without context the question becomes more difficult to answer, such
as is it one client for all the tables, or one each, or…. Why even use a
rdbms if there is no r in it in the first place?

Re: Regarding indexing data in different cores or same core with different entities.

Posted by Shawn Heisey <ap...@elyograg.org>.

On 4/10/2022 2:31 PM, Neha Gupta wrote:
> I want to query these tables differently in Solr as they don't have 
> any relation between them. Could you please tell whether i should 
> create different cores for each table or should i indexed them in one 
> core with different entities. If latter is the case then how i can 
> query Solr on basis of entity?

I'm going to disagree with Saurabh Sharma here.

If there truly is no relationship between the data in those tables, then 
I would index them as separate cores, or collections if running in cloud 
mode.  The configurations will be cleaner, and there is much less chance 
of a change in one table causing general problems for a combined core 
... those effects would be limited to the core for the table that is 
changing.

If you can use the same fields for data coming from multiple tables, 
there is a certain amount of space savings that can be realized by 
having one index instead of four, due to the way that Lucene file 
formats work.  For most setups, that space savings will be very small 
compared to the problems that you can avoid by not combining the data.

The only time it makes sense to have the data from multiple database 
tables in the same core is when there is a definite relationship between 
the tables.  If you use JOIN queries on the DB server on a regular 
basis, and that extends to searching as well, performance in Solr will 
be better if Solr does not have to do a JOIN to accomplish its work.  
The cross-core join capability in Solr is fairly limited and is NOT what 
someone familiar with database joins would expect, particularly in the 
arena of performance.
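For reference, a cross-core join in Solr is written with the join query parser. The Python sketch below only builds the query string; the core name `departments` and the field names are hypothetical. It finds documents in the queried core whose `department_id` matches the `id` of a matching document in the other core. Unlike a SQL join, the result contains only fields from the queried core.

```python
def cross_core_join(from_index, from_field, to_field, inner_query):
    """Build a Solr join query that runs inner_query against another core."""
    return (
        f"{{!join from={from_field} to={to_field} "
        f"fromIndex={from_index}}}{inner_query}"
    )


q = cross_core_join("departments", "id", "department_id", "name:technology")
print(q)
```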

As mentioned by Dave, if you do combine the data, you will want to have 
at least one field indexed that can filter results as you need them 
filtered.

Thanks,
Shawn

Re: Regarding indexing data in different cores or same core with different entities.

Posted by Dave <ha...@gmail.com>.

This is a good place to use a filter query as well, especially if you want results from any combination of the tables.
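A minimal Python sketch of such a filter query, assuming the documents carry a `type` field as in the quoted example below. `fq` is the standard Solr filter query parameter; the helper name is made up for illustration, and the type values are assumed to be single tokens that need no quoting.

```python
def type_filter(*types):
    """Build an fq value that keeps only documents of the given types."""
    if not types:
        raise ValueError("at least one type is required")
    return "type:(" + " OR ".join(types) + ")"


params = {
    "q": "name:neha",
    "fq": type_filter("person", "department"),  # any combination of tables
}
print(params["fq"])
```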

> On Apr 10, 2022, at 5:05 PM, Saurabh Sharma <sa...@gmail.com> wrote:
> 
> In case you are having very less data in tables then you should index all
> four tables in a single core. With every document you can index table
> identifier and during query you can use that identifier .
> 
> 
> //For table person
> {
> 'name':'neha'
> 'type':'person'
> }
> 
> //For table department
> {
> 'name':'information technology'
> 'type':'department'
> }
> 
> //For sport table
> {
> 'name':'cricket'
> 'type':'sport'
> }
> 
> all these docs have name field but can be distinguished on the basis of
> type field .
> 
> 
> Thanks & Regards
> Saurabh
> 
> 
>> On Mon, Apr 11, 2022, 2:02 AM Neha Gupta <ne...@uni-jena.de> wrote:
>> 
>> Dear Solr Community,
>> 
>> Need your advice for the best option
>> 
>> I have four tables in the DB and i want to index them in Solr.
>> 
>> I want to query these tables differently in Solr as they don't have any
>> relation between them. Could you please tell whether i should create
>> different cores for each table or should i indexed them in one core with
>> different entities. If latter is the case then how i can query Solr on
>> basis of entity?
>> 
>> 
>> Thanks and Regards
>> Neha Gupta
>> 
>>

Re: Regarding indexing data in different cores or same core with different entities.

Posted by Saurabh Sharma <sa...@gmail.com>.

If you have very little data in the tables, then you should index all
four tables in a single core. With every document you can index a table
identifier, and at query time you can use that identifier.

// For table person
{
  "name": "neha",
  "type": "person"
}

// For table department
{
  "name": "information technology",
  "type": "department"
}

// For table sport
{
  "name": "cricket",
  "type": "sport"
}

All these docs have a name field but can be distinguished by the
type field.
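One way to send documents like these is as a JSON array to the core's update handler. The Python sketch below only builds the request body and does not contact a server; the core name `entities` and the URL are assumptions. The schema's uniqueKey field is left out for brevity and would normally be included.

```python
import json

# Assumed update endpoint of a single combined core named "entities".
UPDATE_URL = "http://localhost:8983/solr/entities/update?commit=true"

docs = [
    {"name": "neha", "type": "person"},
    {"name": "information technology", "type": "department"},
    {"name": "cricket", "type": "sport"},
]

# Body for an HTTP POST to UPDATE_URL with Content-Type: application/json.
body = json.dumps(docs)
print(body)
```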

Thanks & Regards
Saurabh
