You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by Chantal Ackermann <ch...@btelligent.de> on 2010/07/30 10:01:29 UTC

Re: advice on creating a solr index when data source is from many unrelated db tables

Hi Ahmed,

fields that are empty do not impact the index. It's different from a
database.
I have text fields for different languages and per document there is
always only one of the languages set (the text fields for the other
languages are empty/not set). It works all very well and fast.

I wonder more about what you describe as "unrelated data" - why would
you want to put unrelated data into a single index? If you want to
search on all the data and return mixed results there surely must be
some kind of relation between the documents?

Chantal

On Thu, 2010-07-29 at 21:33 +0200, S Ahmed wrote:
> I understand (and its straightforward) when you want to create a index for
> something simple like Products.
> 
> But how do you go about creating a Solr index when you have data coming from
> 10-15 database tables, and the tables have unrelated data?
> 
> The issue is then you would have many 'columns' in your index, and they will
> be NULL for much of the data since you are trying to shove 15 db tables into
> a single Solr/Lucense index.
> 
> 
> This must be a common problem, what are the potential solutions?




Re: advice on creating a solr index when data source is from many unrelated db tables

Posted by Erick Erickson <er...@gmail.com>.
Yes, there might be 50 fields. This is not a problem for SOLR/Lucene.

HOWEVER: SOLR is NOT a database and shouldn't be used like one. Your
emails hint that you're thinking like a DB person, not a search person. I
guarantee that if you just index the database information into 9 different
document types (using your example), the next question you'll have is
a variant on "how do I do a join on different documents in a single
search?".
And the answer is "you don't".

Instead, go against all your training in the DB world and DE-normalize your
data. Your goal here is for each "document" to contain all the information
you need to satisfy your requirements in one SOLR query.

I know, I know... the first reaction any experienced DB admin has to that
suggestion is a horrified "Do WHAT?!?!?!". But remember that SOLR is
a search application. Database developers have put a huge amount of effort
into being able to do joins. And they do it very well. SOLR doesn't
make any attempt to "do the join thing better", and it shouldn't. It
searches
massive amounts of text. And it does it very well. Disk space is cheap.

So step back and ask yourself what you're trying to do with SOLR. It may be
that you need the best of both worlds, in which case you index your
searchable
data along with your DB primary keys and use SOLR to search your text then
use your DB to do the other stuff. It may be that you need to flatten your
data and only have one document type and just use SOLR. It may be many
things.
But I almost guarantee that the thing it isn't is to just index the tables
as they
are <G>....

Best
Erick


On Fri, Jul 30, 2010 at 12:58 PM, S Ahmed <sa...@gmail.com> wrote:

> So I have tables like this:
>
> Users
> UserSales
> UserHistory
> UserAddresses
> UserNotes
> ClientAddress
> CalenderEvent
> Articles
> Blogs
>
> Just seems odd to me, jamming on these tables into a single index.  But I
> guess the idea of using a 'type' field to quality exactly what I am
> searching is a good idea, in case I need to filter for only 'articles' or
> blogs or contacts etc.
>
> But there might be 50 fields if I do this no?
>
>
>
> On Fri, Jul 30, 2010 at 4:01 AM, Chantal Ackermann <
> chantal.ackermann@btelligent.de> wrote:
>
> > Hi Ahmed,
> >
> > fields that are empty do not impact the index. It's different from a
> > database.
> > I have text fields for different languages and per document there is
> > always only one of the languages set (the text fields for the other
> > languages are empty/not set). It works all very well and fast.
> >
> > I wonder more about what you describe as "unrelated data" - why would
> > you want to put unrelated data into a single index? If you want to
> > search on all the data and return mixed results there surely must be
> > some kind of relation between the documents?
> >
> > Chantal
> >
> > On Thu, 2010-07-29 at 21:33 +0200, S Ahmed wrote:
> > > I understand (and its straightforward) when you want to create a index
> > for
> > > something simple like Products.
> > >
> > > But how do you go about creating a Solr index when you have data coming
> > from
> > > 10-15 database tables, and the tables have unrelated data?
> > >
> > > The issue is then you would have many 'columns' in your index, and they
> > will
> > > be NULL for much of the data since you are trying to shove 15 db tables
> > into
> > > a single Solr/Lucense index.
> > >
> > >
> > > This must be a common problem, what are the potential solutions?
> >
> >
> >
> >
>

Re: advice on creating a solr index when data source is from many unrelated db tables

Posted by Kerwin <ke...@gmail.com>.
Hi,

This is something that I am working on too.I have been trying to combine
results from 3 different tables and trying to avoid the usual SQL union
clauses.
One thing I have tried to do is watch out for common fields like, for
example, first name and last name that could be present in all tables and
then map the similar fields from different tables in data-config.xml to the
same two Schema fields for names. I too have realised that using record type
is a good idea to filter on the results and perhaps make the search faster
by filtering on type.
On Fri, Jul 30, 2010 at 9:58 AM, S Ahmed <sa...@gmail.com> wrote:

> So I have tables like this:
>
> Users
> UserSales
> UserHistory
> UserAddresses
> UserNotes
> ClientAddress
> CalenderEvent
> Articles
> Blogs
>
> Just seems odd to me, jamming on these tables into a single index.  But I
> guess the idea of using a 'type' field to quality exactly what I am
> searching is a good idea, in case I need to filter for only 'articles' or
> blogs or contacts etc.
>
> But there might be 50 fields if I do this no?
>
>
>
> On Fri, Jul 30, 2010 at 4:01 AM, Chantal Ackermann <
> chantal.ackermann@btelligent.de> wrote:
>
> > Hi Ahmed,
> >
> > fields that are empty do not impact the index. It's different from a
> > database.
> > I have text fields for different languages and per document there is
> > always only one of the languages set (the text fields for the other
> > languages are empty/not set). It works all very well and fast.
> >
> > I wonder more about what you describe as "unrelated data" - why would
> > you want to put unrelated data into a single index? If you want to
> > search on all the data and return mixed results there surely must be
> > some kind of relation between the documents?
> >
> > Chantal
> >
> > On Thu, 2010-07-29 at 21:33 +0200, S Ahmed wrote:
> > > I understand (and its straightforward) when you want to create a index
> > for
> > > something simple like Products.
> > >
> > > But how do you go about creating a Solr index when you have data coming
> > from
> > > 10-15 database tables, and the tables have unrelated data?
> > >
> > > The issue is then you would have many 'columns' in your index, and they
> > will
> > > be NULL for much of the data since you are trying to shove 15 db tables
> > into
> > > a single Solr/Lucense index.
> > >
> > >
> > > This must be a common problem, what are the potential solutions?
> >
> >
> >
> >
>

Re: advice on creating a solr index when data source is from many unrelated db tables

Posted by S Ahmed <sa...@gmail.com>.
So I have tables like this:

Users
UserSales
UserHistory
UserAddresses
UserNotes
ClientAddress
CalenderEvent
Articles
Blogs

Just seems odd to me, jamming on these tables into a single index.  But I
guess the idea of using a 'type' field to quality exactly what I am
searching is a good idea, in case I need to filter for only 'articles' or
blogs or contacts etc.

But there might be 50 fields if I do this no?



On Fri, Jul 30, 2010 at 4:01 AM, Chantal Ackermann <
chantal.ackermann@btelligent.de> wrote:

> Hi Ahmed,
>
> fields that are empty do not impact the index. It's different from a
> database.
> I have text fields for different languages and per document there is
> always only one of the languages set (the text fields for the other
> languages are empty/not set). It works all very well and fast.
>
> I wonder more about what you describe as "unrelated data" - why would
> you want to put unrelated data into a single index? If you want to
> search on all the data and return mixed results there surely must be
> some kind of relation between the documents?
>
> Chantal
>
> On Thu, 2010-07-29 at 21:33 +0200, S Ahmed wrote:
> > I understand (and its straightforward) when you want to create a index
> for
> > something simple like Products.
> >
> > But how do you go about creating a Solr index when you have data coming
> from
> > 10-15 database tables, and the tables have unrelated data?
> >
> > The issue is then you would have many 'columns' in your index, and they
> will
> > be NULL for much of the data since you are trying to shove 15 db tables
> into
> > a single Solr/Lucense index.
> >
> >
> > This must be a common problem, what are the potential solutions?
>
>
>
>