Posted to java-user@lucene.apache.org by Michael Böckling <Mi...@dmc.de> on 2007/07/05 16:51:19 UTC
Re: Searching over multiple indexes with 1:m relationship

Hi,

thanks for your answers, you really helped me make the right decision. I
now have a fully denormalized second index, which is far easier to handle
than my earlier attempt that mimicked the DB schema, and I don't have any
speed problems.
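
A minimal sketch of what "fully denormalized" can mean for a 1:m
relationship, written in plain Java without any Lucene calls. The class
name, the field names ("pk", "title", "color") and the sample data are
hypothetical and are not taken from this thread; the point is only that
each detail row becomes one flat document carrying a copy of its parent's
fields, so a single search can constrain parent and detail fields together.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DenormalizeSketch {

    // Small helper (hypothetical): build a row from key/value pairs.
    static Map<String, String> row(String... kv) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    // One flat document per detail row, with the parent's fields copied in.
    static List<Map<String, String>> denormalize(
            Map<String, Map<String, String>> parentsByPk,
            List<Map<String, String>> details) {
        List<Map<String, String>> flat = new ArrayList<>();
        for (Map<String, String> detail : details) {
            Map<String, String> parent = parentsByPk.get(detail.get("pk"));
            if (parent == null) {
                continue; // orphaned detail row: no parent to copy from
            }
            Map<String, String> doc = new LinkedHashMap<>(parent);
            doc.putAll(detail); // detail fields (including "pk") join the parent fields
            flat.add(doc);
        }
        return flat;
    }

    // Distinct parent keys whose flat documents match every field=value constraint.
    static Set<String> search(List<Map<String, String>> flat,
                              Map<String, String> constraints) {
        Set<String> pks = new LinkedHashSet<>();
        for (Map<String, String> doc : flat) {
            boolean matches = true;
            for (Map.Entry<String, String> c : constraints.entrySet()) {
                if (!c.getValue().equals(doc.get(c.getKey()))) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                pks.add(doc.get("pk"));
            }
        }
        return pks;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> parents = new LinkedHashMap<>();
        parents.put("1", row("title", "camera"));
        parents.put("2", row("title", "printer"));

        List<Map<String, String>> details = new ArrayList<>();
        details.add(row("pk", "1", "color", "black"));
        details.add(row("pk", "1", "color", "silver"));
        details.add(row("pk", "2", "color", "black"));

        List<Map<String, String>> flat = denormalize(parents, details);
        // One pass over the flat documents replaces the two-step lookup.
        System.out.println(search(flat, row("title", "camera", "color", "silver"))); // prints [1]
    }
}
```

The trade-off is duplicated parent data in every detail document, which
is cheap at the index sizes discussed in this thread.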

It seems Lucene's mailing list is just as great as the code. :-)

Regards,
Michael



> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Thursday, 28 June 2007 20:45
> To: java-user@lucene.apache.org
> Subject: Re: Searching over multiple indexes with 1:m relationship
> 
> 
> Chris is spot-on. Your data set is so small that I wouldn't worry about
> speed unless and until you have proof that it's a problem. The complexity
> you'll introduce by having multiple indexes just won't be worth it.
> 
> In your case, following Chris's advice and de-normalizing the data would
> be the first thing you should try.
> 
> Erick.
> 
> On 6/28/07, Michael Böckling <Mi...@dmc.de> wrote:
> >
> > Hi Erickson,
> >
> > thanks for your reply.
> >
> > Of course you are right that it's a bit insane to mimic a database schema
> > with indices, but that's how it is. The primary index is already in use;
> > the extended requirements came later.
> >
> > The index isn't really that big: the primary one has 2-3 MB of data. I
> > don't know yet how big the secondary one will be, but probably less than
> > 20 MB. The idea was that most searches will only need the first index; it
> > is only by using an extended search form that the secondary index is
> > queried. Keeping the first index small should help with performance,
> > where the main load is handled.
> >
> > The number of primary results will often be less than 200, typically
> > around 20 I guess, so it's not that big of a deal to iterate through them.
> >
> > Regards,
> >
> > Michael
> >
> >
> > > -----Original Message-----
> > > From: Erick Erickson [mailto:erickerickson@gmail.com]
> > > Sent: Thursday, 28 June 2007 16:09
> > > To: java-user@lucene.apache.org
> > > Subject: Re: Searching over multiple indexes with 1:m relationship
> > >
> > >
> > > I do have an off-the-wall question... Why have two indexes? There
> > > are, of course, good reasons, but they're things like size and speed.
> > >
> > > Where I'm going here is that Lucene does NOT require that all
> > > documents have the same fields. So it's perfectly reasonable to index
> > > heterogeneous data (or differing forms of the same data) in a
> > > single index. This may not fit your requirements, but I thought I'd
> > > mention it.
> > >
> > > That said, it really doesn't bear on your question since you'd really
> > > have two logical indexes in the same physical index. Although maybe it
> > > does. If all the data were in one index, then perhaps you could do
> > > exactly one search instead.
> > >
> > > I'm always leery of using an index to mimic what looks like database
> > > functionality. That often means that you either should actually use a
> > > database for the database-like parts or get much more clever in your
> > > index so you don't need what are essentially joins.
> > >
> > > All that said, a lot depends on the data set size. If your first query
> > > results in, say, 100 documents (pks) that you need to use for your
> > > second query, it probably doesn't matter whether you do a lot of manual
> > > processing. If the first query results in 1,000,000 pks, then it does....
> > >
> > > So how much data are you talking about? Even the single-index idea
> > > depends upon whether we're talking a couple of G index size or a
> > > couple of T...
> > >
> > > Best
> > > Erick
> > >
> > >
> > >
> > >
> > > On 6/28/07, Michael Böckling <Mi...@dmc.de> wrote:
> > > >
> > > > Hi folks!
> > > >
> > > > I know there is a MultiSearcher for searching over multiple indices,
> > > > but my requirement is a bit special.
> > > > I have two indices whose documents have a 1:m relationship. Most
> > > > queries will only use the primary index, but some will have to look
> > > > for detailed information in the secondary index (the index fields are
> > > > of course different).
> > > >
> > > > What I plan to do:
> > > > - first get the results from the primary index
> > > > - then use the pk of the found documents and the additional search
> > > >   constraints to search in the secondary index
> > > > - discard any primary results that did not match in the secondary index
> > > >
> > > > Is this ok, or am I completely nuts by doing that? Is there a better
> > > > alternative?
> > > >
> > > > Thanks for any clues!
> > > >
> > > > Michael
> > > >
> > > >
> > > > --
> > > > Michael Böckling
> > > > Java Engineer
> > > > dmc digital media center GmbH
> > > > Rommelstraße 11
> > > > 70376 Stuttgart (Germany)
> > > > Phone: +49 711 601747-0
> > > > Fax: +49 711 601747-141
> > > > E-Mail: Michael.Boeckling@dmc.de
> > > > Internet: www.dmc.de
> > > >
> > > > Commercial register: AG Stuttgart HRB 18974
> > > > Managing directors: Andreas Magg, Daniel Rebhorn, Andreas Schwend
> > > >
> > > > ---------------------------------------------
> > > > Better e-business.
> > > > dmc is the creative combination of agency, systems house, and service.
> > > > For more than 10 years we have been developing and delivering
> > > > forward-looking, successful e-business solutions. Our long-standing
> > > > customers include neckermann.de, Kodak, and Telekom Training.
> > > >
> > > > dmc ranks 8th in the current New Media Service Ranking.
> > > > As an owner-managed, network-independent agency with revenue of
> > > > EUR 13.50 million, we are among the top 10 most successful new media
> > > > service providers in Germany.
> > > >
> > > >
> > > 
> ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: 
> java-user-help@lucene.apache.org
> > > >
> > > >
> > >
> >
> >
> >
> 
