Posted to xindice-users@xml.apache.org by Tom Bradford <br...@apache.org> on 2002/06/08 00:21:38 UTC

Re: Indexing performance

On Wednesday, May 8, 2002, at 01:04  PM, xindice@turretweb.com wrote:

> We have a collection of about 800 documents each about 5 Kb in size.
> Upon indexing using the wildcard index, Xindice retrieves queries
> using the Contains function in about 45-60 seconds, which is
> unacceptable by Internet standards.  Is there a way to improve on
> this performance?  We have tried indexing all elements, all
> attributes as well as specific elements only, with no success.  Also,
> is there a way to delete an index (other than to delete and then
> recreate the collection)?  We know from experience that Oracle
> retrieved similar queries using the Contains clause with 100 times
> more XML documents stored in CLOBs in less than 15 seconds using
> Oracle Intermedia.

Even if Xindice supported full-text indexing, which it doesn't yet,
contains() queries will *always* result in a collection scan, because
contains() is not a full-text function but a substring function, and a
substring match requires all text to be scanned.  Xindice is *not* a
search engine, and though at some point we'll support searches on
semi-structured data, for now you're better off just using grep if all
you're doing is contains searches.
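
To illustrate the difference between a substring match and a word
lookup, here is a minimal Python sketch.  It is not Xindice code: the
in-memory "collection", the document ids, and the helper names
(substring_scan, build_word_index, word_lookup) are all hypothetical
and exist only for this example.  It assumes a simple word-level
inverted index as a stand-in for what a full-text index might do.

```python
import re
from collections import defaultdict

# Hypothetical in-memory "collection": document id -> text content.
docs = {
    "doc1": "indexing performance in native xml databases",
    "doc2": "a reindexed collection of small documents",
    "doc3": "full text search engines tokenize words",
}


def substring_scan(collection, needle):
    """Mimic a contains()-style match: every document's text is examined."""
    return sorted(
        doc_id for doc_id, text in collection.items() if needle in text
    )


def build_word_index(collection):
    """Build a word -> set of document ids map (a simple inverted index)."""
    index = defaultdict(set)
    for doc_id, text in collection.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def word_lookup(index, word):
    """Look up a whole word in the index without touching document text."""
    return sorted(index.get(word.lower(), set()))


word_index = build_word_index(docs)

# Substring semantics: "index" occurs inside "indexing" and "reindexed".
scan_hits = substring_scan(docs, "index")

# Word semantics: no document contains the standalone word "index".
lookup_hits = word_lookup(word_index, "index")

print(scan_hits)
print(lookup_hits)
```

The scan finds "index" inside "indexing" and "reindexed", while the
word lookup finds nothing, because a word-level index only knows whole
tokens.  That is the sense in which a substring function cannot be
served by this kind of index and has to read all of the text.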

--
Tom Bradford - http://www.tbradford.org
Architect - XQRL (XQuery Engine) - http://www.xqrl.com
Apache Xindice (XML Database) - http://xml.apache.org/xindice
Labrador (Web Services Hub) - http://www.notdotnet.org/labrador