Posted to xindice-users@xml.apache.org by Tom Bradford <br...@apache.org> on 2002/06/08 00:21:38 UTC
Re: Indexing performance
On Wednesday, May 8, 2002, at 01:04 PM, xindice@turretweb.com wrote:
> We have a collection of about 800 documents each about 5 Kb in size.
> After indexing with the wildcard index, Xindice answers queries
> using the Contains function in about 45-60 seconds, which is
> unacceptable by Internet standards. Is there a way to improve on
> this performance? We have tried indexing all elements, all
> attributes as well as specific elements only, with no success. Also,
> is there a way to delete an index (other than to delete and then
> recreate the collection)? We know from experience that Oracle
> retrieved similar queries using the Contains clause with 100 times
> more XML documents stored in CLOBs in less than 15 seconds using
> Oracle Intermedia.
Even if Xindice supported full-text indexing, which it doesn't yet,
contains() queries will *always* result in a collection scan because
contains() is not a full-text function but rather a substring function,
which requires all text to be scanned. Xindice is *not* a search
engine, and though at some point we'll support searches on
semi-structured data, for now you're better off just using grep if all
you're doing is contains searches.
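
To illustrate the point about substring matching, here is a minimal Python
sketch. It is not Xindice code: the DOCS collection, the word-level index, and
all function names are hypothetical, chosen only to show why a word index
cannot answer a substring query and why contains()-style matching ends up
examining every text.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for a small collection: document id -> text content.
DOCS = {
    "doc1": "indexing performance of the database",
    "doc2": "a reindexed collection",
    "doc3": "nothing relevant here",
}


def build_word_index(docs):
    """Map each whole word to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text):
            index[word].add(doc_id)
    return index


def word_lookup(index, term):
    """Whole-word query: answered from the index without reading any text."""
    return set(index.get(term, set()))


def contains_scan(docs, needle):
    """Substring query in the style of contains(): every text is examined."""
    return {doc_id for doc_id, text in docs.items() if needle in text}


index = build_word_index(DOCS)

# "index" is never a whole word here, so the word index finds nothing.
word_hits = word_lookup(index, "index")

# As a substring it occurs inside "indexing" and "reindexed", which only
# a scan over the stored text discovers.
scan_hits = contains_scan(DOCS, "index")
```

In this sketch the index lookup returns no documents while the scan returns
doc1 and doc2, and the cost of the scan grows with the total amount of text in
the collection.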
--
Tom Bradford - http://www.tbradford.org
Architect - XQRL (XQuery Engine) - http://www.xqrl.com
Apache Xindice (XML Database) - http://xml.apache.org/xindice
Labrador (Web Services Hub) - http://www.notdotnet.org/labrador