Posted to dev@vxquery.apache.org by Steven Jacobs <sj...@ucr.edu> on 2013/10/09 20:24:56 UTC

New Indexing Results

I re-implemented indexing to try for better performance, and the results
were very positive. In the worst case (retrieving all XML from all files),
indexing and the collection function perform essentially equally well. In
the other cases I tested, indexing outperformed collection. Here are some
of the results:

Retrieve all data from 42000 files:
Collection: 292 seconds
Index: 290 seconds

Retrieve 1 line of data from 42000 files:
Collection: 50 seconds
Index: 12 seconds
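
To make the comparison easier to read, the timings above can be turned into
speedup ratios (collection time divided by index time). This is only a small
sketch of that arithmetic: the dictionary layout and the `speedup` helper
are illustrative names, not anything from VXQuery, and the numbers are
simply the ones listed above.

```python
# Timings in seconds, copied from the results listed above.
timings = {
    "all data, 42000 files": {"collection": 292, "index": 290},
    "1 line, 42000 files": {"collection": 50, "index": 12},
}


def speedup(collection_s, index_s):
    """Return how many times faster the index run was than the collection run."""
    return collection_s / index_s


for name, t in timings.items():
    ratio = speedup(t["collection"], t["index"])
    print(f"{name}: index is {ratio:.2f}x the speed of collection")
```

A ratio close to 1 means the two approaches are effectively tied, which is
the case for the full retrieval; the single-line query is where the index
pulls ahead.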

I also wanted to see how well indexing does when the relevant data is found
in only a small subset of the files. I ran a query to retrieve all data from
only the files matching an equality search (600 out of 42000 files). With
indexing, the results came back in about 8 seconds.

Unfortunately, my machine was not powerful enough to run the collection
function with the same equality search on the same set of data (a frame
size large enough to hold everything would not fit into my 4 GB of memory).
On the positive side, this seems to reveal another advantage of the
indexing version: it ends up using a smaller frame size to perform the same
task, meaning it can operate on larger files.

I would love any feedback, or suggestions for other comparisons that you
would like to see.

Steven