Posted to dev@vxquery.apache.org by Steven Jacobs <sj...@ucr.edu> on 2013/09/14 00:20:02 UTC

Fwd: Test Results

Hi, I ran several tests comparing the collection function with my function
for a simple path query (/bookstore/book) where all files contain the same
data. I used bookstore files with 50 books per file (around 8k per file).
The collection query that I used was:
collection("path")/bookstore/book

Some of the results were counter-intuitive. My function does much better
than collection once the number of files exceeds two hundred. The strange
part is that the index yields its greatest advantage when the query returns
a large percentage of each file. I expected the biggest advantage when
returning a small percentage of the file, but that is not the case. I think
this is because the bottleneck for my function is the number of steps that
it executes (the number of tuples that it needs to create), so it gains
little from each tuple being smaller. Collection, by contrast, gets a much
better gain as the return size is reduced, which narrows the difference. I
am attaching a spreadsheet with the results. In any case, it is clear that
my indexing algorithm outperforms collection for more than 200 files, with
the greatest advantage when returning a large percentage of the file.
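
To picture the reasoning above, here is a toy cost model (a sketch only, not
the actual VXQuery code and not taken from the measurements). It assumes
each tuple pays a fixed per-step cost plus a cost proportional to its size.
All constants and function names are hypothetical, chosen only to show the
shape of the effect.

```python
def run_time(num_tuples, fixed_cost_per_tuple, cost_per_byte, bytes_per_tuple):
    """Toy model: every tuple pays a fixed step cost plus a size-dependent cost."""
    return num_tuples * (fixed_cost_per_tuple + cost_per_byte * bytes_per_tuple)


def gain_from_smaller_tuples(fixed_cost_per_tuple, cost_per_byte,
                             large_bytes, small_bytes, num_tuples=1000):
    """Speedup factor obtained by shrinking each tuple from large_bytes to small_bytes."""
    before = run_time(num_tuples, fixed_cost_per_tuple, cost_per_byte, large_bytes)
    after = run_time(num_tuples, fixed_cost_per_tuple, cost_per_byte, small_bytes)
    return before / after


# Hypothetical constants, NOT measured values.
# "Step-bound" stands in for a function dominated by per-tuple overhead;
# "size-bound" stands in for one dominated by the bytes it has to move.
step_bound_gain = gain_from_smaller_tuples(
    fixed_cost_per_tuple=100.0, cost_per_byte=0.1, large_bytes=160, small_bytes=16)
size_bound_gain = gain_from_smaller_tuples(
    fixed_cost_per_tuple=1.0, cost_per_byte=0.1, large_bytes=160, small_bytes=16)

print(f"step-bound gain from smaller tuples: {step_bound_gain:.2f}x")
print(f"size-bound gain from smaller tuples: {size_bound_gain:.2f}x")
```

Under these assumptions, the step-bound case barely speeds up when tuples
shrink, while the size-bound case speeds up a lot. That would be consistent
with the gap narrowing as the return size is reduced. The model says nothing
about where the 200-file crossover comes from.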

I also tested a set where only a single file contains the data that we are
searching for. This is the third set of graphs in the attachment. I would
have expected indexing to have a huge advantage here, but the results were
inconclusive.

Steven