Posted to dev@lucene.apache.org by "Renaud Delbru (JIRA)" <ji...@apache.org> on 2010/04/02 18:16:27 UTC

[jira] Commented: (LUCENE-1410) PFOR implementation

    [ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852855#action_12852855 ] 

Renaud Delbru commented on LUCENE-1410:
---------------------------------------

Here are some results on the compression and decompression performance of various algorithms over the Wikipedia dataset (English articles).

First, the performance of compressing/decompressing the full .doc posting list using a block size of 1024.
Compression and decompression rates are in KInt/ms and sizes in bytes.

||Codec||Compression||Decompression||Size||
|FOR (Orig)|20|106|448856|
|PFOR (Orig)|8|107| 383596|
|VINT|37|74|421764|
|FOR (Opt)|39|102|447369|
|PFOR (Opt)|10|108|382379|
|RICE|8|31|350687|
|S9|16|65|408218|

* VInt is a block-based version of variable-length integer coding (unlike the simple int block, I create blocks using VInt, then read the full block into memory and decompress it using VInt one integer at a time).
* FOR (Orig) and PFOR (Orig) are your implementations of FOR and PFOR.
* FOR (Opt) and PFOR (Opt) are my implementations of FOR and PFOR (using an array of functors for each compression/decompression case).
* Rice is an implementation of the Rice algorithm, but block-based.
* S9 is your implementation of Simple9 with some bug fixes, but block-based.
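As a rough sketch of the frame-of-reference idea underlying FOR (this is an illustration only, not the code from the attached patches): every value in a block is packed with the same fixed bit width, chosen from the largest value in the block.

```java
// Hypothetical sketch of frame-of-reference (FOR) block packing.
public class ForSketch {
    // Bits needed for the largest value: the OR of all values has the same
    // highest set bit as the maximum. The "| 1" keeps an all-zero block at 1 bit.
    static int bitWidth(int[] block) {
        int max = 0;
        for (int v : block) max |= v;
        return 32 - Integer.numberOfLeadingZeros(max | 1);
    }

    // Pack each non-negative value into 'bits' bits of a long-aligned buffer.
    static long[] compress(int[] block, int bits) {
        long[] out = new long[(block.length * bits + 63) / 64];
        for (int i = 0; i < block.length; i++) {
            int bitPos = i * bits;
            out[bitPos >> 6] |= ((long) block[i]) << (bitPos & 63);
            if ((bitPos & 63) + bits > 64)  // value straddles a 64-bit word boundary
                out[(bitPos >> 6) + 1] |= ((long) block[i]) >>> (64 - (bitPos & 63));
        }
        return out;
    }

    static int[] decompress(long[] packed, int count, int bits) {
        int[] out = new int[count];
        long mask = (1L << bits) - 1;
        for (int i = 0; i < count; i++) {
            int bitPos = i * bits;
            long v = packed[bitPos >> 6] >>> (bitPos & 63);
            if ((bitPos & 63) + bits > 64)  // pick up the high bits from the next word
                v |= packed[(bitPos >> 6) + 1] << (64 - (bitPos & 63));
            out[i] = (int) (v & mask);
        }
        return out;
    }
}
```

Because every value uses the same width, the decompression loop has no per-value branching on the encoding, which is where the decompression speed of FOR (and PFOR) comes from.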

I have created a *IndexInput and *IndexOutput for each algorithm, and used them to compress and decompress the posting file in this experiment.

We can see that using an array of functors for compression in FOR provides a good speed increase in compression. However, on PFOR, the compression speed increase is limited. From my tests, this is due to the sort step that is necessary for compressing each block.
PFOR provides a good balance between decompression speed and size, but it is very slow in compression (as slow as Rice). I don't think it is possible to improve the compression speed because of the sort step, which seems to impose a hard upper limit in terms of performance (I have tried various heuristics to avoid sorting, but without much success so far).
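To make the sort step concrete, here is a minimal sketch (the method name and the percentile approach are assumptions, not the patch's actual selection code) of how a PFOR compressor might pick the frame bit width: it needs the value at, say, the 90th percentile of the block, so that only ~10% of the values become patched exceptions, and the straightforward way to find that percentile is to sort a copy of the block.

```java
public class PforWidth {
    // Choose the frame bit width so that at most 'exceptionRate' of the
    // block's values fall outside the frame and must be patched as exceptions.
    static int chooseBitWidth(int[] block, double exceptionRate) {
        int[] copy = block.clone();
        java.util.Arrays.sort(copy);  // the costly step: O(n log n) per block
        // value at the (1 - exceptionRate) percentile of the block
        int cutoff = copy[(int) ((copy.length - 1) * (1.0 - exceptionRate))];
        return 32 - Integer.numberOfLeadingZeros(cutoff | 1);
    }
}
```

One sort-free way to obtain the same percentile would be a 32-counter histogram of per-value bit widths (each value contributes to the bucket of its own width, and a cumulative scan finds the cut-off), though whether that helps in this codebase is untested here.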

Following this first benchmark, I have created various Lucene codecs that use the *IndexInput and *IndexOutput of the different compression algorithms. I have indexed the full Wikipedia dataset. Each block-based codec is configured to use blocks of 1024.
I have recorded the average commit time and the optimise time. Then, I have executed random keyword queries and recorded the average query time and the number of bytes read to answer the query.

h5. Compression

Time is in ms, Size in bytes.

||Codec||Avg Commit Time||Total Commit Time||Optimise Time||Index Size||
|STD|1276|359978|113386|1423251|
|SEP|1437|405286|117163|1666870|
|VINT|1572|426551|122557|1742083|
|RICE|1791|505312|230636|1339703|
|S9|1575|444157|146216|1530666|
|FOR|1520|428847|134502|1754578|
|PFOR|1852|522401|189262|1511154|

* STD is the standard Lucene codec.
* SEP is the Sep codec.
* All the other algorithms are based on the Sep codec, but block-based.
* FOR and PFOR are my implementations.

We can see that the Sep codec and the block-based codecs have a certain overhead (in indexing time and index size) compared to the standard Lucene codec. But I imagine this overhead can be reduced with some code optimisation. For the index size, the overhead in block-based codecs is due to the skip list, which is much bigger (127MB for SEP against 189MB for block-based) since we need to encode the block offset and the inner document offset in the block.
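A hedged illustration of that skip-list size difference (the field layout below is an assumption for illustration, not the patch's exact skip format): a Sep-style skip entry can address a posting with a doc delta and one file-pointer delta, while a block-based entry must additionally carry the target document's offset inside its block, so each entry costs extra VInt bytes.

```java
public class SkipEntrySize {
    // Bytes a VInt/VLong encoding would take (7 payload bits per byte).
    static int vintBytes(long v) {
        int n = 1;
        while ((v >>>= 7) != 0) n++;
        return n;
    }

    // Sep-style entry: delta-coded doc plus one file-pointer delta.
    static int sepEntry(long docDelta, long fpDelta) {
        return vintBytes(docDelta) + vintBytes(fpDelta);
    }

    // Block-based entry: additionally stores the offset inside the block.
    static int blockEntry(long docDelta, long blockFpDelta, int innerOffset) {
        return vintBytes(docDelta) + vintBytes(blockFpDelta) + vintBytes(innerOffset);
    }
}
```

With 1024-entry blocks the inner offset alone is up to two VInt bytes per skip entry, which is consistent with the 127MB vs 189MB figures above.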

In terms of compression speed and index size, we can see that these results follow the previous ones. We can also observe that PFOR is the slowest.

h5. Decompression

For each codec, I have performed five runs, each run consisting of 200 random keyword queries, and averaged the time over the 5 runs and 200 queries (i.e., 200*5 = 1000 query executions).
For each run, I open a new IndexReader and IndexSearcher and reuse them across the 200 random queries.

To generate the queries, I first grouped the terms into three groups (HIGH, MEDIUM and LOW) based on their frequency, so that random keyword queries can be generated from these three frequency groups. Then, I performed benchmarks with random queries of one, two or more keywords, using all possible combinations (e.g., HIGH & HIGH, HIGH & MEDIUM, etc.). I executed the queries using a single thread, and also using multiple threads.
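The query-generation scheme above can be sketched as follows; the class name, the frequency thresholds and the fixed random seed are my assumptions (the actual cut-offs between HIGH, MEDIUM and LOW are not stated in this comment).

```java
import java.util.*;

// Hypothetical sketch: bucket terms by document frequency into HIGH, MEDIUM
// and LOW groups, then draw random keyword queries from chosen groups.
public class QueryGen {
    final List<String> high = new ArrayList<>(), medium = new ArrayList<>(), low = new ArrayList<>();
    final Random rnd = new Random(42);  // fixed seed so runs are repeatable

    // 'highMin' and 'medMin' are assumed frequency thresholds.
    QueryGen(Map<String, Integer> docFreqs, int highMin, int medMin) {
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            if (e.getValue() >= highMin) high.add(e.getKey());
            else if (e.getValue() >= medMin) medium.add(e.getKey());
            else low.add(e.getKey());
        }
    }

    String randomTerm(List<String> group) {
        return group.get(rnd.nextInt(group.size()));
    }

    // Build one random query, one keyword per requested group
    // (e.g. [high, high] for a HIGH & HIGH two-keyword query).
    List<String> randomQuery(List<List<String>> groups) {
        List<String> q = new ArrayList<>();
        for (List<String> g : groups) q.add(randomTerm(g));
        return q;
    }
}
```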

I am reporting below a few of the query results; for most of the other queries, the results followed the same pattern. I don't report the results of the Standard and Sep codecs, but they always outperform the block-based codecs, whatever compression is used.

However, the query-time results completely contradict the results of the first benchmark. The slowest algorithms (Rice, VInt) generally provide the fastest query execution times, and the fastest algorithms (FOR, PFOR) provide the slowest. Simple9 seems to be as fast as VInt.
I currently have no explanation for this. Maybe the dataset (Wikipedia articles) is too small to see the benefits, but I would rather say there is a problem somewhere. How could VInt and even Rice provide better query execution times than FOR and PFOR, when in the first benchmark FOR and PFOR clearly provide nearly a 40% decompression performance increase?
Do you have any advice on how to improve the benchmark, or ideas about the causes of this drop in performance? I will be able to re-run the benchmarks if you propose something.

h6. HIGH:MUST - 1 thread - 200 random queries

||Codec||Avg Query Time||
|Rice|1.5 ms|
|VInt|1.2 ms|
|PFOR|2.3 ms|
|FOR|2.5 ms|
|S9|1 ms|

h6. HIGH:MUST - 2 threads - 200 random queries for each thread

||Codec||Avg Query Time||
|Rice|2 ms|
|VInt|1.9 ms|
|PFOR|3 ms|
|FOR|3.5 ms|
|S9|1.6 ms|

h6. HIGH:MUST HIGH:SHOULD HIGH:SHOULD HIGH:SHOULD - 1 thread - 200 random queries

||Codec||Avg Query Time||
|Rice|1.5 ms|
|VInt|1.2 ms|
|PFOR|2.7 ms|
|FOR|2.5 ms|
|S9|1.3 ms|

h6. HIGH:MUST HIGH:SHOULD HIGH:SHOULD HIGH:SHOULD - 2 threads - 200 random queries for each thread

||Codec||Avg Query Time||
|Rice|2.5 ms|
|VInt|2 ms|
|PFOR|3 ms|
|FOR|3.4 ms|
|S9|2 ms|



> PFOR implementation
> -------------------
>
>                 Key: LUCENE-1410
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1410
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Other
>            Reporter: Paul Elschot
>            Priority: Minor
>         Attachments: autogen.tgz, for-summary.txt, LUCENE-1410-codecs.tar.bz2, LUCENE-1410b.patch, LUCENE-1410c.patch, LUCENE-1410d.patch, LUCENE-1410e.patch, TermQueryTests.tgz, TestPFor2.java, TestPFor2.java, TestPFor2.java
>
>   Original Estimate: 21840h
>  Remaining Estimate: 21840h
>
> Implementation of Patched Frame of Reference.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org