Posted to dev@lucene.apache.org by Li Li <fa...@gmail.com> on 2010/12/14 08:50:32 UTC

strange problem of PForDelta decoder

Hi
   I tried to integrate PForDelta into lucene 2.9 but ran into a problem.
   I use the implementation at
http://code.google.com/p/integer-array-compress-kit/
   It implements a basic PForDelta algorithm and an improved one (called
NewPForDelta; it had many bugs, which I have fixed).
   But compared with VInt and S9, its decoding is very slow when only a
small number of integer arrays are decoded.
   E.g. when I decode int[256] arrays whose values are randomly generated
between 0 and 100, PFor (or NewPFor) is very slow if it decodes just one
array; when it continuously decodes many arrays, such as 10000, it is
faster than S9 and VInt.
   Another strange phenomenon is that when I call the PFor decoder twice,
the 2nd call is faster. Or if I call PFor first and then NewPFor, NewPFor
is faster; reverse the call sequence and the decoder called 2nd is faster.
   e.g.
  		ct.testNewPFDCodes(list);
		ct.testPFor(list);
		ct.testVInt(list);
		ct.testS9(list);

NewPFD decode: 3614705
PForDelta decode: 17320
VINT decode: 16483
S9 decode: 19835
when I call them in the following sequence

		ct.testPFor(list);
		ct.testNewPFDCodes(list);
		ct.testVInt(list);
		ct.testS9(list);

PForDelta decode: 3212140
NewPFD decode: 19556
VINT decode: 16762
S9 decode: 16483

   My implementation groups docIDs and termDocFreqs into blocks of 128
integers. When SegmentTermDocs's next method is called (or read/readNoTf),
it decodes a whole block and saves it to a cache.
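
Roughly, the cache works like this (a simplified sketch with made-up names,
not the exact code I'm using):

import java.io.IOException;

// Hypothetical decoder interface standing in for the PForDelta implementation.
interface PForBlockDecoder {
  // Decodes the next block into the given arrays; returns how many entries
  // were filled (0 when there are no more postings for this term).
  int decodeBlock(int[] docDeltas, int[] freqs) throws IOException;
}

// Simplified sketch of the per-term block cache consulted by next().
final class BlockCache {
  static final int BLOCK_SIZE = 128;

  private final int[] docDeltas = new int[BLOCK_SIZE];
  private final int[] freqs = new int[BLOCK_SIZE];
  private int upto = 0;    // read position inside the current block
  private int count = 0;   // number of valid entries in the current block

  // Fills docAndFreq[0] with the next doc delta and docAndFreq[1] with its
  // freq, decoding a whole 128-int block only when the cache runs dry.
  boolean next(int[] docAndFreq, PForBlockDecoder decoder) throws IOException {
    if (upto == count) {
      count = decoder.decodeBlock(docDeltas, freqs);
      upto = 0;
      if (count == 0) {
        return false;      // postings exhausted
      }
    }
    docAndFreq[0] = docDeltas[upto];
    docAndFreq[1] = freqs[upto];
    upto++;
    return true;
  }
}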

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: strange problem of PForDelta decoder

Posted by Li Li <fa...@gmail.com>.
But when I let the VInt and S9 decoders run first, their times are the
same as when they are called at the end; they are stable.
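
For reference, the kind of warmup Mike describes below would look roughly
like this (illustrative only, not my actual test harness):

// Illustrative wrapper: run the decode path many times first so Hotspot
// compiles it, then time a second batch of iterations.
final class DecoderBench {
  static long timeNanos(Runnable decodeOnce, int iterations) {
    for (int i = 0; i < iterations; i++) {
      decodeOnce.run();                 // warmup: trigger JIT compilation
    }
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      decodeOnce.run();                 // timed run
    }
    return System.nanoTime() - start;
  }
}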

2010/12/14 Michael McCandless <lu...@mikemccandless.com>:
> Likely you are seeing the startup cost of hotspot compiling the PFOR code?
>
> Ie, does your test first "warmup" the JRE and then do the real test?
>
> I've also found that running -Xbatch produces more consistent results
> from run to run, however, those results may not be as fast as running
> w/o -Xbatch.
>
> Also, it's better to test on actual data (ie a Lucene index's
> postings), and in the full context of searching, because then we get a
> sense of what speedups a real app will see... micro-benching is nearly
> impossible in Java since Hotspot acts very differently vs the "real"
> test.
>
> Mike
>
> On Tue, Dec 14, 2010 at 2:50 AM, Li Li <fa...@gmail.com> wrote:
>> Hi
>>   I tried to integrate PForDelta into lucene 2.9 but confronted a problem.
>>   I use the implementation in
>> http://code.google.com/p/integer-array-compress-kit/
>>   it implements a basic PForDelta algorithm and an improved one(which
>> called NewPForDelta, but there are many bugs and I have fixed them),
>>   But compare it with VInt and S9, it's speed is very slow when only
>> decode small number of integer arrays.
>>   e.g. when I decoded int[256] arrays which values are randomly
>> generated between 0 and 100, if decode just one array. PFor(or
>> NewPFor) is very slow. when it continuously decodes many arrays such
>> as 10000, it's faster than s9 and vint.
>>   Another strange phenomena is that when call PFor decoder twice, the
>> 2nd times it's faster. Or I call PFor first then NewPFor, the NewPFor
>> is faster. reverse the call sequcence, the 2nd called decoder is
>> faster
>>   e.g.
>>                ct.testNewPFDCodes(list);
>>                ct.testPFor(list);
>>                ct.testVInt(list);
>>                ct.testS9(list);
>>
>> NewPFD decode: 3614705
>> PForDelta decode: 17320
>> VINT decode: 16483
>> S9 decode: 19835
>> when I call by the following sequence
>>
>>                ct.testPFor(list);
>>                ct.testNewPFDCodes(list);
>>                ct.testVInt(list);
>>                ct.testS9(list);
>>
>> PForDelta decode: 3212140
>> NewPFD decode: 19556
>> VINT decode: 16762
>> S9 decode: 16483
>>
>>   My implementation is -- group docIDs and termDocFreqs into block
>> which contains 128 integers. when SegmentTermDocs's next method
>> called(or read readNoTf).it decodes a block and save it to a cache.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: strange problem of PForDelta decoder

Posted by Li Li <fa...@gmail.com>.
Thanks.
   I'd like to try it and do some experiments on our dataset.

2010/12/15 Michael McCandless <lu...@mikemccandless.com>:
> Hi Li Li,
>
> That issue has such a big patch, and enough of us are now iterating on
> it, that we cut a dedicated branch for it.
>
> But note that this branch is off of trunk (to be 4.0).
>
> You should be able to do this:
>
>  svn checkout https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings
>
> And then run things in there.  I just committed FOR/PFOR prototype
> codecs from LUCENE-1410 onto that branch, so eg you can run unit tests
> using those codecs by running "ant test
> -Dtests.codec=PatchedFrameOfRef".
>
> Please post patches back if you improve things!  We need all the help
> we can get :)
>
> Mike
>
> On Wed, Dec 15, 2010 at 5:54 AM, Li Li <fa...@gmail.com> wrote:
>> hi Michael
>>    you posted a patch here https://issues.apache.org/jira/browse/LUCENE-2723
>>    I am not familiar with patches. Do I need to download
>> LUCENE-2723.patch (there are many patches with this name; do I need
>> the latest one?) and LUCENE-2723_termscorer.patch and apply them
>> (patch -p1 <LUCENE-2723.patch)? I just checked out the latest source
>> code from http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene
>>
>>
>> 2010/12/14 Michael McCandless <lu...@mikemccandless.com>:
>>> Likely you are seeing the startup cost of hotspot compiling the PFOR code?
>>>
>>> Ie, does your test first "warmup" the JRE and then do the real test?
>>>
>>> I've also found that running -Xbatch produces more consistent results
>>> from run to run, however, those results may not be as fast as running
>>> w/o -Xbatch.
>>>
>>> Also, it's better to test on actual data (ie a Lucene index's
>>> postings), and in the full context of searching, because then we get a
>>> sense of what speedups a real app will see... micro-benching is nearly
>>> impossible in Java since Hotspot acts very differently vs the "real"
>>> test.
>>>
>>> Mike
>>>
>>> On Tue, Dec 14, 2010 at 2:50 AM, Li Li <fa...@gmail.com> wrote:
>>>> Hi
>>>>   I tried to integrate PForDelta into lucene 2.9 but confronted a problem.
>>>>   I use the implementation in
>>>> http://code.google.com/p/integer-array-compress-kit/
>>>>   it implements a basic PForDelta algorithm and an improved one(which
>>>> called NewPForDelta, but there are many bugs and I have fixed them),
>>>>   But compare it with VInt and S9, it's speed is very slow when only
>>>> decode small number of integer arrays.
>>>>   e.g. when I decoded int[256] arrays which values are randomly
>>>> generated between 0 and 100, if decode just one array. PFor(or
>>>> NewPFor) is very slow. when it continuously decodes many arrays such
>>>> as 10000, it's faster than s9 and vint.
>>>>   Another strange phenomena is that when call PFor decoder twice, the
>>>> 2nd times it's faster. Or I call PFor first then NewPFor, the NewPFor
>>>> is faster. reverse the call sequcence, the 2nd called decoder is
>>>> faster
>>>>   e.g.
>>>>                ct.testNewPFDCodes(list);
>>>>                ct.testPFor(list);
>>>>                ct.testVInt(list);
>>>>                ct.testS9(list);
>>>>
>>>> NewPFD decode: 3614705
>>>> PForDelta decode: 17320
>>>> VINT decode: 16483
>>>> S9 decode: 19835
>>>> when I call by the following sequence
>>>>
>>>>                ct.testPFor(list);
>>>>                ct.testNewPFDCodes(list);
>>>>                ct.testVInt(list);
>>>>                ct.testS9(list);
>>>>
>>>> PForDelta decode: 3212140
>>>> NewPFD decode: 19556
>>>> VINT decode: 16762
>>>> S9 decode: 16483
>>>>
>>>>   My implementation is -- group docIDs and termDocFreqs into block
>>>> which contains 128 integers. when SegmentTermDocs's next method
>>>> called(or read readNoTf).it decodes a block and save it to a cache.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: strange problem of PForDelta decoder

Posted by Li Li <fa...@gmail.com>.
   Our recent experiments show that PFOR is not a good solution for "and queries".
We tested it with our dataset and users' queries; in most cases PFOR is slower
than VInt. We think the reason is that most queries very likely contain a
low-frequency term, so scoring takes most of the time and decoding does not.
   E.g. in our index, the term "beijing" has a df of 2557916 and "park" has
2313201; both are high-frequency terms, but only 1552 documents contain both.
   With VInt it only needs to decode 1552 documents, while with PFOR it may
have to decode many whole blocks.
   Most search engines use "and queries", so PFOR is only good for "or queries"
and for "and queries" whose terms are all high-frequency.
   So we have to give up on it in our application.
   A partial decoder for PFOR? For queries whose terms are all high-frequency,
use the normal PFOR decoder; for queries with low-frequency terms, use a
partial decoder?
   A partial PFOR decoder may need many if/else branches and will be slower.
   Does anyone have a solution for this?
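
For context, the "and query" evaluation that favours VInt here is the usual
skip-driven leapfrog; a rough sketch with a hypothetical postings API (not
Lucene's real scorer):

import java.io.IOException;
import java.util.List;

final class Conjunction {
  // Hypothetical postings API: advance(target) returns the next doc >= target,
  // or NO_MORE_DOCS when the list is exhausted.
  interface Postings {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int advance(int target) throws IOException;
  }

  // Leapfrog intersection: the terms only touch postings near the (few)
  // matching docs; a block codec, however, may have to decode a whole
  // 128-int block for every one of these advance() calls.
  static void intersect(Postings rare, Postings common, List<Integer> hits) throws IOException {
    int doc = rare.advance(0);
    while (doc != Postings.NO_MORE_DOCS) {
      int other = common.advance(doc);
      if (other == doc) {
        hits.add(doc);                // both terms contain this doc
        doc = rare.advance(doc + 1);
      } else {
        doc = rare.advance(other);    // jump past the gap
      }
    }
  }
}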


2010/12/27 Li Li <fa...@gmail.com>:
> I integrated the pfor codec into lucene 2.9.3 and the search time
> comparison is as follows:
>
>                              single term   and query   or query
> VINT in lucene 2.9.3         11.2          36.5        38.6
> PFor in lucene 2.9.3         8.7           27.6        33.4
> VINT in lucene 4 branch      10.6          26.5        35.4
> PFor in lucene 4 branch      8.1           22.5        30.7
>
> My test terms are high-frequency terms because we are interested in the "bad case".
> It seems the lucene 4 branch's implementation of and queries (conjunction
> queries) is so well optimized that even with the VINT codec it is faster than
> PFor in lucene 2.9.3. Could anyone tell me what optimization is done?
> Is storing docIDs and freqs separately making it faster? Or anything
> else?
>
> Another question: is there anyone else interested in integrating the pfor
> codec into lucene 2.9.3 as I am (we have to use lucene 2.9 and solr
> 1.4)? And how do I contribute this patch?
>
> 2010/12/24 Michael McCandless <lu...@mikemccandless.com>:
>> Well, an early patch somewhere was able to run PFor on trunk, but the
>> performance wasn't great because the trunk bulk-read API is a
>> bottleneck (this is why the bulk postings branch was created).
>>
>> Mike
>>
>> On Wed, Dec 22, 2010 at 9:45 PM, Li Li <fa...@gmail.com> wrote:
>>> I used the bulkpostings
>>> branch(https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings/lucene)
>>> does trunk have PForDelta decoder/encoder ?
>>>
>>> 2010/12/23 Michael McCandless <lu...@mikemccandless.com>:
>>>> Those are nice speedups!
>>>>
>>>> Did you use the 4.0 branch (ie trunk) or the bulkpostings branch for this test?
>>>>
>>>> Mike
>>>>
>>>> On Tue, Dec 21, 2010 at 9:59 PM, Li Li <fa...@gmail.com> wrote:
>>>>> great improvement!
>>>>> I did a test on our data set. The doc count is about 2M+ and the index size
>>>>> after optimization is about 13.3GB (including fdt).
>>>>> It seems lucene4's index format is better than lucene2.9.3's, and PFor
>>>>> gives good results.
>>>>> Besides the BlockEncoder for frq and pos, is there any other modification
>>>>> in lucene 4?
>>>>>
>>>>>  decoder \ avg time          single word(ms)   and query(ms)   or query(ms)
>>>>>  VINT in lucene 2.9          11.2              36.5            38.6
>>>>>  VINT in lucene 4 branch     10.6              26.5            35.4
>>>>>  PFor in lucene 4 branch     8.1               22.5            30.7
>>>>> 2010/12/21 Li Li <fa...@gmail.com>:
>>>>>>> OK we should have a look at that one still.  We need to converge on a
>>>>>>> good default codec for 4.0.  Fortunately it's trivial to take any int
>>>>>>> block encoder (fixed or variable block) and make a Lucene codec out of
>>>>>>> it!
>>>>>>
>>>>>> I suggest you not use this one. I fixed dozens of bugs but it
>>>>>> still failed with random tests. Its code is hand-coded rather
>>>>>> than generated by a program. But we may learn something from it.
>>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>
>>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: strange problem of PForDelta decoder

Posted by Michael McCandless <lu...@mikemccandless.com>.
Great :)

Also, realize that certain queries (MultiTermQuery) spend lots of time
in "rewrite" and (sometimes, anyway) comparatively small amounts of
time in actually running the query.  So it'd be nice to have a good
way to let multiple threads participate in rewrite too.

Thread scheduling/prioritization is also important, so that a big slow
query doesn't starve execution for fast queries.  Ideally
(unfortunately!) I think we'd want something like how the OS schedules
processes, ie new processes run with higher priority than long running
resource consuming processes.  So a new query would get higher
priority, but, if it runs for a long time, its priority is reduced.
Not sure how we should solve that one...
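
One crude sketch of that idea, purely illustrative (and note Java thread
priorities are only a hint to the OS scheduler):

import java.util.function.BooleanSupplier;

// Sketch: the query's own scoring loop demotes its thread priority once it
// has run for a while, so freshly submitted queries tend to be scheduled
// ahead of it.
final class DemotingQueryTask implements Runnable {
  private final BooleanSupplier scoreNextChunk;  // scores a bounded slice; false when done
  private final long demoteAfterNanos;

  DemotingQueryTask(BooleanSupplier scoreNextChunk, long demoteAfterNanos) {
    this.scoreNextChunk = scoreNextChunk;
    this.demoteAfterNanos = demoteAfterNanos;
  }

  @Override public void run() {
    long start = System.nanoTime();
    while (scoreNextChunk.getAsBoolean()) {
      if (System.nanoTime() - start > demoteAfterNanos
          && Thread.currentThread().getPriority() != Thread.MIN_PRIORITY) {
        Thread.currentThread().setPriority(Thread.MIN_PRIORITY);
      }
    }
  }
}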

Mike

On Wed, Jan 5, 2011 at 10:09 PM, Li Li <fa...@gmail.com> wrote:
> we recently are interested in this problem. if we come up with a
> patch, I'd like
> to share it with everyone.
>
> 2011/1/4 Michael McCandless <lu...@mikemccandless.com>:
>> 2011/1/4 Li Li <fa...@gmail.com>:
>>> I agree with you that we should not tie concurrency w/in a single search to
>>> index segments.
>>> That solution is just a hack.
>>> will lucene 4 support multithreads search for a single query?
>>> I haven't found any patch about this.
>>
>> Well, as things stand now, Lucene 4 will only support the "thread per
>> segment" hack.  The patch on LUCENE-2837 (still needs work) merges
>> ParallelMultiSearcher into IndexSearcher, carrying over that hack.
>>
>> But this discussion seems like it could lead to a nice patch?  (If
>> someone has the time/energy/itch to cons one up).
>>
>> Just dividing up the docID space equally seems like a simple solution
>> that'd work well...
>>
>> Mike
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: strange problem of PForDelta decoder

Posted by Li Li <fa...@gmail.com>.
We have recently become interested in this problem. If we come up with a
patch, I'd like to share it with everyone.

2011/1/4 Michael McCandless <lu...@mikemccandless.com>:
> 2011/1/4 Li Li <fa...@gmail.com>:
>> I agree with you that we should not tie concurrency w/in a single search to
>> index segments.
>> That solution is just a hack.
>> will lucene 4 support multithreads search for a single query?
>> I haven't found any patch about this.
>
> Well, as things stand now, Lucene 4 will only support the "thread per
> segment" hack.  The patch on LUCENE-2837 (still needs work) merges
> ParallelMultiSearcher into IndexSearcher, carrying over that hack.
>
> But this discussion seems like it could lead to a nice patch?  (If
> someone has the time/energy/itch to cons one up).
>
> Just dividing up the docID space equally seems like a simple solution
> that'd work well...
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: strange problem of PForDelta decoder

Posted by Michael McCandless <lu...@mikemccandless.com>.
2011/1/4 Li Li <fa...@gmail.com>:
> I agree with you that we should not tie concurrency w/in a single search to
> index segments.
> That solution is just a hack.
> will lucene 4 support multithreads search for a single query?
> I haven't found any patch about this.

Well, as things stand now, Lucene 4 will only support the "thread per
segment" hack.  The patch on LUCENE-2837 (still needs work) merges
ParallelMultiSearcher into IndexSearcher, carrying over that hack.

But this discussion seems like it could lead to a nice patch?  (If
someone has the time/energy/itch to cons one up).

Just dividing up the docID space equally seems like a simple solution
that'd work well...

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: strange problem of PForDelta decoder

Posted by Li Li <fa...@gmail.com>.
I agree with you that we should not tie concurrency w/in a single search to
index segments.
That solution is just a hack.
Will lucene 4 support multithreaded search for a single query?
I haven't found any patch about this.

2011/1/4 Michael McCandless <lu...@mikemccandless.com>:
> Here's the paper:
>
>    http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.156.8091
>
> I haven't read it yet...
>
> In general I don't like tying concurrency w/in a single search to
> index segments; I'd rather they be (relatively?) independent.  EG an
> optimized index would then force single thread queries, which is
> crazy...
>
> In Lucene it should be easy to chop up the maxDoc(), ie if I have 10M
> docs and I want to use 4 threads I just give each thread a chunk of
> 2.5M docs.  Each thread would first skip to its start doc, and then go
> from there.
>
> I'm not sure we'd want to share a single collector vs private
> collector per thread and then merge in the end (this is a similar
> tradeoff as we've discussed on the per-segment collectors).
>
> Mike
>
> 2011/1/1 Li Li <fa...@gmail.com>:
>>   I sent a mail to MG4J group and Sebastiano Vigna recommended the paper
>> Reducing query latencies in web search using fine-grained parallelism.
>> World Wide Web, 12(4):441-460, 2009.
>>   I read it roughly. But there are some questions
>>     it says:
>>    it first coalesces all disk reads in a single process, and then
>> distributes the index data among
>> the parallel threads. So when the index server receives a query, it
>> loads from storage (disk or cache)
>> the required posting lists, initializes the query execution, and then
>> spawns lightweight threads, one per core.
>> Each thread receives an equal-sized subset of document IDs to scan,
>> together covering the entire index
>> partition. All threads execute the same code on the same query, but
>> with private data structures.
>> The only writable shared data structure is a heap of the top-scoring
>> hits, protected by a lock. At the end of the threads'
>> execution, the heap contains the highest-scoring hits in the entire
>> index partition, which is then transmitted to the query
>> integrator as usual. Since the index contains skip-lists that permit
>> near-random-access to any document ID, and since
>> hits are generally distributed evenly in the index, our measurements
>> show that all threads complete their work with
>> little variability, within 1-2% of each other.
>>
>>  I have some questions
>>  1.Since the index contains skip-lists that permit
>> near-random-access to any document ID
>>   skip list can be near random access?(especially when it's in hard disk)
>>
>>  2.For "or query", it's easy. e.g. we search "search" and "engine",
>> we using one main thread to get postings the
>> the 2 terms and divide their postings into 2 groups(e.g. even docIds
>> and odd docIds) and using 2 threads to score
>> them and finally merge it(or using locked priority queue)
>>     but for "and query", we usally don't decode all the docIDs
>> because we can skip many documents. especially when
>> searching low frequent terms with high frequent terms. only a small
>> number of docIDs of high frequent term is decoded.
>>
>>   btw. I think or query is often much slower than and query. If we can
>> parallel or query well, it's also very useful.
>>
>> On Dec 31, 2010, at 7:25 AM, Li Li wrote:
>>
>>>    Which one is used in MG4J to support multithreads searching? Are
>>
>> 2010/12/31 Li Li <fa...@gmail.com>:
>>> is there anyone familiar with MG4J(http://mg4j.dsi.unimi.it/)
>>> it says Multithreading. Indices can be queried and scored concurrently.
>>> maybe we can learn something from it.
>>>
>>> 2010/12/31 Li Li <fa...@gmail.com>:
>>>> plus
>>>> 2 means search a term need seek many times for tis(if it's not cached in tii)
>>>>
>>>> 2010/12/31 Li Li <fa...@gmail.com>:
>>>>> searching multi segments is a alternative solution but it has some
>>>>> disadvantages.
>>>>> 1. idf is not global?(I am not familiar with its implementation) maybe
>>>>> it's easy to solve it by share global idf
>>>>> 2. each segments will has it's own tii and tis files, which may make
>>>>> search slower(that's why optimization of
>>>>> index is neccessary)
>>>>> 3. one term's  docList is distributed in many files rather than one.
>>>>> more than one frq files means
>>>>> hard disk must seek different tracks, it's time consuming. if there is
>>>>> only one segment, the are likely
>>>>> stored in a single track.
>>>>>
>>>>>
>>>>> 2010/12/31 Earwin Burrfoot <ea...@gmail.com>:
>>>>>>>>>until we fix Lucene to run a single search concurrently (which we
>>>>>>>>>badly need to do).
>>>>>>> I am interested in this idea.(I have posted it before) do you have some
>>>>>>> resources such as papers or tech articles about it?
>>>>>>> I have tried but it need to modify index format dramatically and we use
>>>>>>> solr distributed search to relieve the problem of response time. so finally
>>>>>>> give it up.
>>>>>>> lucene4's index format is more flexible that it supports customed codecs
>>>>>>> and it's now on development, I think it's good time to take it into
>>>>>>> consideration
>>>>>>> that let it support multithread searching for a single query.
>>>>>>> I have a naive solution. dividing docList into many groups
>>>>>>> e.g grouping docIds by it's even or odd
>>>>>>> term1 df1=4  docList =  0  4  8  10
>>>>>>> term1 df2=4  docList = 1  3  9  11
>>>>>>>
>>>>>>> term2 df1=4  docList = 0  6  8  12
>>>>>>> term2 df2=4  docList = 3  9  11 15
>>>>>>>   then we can use 2 threads to search topN docs on even group and odd group
>>>>>>> and finally merge their results into a single on just like solr
>>>>>>> distributed search.
>>>>>>> But it's better than solr distributed search.
>>>>>>>   First, it's in a single process and data communication between
>>>>>>> threads is much
>>>>>>> faster than network.
>>>>>>>   Second, each threads process the same number of documents.For solr
>>>>>>> distributed
>>>>>>> search, one shard may process 7 documents and another shard may 1 document
>>>>>>> Even if we can make each shard have the same document number. we can not
>>>>>>> make it uniformly for each term.
>>>>>>>    e.g. shard1 has doc1 doc2
>>>>>>>           shard2 has doc3 doc4
>>>>>>>    but term1 may only occur in doc1 and doc2
>>>>>>>    while term2 may only occur in doc3 and doc4
>>>>>>>    we may modify it
>>>>>>>           shard1 doc1 doc3
>>>>>>>           shard2 doc2 doc4
>>>>>>>    it's good for term1 and term2
>>>>>>>    but term3 may occur in doc1 and doc3...
>>>>>>>    So I think it's fine-grained distributed in index while solr
>>>>>>> distributed search is coarse-
>>>>>>> grained.
>>>>>> This is just crazy :)
>>>>>>
>>>>>> The simple way is just to search different segments in parallel.
>>>>>> BalancedSegmentMergePolicy makes sure you have roughly even-sized
>>>>>> large segments (and small ones don't count, they're small!).
>>>>>> If you're bound on squeezing out that extra millisecond (and making
>>>>>> your life miserable along the way), you can search a single segment
>>>>>> with multiple threads (by dividing it in even chunks, and then doing
>>>>>> skipTo to position your iterators to the beginning of each chunk).
>>>>>>
>>>>>> First approach is really easy to implement. Second one is harder, but
>>>>>> still doesn't require you to cook the number of CPU cores available
>>>>>> into your index!
>>>>>>
>>>>>> It's the law of diminishing returns at play here. You're most likely
>>>>>> to search in parallel over mostly memory-resident index
>>>>>> (RAMDir/mmap/filesys cache - doesn't matter), as most of IO subsystems
>>>>>> tend to slow down considerably on parallel sequential reads, so you
>>>>>> already have pretty decent speed.
>>>>>> Searching different segments in parallel (with BSMP) makes you several
>>>>>> times faster.
>>>>>> Searching in parallel within a segment requires some weird hacks, but
>>>>>> has maybe a few percent advantage over previous solution.
>>>>>> Sharding posting lists requires a great deal of weird hacks, makes
>>>>>> index machine-bound, and boosts speed by another couple of percent.
>>>>>> Sounds worthless.
>>>>>>
>>>>>> --
>>>>>> Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
>>>>>> Phone: +7 (495) 683-567-4
>>>>>> ICQ: 104465785
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: strange problem of PForDelta decoder

Posted by Michael McCandless <lu...@mikemccandless.com>.
Here's the paper:

    http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.156.8091

I haven't read it yet...

In general I don't like tying concurrency w/in a single search to
index segments; I'd rather they be (relatively?) independent.  EG an
optimized index would then force single thread queries, which is
crazy...

In Lucene it should be easy to chop up the maxDoc(), ie if I have 10M
docs and I want to use 4 threads I just give each thread a chunk of
2.5M docs.  Each thread would first skip to its start doc, and then go
from there.
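
A rough sketch of that chop-up, with a hypothetical scorer/collector API
(not the real Lucene classes) and a private collector per thread:

import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class ChunkedSearch {
  // Hypothetical per-thread scorer: advance(target) positions on the first
  // matching doc >= target; nextDoc() steps forward; both return a value
  // >= maxDoc (e.g. Integer.MAX_VALUE) when exhausted.
  interface Scorer {
    int advance(int target) throws IOException;
    int nextDoc() throws IOException;
    float score() throws IOException;
  }
  interface Collector { void collect(int doc, float score); }

  static void search(int maxDoc, int numThreads,
                     Supplier<Scorer> scorers, Supplier<Collector> collectors)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    int chunk = (maxDoc + numThreads - 1) / numThreads;
    for (int t = 0; t < numThreads; t++) {
      final int start = t * chunk;
      final int end = Math.min(maxDoc, start + chunk);
      pool.submit((Callable<Void>) () -> {
        Scorer scorer = scorers.get();           // each thread gets its own scorer
        Collector collector = collectors.get();  // and its own private collector
        for (int doc = scorer.advance(start); doc < end; doc = scorer.nextDoc()) {
          collector.collect(doc, scorer.score());
        }
        return null;                             // results are merged afterwards
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
  }
}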

I'm not sure we'd want to share a single collector vs private
collector per thread and then merge in the end (this is a similar
tradeoff as we've discussed on the per-segment collectors).

Mike

2011/1/1 Li Li <fa...@gmail.com>:
>   I sent a mail to MG4J group and Sebastiano Vigna recommended the paper
> Reducing query latencies in web search using fine-grained parallelism.
> World Wide Web, 12(4):441-460, 2009.
>   I read it roughly. But there are some questions
>     it says:
>    it first coalesces all disk reads in a single process, and then
> distributes the index data among
> the parallel threads. So when the index server receives a query, it
> loads from storage (disk or cache)
> the required posting lists, initializes the query execution, and then
> spawns lightweight threads, one per core.
> Each thread receives an equal-sized subset of document IDs to scan,
> together covering the entire index
> partition. All threads execute the same code on the same query, but
> with private data structures.
> The only writable shared data structure is a heap of the top-scoring
> hits, protected by a lock. At the end of the threads'
> execution, the heap contains the highest-scoring hits in the entire
> index partition, which is then transmitted to the query
> integrator as usual. Since the index contains skip-lists that permit
> near-random-access to any document ID, and since
> hits are generally distributed evenly in the index, our measurements
> show that all threads complete their work with
> little variability, within 1-2% of each other.
>
>  I have some questions
>  1.Since the index contains skip-lists that permit
> near-random-access to any document ID
>   skip list can be near random access?(especially when it's in hard disk)
>
>  2.For "or query", it's easy. e.g. we search "search" and "engine",
> we using one main thread to get postings the
> the 2 terms and divide their postings into 2 groups(e.g. even docIds
> and odd docIds) and using 2 threads to score
> them and finally merge it(or using locked priority queue)
>     but for "and query", we usally don't decode all the docIDs
> because we can skip many documents. especially when
> searching low frequent terms with high frequent terms. only a small
> number of docIDs of high frequent term is decoded.
>
>   btw. I think or query is often much slower than and query. If we can
> parallel or query well, it's also very useful.
>
> On Dec 31, 2010, at 7:25 AM, Li Li wrote:
>
>>    Which one is used in MG4J to support multithreads searching? Are
>
> 2010/12/31 Li Li <fa...@gmail.com>:
>> is there anyone familiar with MG4J(http://mg4j.dsi.unimi.it/)
>> it says Multithreading. Indices can be queried and scored concurrently.
>> maybe we can learn something from it.
>>
>> 2010/12/31 Li Li <fa...@gmail.com>:
>>> plus
>>> 2 means search a term need seek many times for tis(if it's not cached in tii)
>>>
>>> 2010/12/31 Li Li <fa...@gmail.com>:
>>>> searching multi segments is a alternative solution but it has some
>>>> disadvantages.
>>>> 1. idf is not global?(I am not familiar with its implementation) maybe
>>>> it's easy to solve it by share global idf
>>>> 2. each segments will has it's own tii and tis files, which may make
>>>> search slower(that's why optimization of
>>>> index is neccessary)
>>>> 3. one term's  docList is distributed in many files rather than one.
>>>> more than one frq files means
>>>> hard disk must seek different tracks, it's time consuming. if there is
>>>> only one segment, the are likely
>>>> stored in a single track.
>>>>
>>>>
>>>> 2010/12/31 Earwin Burrfoot <ea...@gmail.com>:
>>>>>>>>until we fix Lucene to run a single search concurrently (which we
>>>>>>>>badly need to do).
>>>>>> I am interested in this idea.(I have posted it before) do you have some
>>>>>> resources such as papers or tech articles about it?
>>>>>> I have tried but it need to modify index format dramatically and we use
>>>>>> solr distributed search to relieve the problem of response time. so finally
>>>>>> give it up.
>>>>>> lucene4's index format is more flexible that it supports customed codecs
>>>>>> and it's now on development, I think it's good time to take it into
>>>>>> consideration
>>>>>> that let it support multithread searching for a single query.
>>>>>> I have a naive solution. dividing docList into many groups
>>>>>> e.g grouping docIds by it's even or odd
>>>>>> term1 df1=4  docList =  0  4  8  10
>>>>>> term1 df2=4  docList = 1  3  9  11
>>>>>>
>>>>>> term2 df1=4  docList = 0  6  8  12
>>>>>> term2 df2=4  docList = 3  9  11 15
>>>>>>   then we can use 2 threads to search topN docs on even group and odd group
>>>>>> and finally merge their results into a single on just like solr
>>>>>> distributed search.
>>>>>> But it's better than solr distributed search.
>>>>>>   First, it's in a single process and data communication between
>>>>>> threads is much
>>>>>> faster than network.
>>>>>>   Second, each threads process the same number of documents.For solr
>>>>>> distributed
>>>>>> search, one shard may process 7 documents and another shard may 1 document
>>>>>> Even if we can make each shard have the same document number. we can not
>>>>>> make it uniformly for each term.
>>>>>>    e.g. shard1 has doc1 doc2
>>>>>>           shard2 has doc3 doc4
>>>>>>    but term1 may only occur in doc1 and doc2
>>>>>>    while term2 may only occur in doc3 and doc4
>>>>>>    we may modify it
>>>>>>           shard1 doc1 doc3
>>>>>>           shard2 doc2 doc4
>>>>>>    it's good for term1 and term2
>>>>>>    but term3 may occur in doc1 and doc3...
>>>>>>    So I think it's fine-grained distributed in index while solr
>>>>>> distributed search is coarse-
>>>>>> grained.
>>>>> This is just crazy :)
>>>>>
>>>>> The simple way is just to search different segments in parallel.
>>>>> BalancedSegmentMergePolicy makes sure you have roughly even-sized
>>>>> large segments (and small ones don't count, they're small!).
>>>>> If you're bound on squeezing out that extra millisecond (and making
>>>>> your life miserable along the way), you can search a single segment
>>>>> with multiple threads (by dividing it in even chunks, and then doing
>>>>> skipTo to position your iterators to the beginning of each chunk).
>>>>>
>>>>> First approach is really easy to implement. Second one is harder, but
>>>>> still doesn't require you to cook the number of CPU cores available
>>>>> into your index!
>>>>>
>>>>> It's the law of diminishing returns at play here. You're most likely
>>>>> to search in parallel over mostly memory-resident index
>>>>> (RAMDir/mmap/filesys cache - doesn't matter), as most of IO subsystems
>>>>> tend to slow down considerably on parallel sequential reads, so you
>>>>> already have pretty decent speed.
>>>>> Searching different segments in parallel (with BSMP) makes you several
>>>>> times faster.
>>>>> Searching in parallel within a segment requires some weird hacks, but
>>>>> has maybe a few percent advantage over previous solution.
>>>>> Sharding posting lists requires a great deal of weird hacks, makes
>>>>> index machine-bound, and boosts speed by another couple of percent.
>>>>> Sounds worthless.
>>>>>
>>>>> --
>>>>> Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
>>>>> Phone: +7 (495) 683-567-4
>>>>> ICQ: 104465785
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>
>>>>>
>>>>
>>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: strange problem of PForDelta decoder

Posted by Li Li <fa...@gmail.com>.
   I sent a mail to the MG4J group and Sebastiano Vigna recommended the paper
"Reducing query latencies in web search using fine-grained parallelism",
World Wide Web, 12(4):441-460, 2009.
   I read it roughly, but a few things are unclear to me.
     It says:
    it first coalesces all disk reads in a single process, and then
distributes the index data among
the parallel threads. So when the index server receives a query, it
loads from storage (disk or cache)
the required posting lists, initializes the query execution, and then
spawns lightweight threads, one per core.
Each thread receives an equal-sized subset of document IDs to scan,
together covering the entire index
partition. All threads execute the same code on the same query, but
with private data structures.
The only writable shared data structure is a heap of the top-scoring
hits, protected by a lock. At the end of the threads'
execution, the heap contains the highest-scoring hits in the entire
index partition, which is then transmitted to the query
integrator as usual. Since the index contains skip-lists that permit
near-random-access to any document ID, and since
hits are generally distributed evenly in the index, our measurements
show that all threads complete their work with
little variability, within 1-2% of each other.

  I have some questions:
  1. "Since the index contains skip-lists that permit near-random-access
to any document ID": can a skip list really be near random access
(especially when it is on a hard disk)?

  2. For "or queries" it's easy. E.g. if we search for "search" and "engine",
we use one main thread to get the postings of the two terms, divide their
postings into two groups (e.g. even docIDs and odd docIDs), use two threads
to score them, and finally merge the results (or use a locked priority queue).
     But for "and queries" we usually don't decode all the docIDs, because
we can skip many documents, especially when searching low-frequency terms
together with high-frequency terms; only a small number of the
high-frequency term's docIDs are decoded.

   Btw, I think "or queries" are often much slower than "and queries". If we
can parallelize "or queries" well, that would also be very useful.
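
For point 2, the shared lock-protected top-K heap (the "locked priority
queue") could look roughly like this; just an illustrative sketch:

import java.util.PriorityQueue;

// Minimal sketch of the paper's "one writable shared structure": a top-K
// heap protected by a lock, filled concurrently by the per-core threads.
final class SharedTopK {
  static final class Hit {
    final int doc; final float score;
    Hit(int doc, float score) { this.doc = doc; this.score = score; }
  }

  private final PriorityQueue<Hit> heap =
      new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score)); // min-heap by score
  private final int k;

  SharedTopK(int k) { this.k = k; }

  // Called concurrently by the scoring threads.
  synchronized void offer(int doc, float score) {
    if (heap.size() < k) {
      heap.add(new Hit(doc, score));
    } else if (score > heap.peek().score) {
      heap.poll();                      // evict the current weakest hit
      heap.add(new Hit(doc, score));
    }
  }
}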

On Dec 31, 2010, at 7:25 AM, Li Li wrote:

>    Which one is used in MG4J to support multithreads searching? Are

2010/12/31 Li Li <fa...@gmail.com>:
> is there anyone familiar with MG4J(http://mg4j.dsi.unimi.it/)
> it says Multithreading. Indices can be queried and scored concurrently.
> maybe we can learn something from it.
>
> 2010/12/31 Li Li <fa...@gmail.com>:
>> plus
>> 2 means search a term need seek many times for tis(if it's not cached in tii)
>>
>> 2010/12/31 Li Li <fa...@gmail.com>:
>>> searching multi segments is a alternative solution but it has some
>>> disadvantages.
>>> 1. idf is not global?(I am not familiar with its implementation) maybe
>>> it's easy to solve it by share global idf
>>> 2. each segments will has it's own tii and tis files, which may make
>>> search slower(that's why optimization of
>>> index is neccessary)
>>> 3. one term's  docList is distributed in many files rather than one.
>>> more than one frq files means
>>> hard disk must seek different tracks, it's time consuming. if there is
>>> only one segment, the are likely
>>> stored in a single track.
>>>
>>>
>>> 2010/12/31 Earwin Burrfoot <ea...@gmail.com>:
>>>>>>>until we fix Lucene to run a single search concurrently (which we
>>>>>>>badly need to do).
>>>>> I am interested in this idea.(I have posted it before) do you have some
>>>>> resources such as papers or tech articles about it?
>>>>> I have tried but it need to modify index format dramatically and we use
>>>>> solr distributed search to relieve the problem of response time. so finally
>>>>> give it up.
>>>>> lucene4's index format is more flexible that it supports customed codecs
>>>>> and it's now on development, I think it's good time to take it into
>>>>> consideration
>>>>> that let it support multithread searching for a single query.
>>>>> I have a naive solution. dividing docList into many groups
>>>>> e.g grouping docIds by it's even or odd
>>>>> term1 df1=4  docList =  0  4  8  10
>>>>> term1 df2=4  docList = 1  3  9  11
>>>>>
>>>>> term2 df1=4  docList = 0  6  8  12
>>>>> term2 df2=4  docList = 3  9  11 15
>>>>>   then we can use 2 threads to search topN docs on even group and odd group
>>>>> and finally merge their results into a single on just like solr
>>>>> distributed search.
>>>>> But it's better than solr distributed search.
>>>>>   First, it's in a single process and data communication between
>>>>> threads is much
>>>>> faster than network.
>>>>>   Second, each threads process the same number of documents.For solr
>>>>> distributed
>>>>> search, one shard may process 7 documents and another shard may 1 document
>>>>> Even if we can make each shard have the same document number. we can not
>>>>> make it uniformly for each term.
>>>>>    e.g. shard1 has doc1 doc2
>>>>>           shard2 has doc3 doc4
>>>>>    but term1 may only occur in doc1 and doc2
>>>>>    while term2 may only occur in doc3 and doc4
>>>>>    we may modify it
>>>>>           shard1 doc1 doc3
>>>>>           shard2 doc2 doc4
>>>>>    it's good for term1 and term2
>>>>>    but term3 may occur in doc1 and doc3...
>>>>>    So I think it's fine-grained distributed in index while solr
>>>>> distributed search is coarse-
>>>>> grained.
>>>> This is just crazy :)
>>>>
>>>> The simple way is just to search different segments in parallel.
>>>> BalancedSegmentMergePolicy makes sure you have roughly even-sized
>>>> large segments (and small ones don't count, they're small!).
>>>> If you're bound on squeezing out that extra millisecond (and making
>>>> your life miserable along the way), you can search a single segment
>>>> with multiple threads (by dividing it in even chunks, and then doing
>>>> skipTo to position your iterators to the beginning of each chunk).
>>>>
>>>> First approach is really easy to implement. Second one is harder, but
>>>> still doesn't require you to cook the number of CPU cores available
>>>> into your index!
>>>>
>>>> It's the law of diminishing returns at play here. You're most likely
>>>> to search in parallel over mostly memory-resident index
>>>> (RAMDir/mmap/filesys cache - doesn't matter), as most of IO subsystems
>>>> tend to slow down considerably on parallel sequential reads, so you
>>>> already have pretty decent speed.
>>>> Searching different segments in parallel (with BSMP) makes you several
>>>> times faster.
>>>> Searching in parallel within a segment requires some weird hacks, but
>>>> has maybe a few percent advantage over previous solution.
>>>> Sharding posting lists requires a great deal of weird hacks, makes
>>>> index machine-bound, and boosts speed by another couple of percent.
>>>> Sounds worthless.
>>>>
>>>> --
>>>> Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
>>>> Phone: +7 (495) 683-567-4
>>>> ICQ: 104465785
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>
>>>>
>>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: strange problem of PForDelta decoder

Posted by Lance Norskog <go...@gmail.com>.
Please purchase and read Lucene in Action, 2nd Edition. It will explain how
all of this works and how to use Lucene efficiently.

http://www.lucidimagination.com/blog/2009/03/11/lia2/
http://www.lucidimagination.com/blog/2010/08/01/lucene-in-action-2d-edition-authors-round-table-podcast/
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Podcasts-and-Videos/Lucene-in-Action

It is available from Manning as a downloadable E-Book (PDF) so you
don't have to wait for it to be shipped.

2010/12/30 Li Li <fa...@gmail.com>:
> is there anyone familiar with MG4J(http://mg4j.dsi.unimi.it/)
> it says Multithreading. Indices can be queried and scored concurrently.
> maybe we can learn something from it.
>
> 2010/12/31 Li Li <fa...@gmail.com>:
>> plus
>> 2 means search a term need seek many times for tis(if it's not cached in tii)
>>
>> 2010/12/31 Li Li <fa...@gmail.com>:
>>> searching multi segments is a alternative solution but it has some
>>> disadvantages.
>>> 1. idf is not global?(I am not familiar with its implementation) maybe
>>> it's easy to solve it by share global idf
>>> 2. each segments will has it's own tii and tis files, which may make
>>> search slower(that's why optimization of
>>> index is neccessary)
>>> 3. one term's  docList is distributed in many files rather than one.
>>> more than one frq files means
>>> hard disk must seek different tracks, it's time consuming. if there is
>>> only one segment, the are likely
>>> stored in a single track.
>>>
>>>
>>> 2010/12/31 Earwin Burrfoot <ea...@gmail.com>:
>>>>>>>until we fix Lucene to run a single search concurrently (which we
>>>>>>>badly need to do).
>>>>> I am interested in this idea.(I have posted it before) do you have some
>>>>> resources such as papers or tech articles about it?
>>>>> I have tried but it need to modify index format dramatically and we use
>>>>> solr distributed search to relieve the problem of response time. so finally
>>>>> give it up.
>>>>> lucene4's index format is more flexible that it supports customed codecs
>>>>> and it's now on development, I think it's good time to take it into
>>>>> consideration
>>>>> that let it support multithread searching for a single query.
>>>>> I have a naive solution. dividing docList into many groups
>>>>> e.g grouping docIds by it's even or odd
>>>>> term1 df1=4  docList =  0  4  8  10
>>>>> term1 df2=4  docList = 1  3  9  11
>>>>>
>>>>> term2 df1=4  docList = 0  6  8  12
>>>>> term2 df2=4  docList = 3  9  11 15
>>>>>   then we can use 2 threads to search topN docs on even group and odd group
>>>>> and finally merge their results into a single on just like solr
>>>>> distributed search.
>>>>> But it's better than solr distributed search.
>>>>>   First, it's in a single process and data communication between
>>>>> threads is much
>>>>> faster than network.
>>>>>   Second, each threads process the same number of documents.For solr
>>>>> distributed
>>>>> search, one shard may process 7 documents and another shard may 1 document
>>>>> Even if we can make each shard have the same document number. we can not
>>>>> make it uniformly for each term.
>>>>>    e.g. shard1 has doc1 doc2
>>>>>           shard2 has doc3 doc4
>>>>>    but term1 may only occur in doc1 and doc2
>>>>>    while term2 may only occur in doc3 and doc4
>>>>>    we may modify it
>>>>>           shard1 doc1 doc3
>>>>>           shard2 doc2 doc4
>>>>>    it's good for term1 and term2
>>>>>    but term3 may occur in doc1 and doc3...
>>>>>    So I think it's fine-grained distributed in index while solr
>>>>> distributed search is coarse-
>>>>> grained.
>>>> This is just crazy :)
>>>>
>>>> The simple way is just to search different segments in parallel.
>>>> BalancedSegmentMergePolicy makes sure you have roughly even-sized
>>>> large segments (and small ones don't count, they're small!).
>>>> If you're bound on squeezing out that extra millisecond (and making
>>>> your life miserable along the way), you can search a single segment
>>>> with multiple threads (by dividing it in even chunks, and then doing
>>>> skipTo to position your iterators to the beginning of each chunk).
>>>>
>>>> First approach is really easy to implement. Second one is harder, but
>>>> still doesn't require you to cook the number of CPU cores available
>>>> into your index!
>>>>
>>>> It's the law of diminishing returns at play here. You're most likely
>>>> to search in parallel over mostly memory-resident index
>>>> (RAMDir/mmap/filesys cache - doesn't matter), as most of IO subsystems
>>>> tend to slow down considerably on parallel sequential reads, so you
>>>> already have pretty decent speed.
>>>> Searching different segments in parallel (with BSMP) makes you several
>>>> times faster.
>>>> Searching in parallel within a segment requires some weird hacks, but
>>>> has maybe a few percent advantage over previous solution.
>>>> Sharding posting lists requires a great deal of weird hacks, makes
>>>> index machine-bound, and boosts speed by another couple of percent.
>>>> Sounds worthless.
>>>>
>>>> --
>>>> Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
>>>> Phone: +7 (495) 683-567-4
>>>> ICQ: 104465785
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>
>>>>
>>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>



-- 
Lance Norskog
goksron@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: strange problem of PForDelta decoder

Posted by Li Li <fa...@gmail.com>.
Is there anyone familiar with MG4J (http://mg4j.dsi.unimi.it/)?
It says: "Multithreading. Indices can be queried and scored concurrently."
Maybe we can learn something from it.

2010/12/31 Li Li <fa...@gmail.com>:
> plus
> 2 means search a term need seek many times for tis(if it's not cached in tii)
>
> 2010/12/31 Li Li <fa...@gmail.com>:
>> searching multi segments is a alternative solution but it has some
>> disadvantages.
>> 1. idf is not global?(I am not familiar with its implementation) maybe
>> it's easy to solve it by share global idf
>> 2. each segments will has it's own tii and tis files, which may make
>> search slower(that's why optimization of
>> index is neccessary)
>> 3. one term's  docList is distributed in many files rather than one.
>> more than one frq files means
>> hard disk must seek different tracks, it's time consuming. if there is
>> only one segment, the are likely
>> stored in a single track.
>>
>>
>> 2010/12/31 Earwin Burrfoot <ea...@gmail.com>:
>>>>>>until we fix Lucene to run a single search concurrently (which we
>>>>>>badly need to do).
>>>> I am interested in this idea.(I have posted it before) do you have some
>>>> resources such as papers or tech articles about it?
>>>> I have tried but it need to modify index format dramatically and we use
>>>> solr distributed search to relieve the problem of response time. so finally
>>>> give it up.
>>>> lucene4's index format is more flexible that it supports customed codecs
>>>> and it's now on development, I think it's good time to take it into
>>>> consideration
>>>> that let it support multithread searching for a single query.
>>>> I have a naive solution. dividing docList into many groups
>>>> e.g grouping docIds by it's even or odd
>>>> term1 df1=4  docList =  0  4  8  10
>>>> term1 df2=4  docList = 1  3  9  11
>>>>
>>>> term2 df1=4  docList = 0  6  8  12
>>>> term2 df2=4  docList = 3  9  11 15
>>>>   then we can use 2 threads to search topN docs on even group and odd group
>>>> and finally merge their results into a single on just like solr
>>>> distributed search.
>>>> But it's better than solr distributed search.
>>>>   First, it's in a single process and data communication between
>>>> threads is much
>>>> faster than network.
>>>>   Second, each threads process the same number of documents.For solr
>>>> distributed
>>>> search, one shard may process 7 documents and another shard may 1 document
>>>> Even if we can make each shard have the same document number. we can not
>>>> make it uniformly for each term.
>>>>    e.g. shard1 has doc1 doc2
>>>>           shard2 has doc3 doc4
>>>>    but term1 may only occur in doc1 and doc2
>>>>    while term2 may only occur in doc3 and doc4
>>>>    we may modify it
>>>>           shard1 doc1 doc3
>>>>           shard2 doc2 doc4
>>>>    it's good for term1 and term2
>>>>    but term3 may occur in doc1 and doc3...
>>>>    So I think it's fine-grained distributed in index while solr
>>>> distributed search is coarse-
>>>> grained.
>>> This is just crazy :)
>>>
>>> The simple way is just to search different segments in parallel.
>>> BalancedSegmentMergePolicy makes sure you have roughly even-sized
>>> large segments (and small ones don't count, they're small!).
>>> If you're bound on squeezing out that extra millisecond (and making
>>> your life miserable along the way), you can search a single segment
>>> with multiple threads (by dividing it in even chunks, and then doing
>>> skipTo to position your iterators to the beginning of each chunk).
>>>
>>> First approach is really easy to implement. Second one is harder, but
>>> still doesn't require you to cook the number of CPU cores available
>>> into your index!
>>>
>>> It's the law of diminishing returns at play here. You're most likely
>>> to search in parallel over mostly memory-resident index
>>> (RAMDir/mmap/filesys cache - doesn't matter), as most of IO subsystems
>>> tend to slow down considerably on parallel sequential reads, so you
>>> already have pretty decent speed.
>>> Searching different segments in parallel (with BSMP) makes you several
>>> times faster.
>>> Searching in parallel within a segment requires some weird hacks, but
>>> has maybe a few percent advantage over previous solution.
>>> Sharding posting lists requires a great deal of weird hacks, makes
>>> index machine-bound, and boosts speed by another couple of percent.
>>> Sounds worthless.
>>>
>>> --
>>> Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
>>> Phone: +7 (495) 683-567-4
>>> ICQ: 104465785
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>
>>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: strange problem of PForDelta decoder

Posted by Li Li <fa...@gmail.com>.
Plus, point 2 means searching for a term needs to seek many times in the
tis file (if it's not cached in the tii).

2010/12/31 Li Li <fa...@gmail.com>:
> searching multi segments is a alternative solution but it has some
> disadvantages.
> 1. idf is not global?(I am not familiar with its implementation) maybe
> it's easy to solve it by share global idf
> 2. each segments will has it's own tii and tis files, which may make
> search slower(that's why optimization of
> index is neccessary)
> 3. one term's  docList is distributed in many files rather than one.
> more than one frq files means
> hard disk must seek different tracks, it's time consuming. if there is
> only one segment, the are likely
> stored in a single track.
>
>
> 2010/12/31 Earwin Burrfoot <ea...@gmail.com>:
>>>>>until we fix Lucene to run a single search concurrently (which we
>>>>>badly need to do).
>>> I am interested in this idea.(I have posted it before) do you have some
>>> resources such as papers or tech articles about it?
>>> I have tried but it need to modify index format dramatically and we use
>>> solr distributed search to relieve the problem of response time. so finally
>>> give it up.
>>> lucene4's index format is more flexible that it supports customed codecs
>>> and it's now on development, I think it's good time to take it into
>>> consideration
>>> that let it support multithread searching for a single query.
>>> I have a naive solution. dividing docList into many groups
>>> e.g grouping docIds by it's even or odd
>>> term1 df1=4  docList =  0  4  8  10
>>> term1 df2=4  docList = 1  3  9  11
>>>
>>> term2 df1=4  docList = 0  6  8  12
>>> term2 df2=4  docList = 3  9  11 15
>>>   then we can use 2 threads to search topN docs on even group and odd group
>>> and finally merge their results into a single on just like solr
>>> distributed search.
>>> But it's better than solr distributed search.
>>>   First, it's in a single process and data communication between
>>> threads is much
>>> faster than network.
>>>   Second, each threads process the same number of documents.For solr
>>> distributed
>>> search, one shard may process 7 documents and another shard may 1 document
>>> Even if we can make each shard have the same document number. we can not
>>> make it uniformly for each term.
>>>    e.g. shard1 has doc1 doc2
>>>           shard2 has doc3 doc4
>>>    but term1 may only occur in doc1 and doc2
>>>    while term2 may only occur in doc3 and doc4
>>>    we may modify it
>>>           shard1 doc1 doc3
>>>           shard2 doc2 doc4
>>>    it's good for term1 and term2
>>>    but term3 may occur in doc1 and doc3...
>>>    So I think it's fine-grained distributed in index while solr
>>> distributed search is coarse-
>>> grained.
>> This is just crazy :)
>>
>> The simple way is just to search different segments in parallel.
>> BalancedSegmentMergePolicy makes sure you have roughly even-sized
>> large segments (and small ones don't count, they're small!).
>> If you're bound on squeezing out that extra millisecond (and making
>> your life miserable along the way), you can search a single segment
>> with multiple threads (by dividing it in even chunks, and then doing
>> skipTo to position your iterators to the beginning of each chunk).
>>
>> First approach is really easy to implement. Second one is harder, but
>> still doesn't require you to cook the number of CPU cores available
>> into your index!
>>
>> It's the law of diminishing returns at play here. You're most likely
>> to search in parallel over mostly memory-resident index
>> (RAMDir/mmap/filesys cache - doesn't matter), as most of IO subsystems
>> tend to slow down considerably on parallel sequential reads, so you
>> already have pretty decent speed.
>> Searching different segments in parallel (with BSMP) makes you several
>> times faster.
>> Searching in parallel within a segment requires some weird hacks, but
>> has maybe a few percent advantage over previous solution.
>> Sharding posting lists requires a great deal of weird hacks, makes
>> index machine-bound, and boosts speed by another couple of percent.
>> Sounds worthless.
>>
>> --
>> Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
>> Phone: +7 (495) 683-567-4
>> ICQ: 104465785
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: strange problem of PForDelta decoder

Posted by Li Li <fa...@gmail.com>.
Searching multiple segments in parallel is an alternative solution, but it
has some disadvantages.
1. idf is not global (I am not familiar with its implementation); maybe
this is easy to solve by sharing a global idf (see the sketch after this
list).
2. each segment has its own tii and tis files, which may make search
slower (that's why index optimization is necessary).
3. one term's docList is spread across many frq files rather than one.
Reading more than one frq file means the hard disk must seek across
different tracks, which is time consuming; if there is only one segment,
the postings are likely stored in a single track.
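
For point 1, a minimal sketch of the idea, assuming the 2.9-era IndexReader
API (the class and helper name here are invented for illustration): a global
df is just the sum of the per-segment docFreq values, which is what the
top-level multi-segment reader already computes.

  import java.io.IOException;

  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.Term;

  // Hypothetical helper: a "global" document frequency across all segments
  // of one index is the sum of the per-segment values.
  public class GlobalDf {
    public static int globalDocFreq(IndexReader topReader, Term term)
        throws IOException {
      IndexReader[] segments = topReader.getSequentialSubReaders();
      if (segments == null) {
        // atomic (single-segment) reader: its docFreq is already global
        return topReader.docFreq(term);
      }
      int df = 0;
      for (IndexReader segment : segments) {
        df += segment.docFreq(term);   // per-segment df
      }
      return df;                       // equals topReader.docFreq(term)
    }
  }

Across separate indexes (the truly distributed case) the same idea means the
shards would have to exchange their dfs before scoring.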


2010/12/31 Earwin Burrfoot <ea...@gmail.com>:
>>>>until we fix Lucene to run a single search concurrently (which we
>>>>badly need to do).
>> I am interested in this idea.(I have posted it before) do you have some
>> resources such as papers or tech articles about it?
>> I have tried but it need to modify index format dramatically and we use
>> solr distributed search to relieve the problem of response time. so finally
>> give it up.
>> lucene4's index format is more flexible that it supports customed codecs
>> and it's now on development, I think it's good time to take it into
>> consideration
>> that let it support multithread searching for a single query.
>> I have a naive solution. dividing docList into many groups
>> e.g grouping docIds by it's even or odd
>> term1 df1=4  docList =  0  4  8  10
>> term1 df2=4  docList = 1  3  9  11
>>
>> term2 df1=4  docList = 0  6  8  12
>> term2 df2=4  docList = 3  9  11 15
>>   then we can use 2 threads to search topN docs on even group and odd group
>> and finally merge their results into a single on just like solr
>> distributed search.
>> But it's better than solr distributed search.
>>   First, it's in a single process and data communication between
>> threads is much
>> faster than network.
>>   Second, each threads process the same number of documents.For solr
>> distributed
>> search, one shard may process 7 documents and another shard may 1 document
>> Even if we can make each shard have the same document number. we can not
>> make it uniformly for each term.
>>    e.g. shard1 has doc1 doc2
>>           shard2 has doc3 doc4
>>    but term1 may only occur in doc1 and doc2
>>    while term2 may only occur in doc3 and doc4
>>    we may modify it
>>           shard1 doc1 doc3
>>           shard2 doc2 doc4
>>    it's good for term1 and term2
>>    but term3 may occur in doc1 and doc3...
>>    So I think it's fine-grained distributed in index while solr
>> distributed search is coarse-
>> grained.
> This is just crazy :)
>
> The simple way is just to search different segments in parallel.
> BalancedSegmentMergePolicy makes sure you have roughly even-sized
> large segments (and small ones don't count, they're small!).
> If you're bound on squeezing out that extra millisecond (and making
> your life miserable along the way), you can search a single segment
> with multiple threads (by dividing it in even chunks, and then doing
> skipTo to position your iterators to the beginning of each chunk).
>
> First approach is really easy to implement. Second one is harder, but
> still doesn't require you to cook the number of CPU cores available
> into your index!
>
> It's the law of diminishing returns at play here. You're most likely
> to search in parallel over mostly memory-resident index
> (RAMDir/mmap/filesys cache - doesn't matter), as most of IO subsystems
> tend to slow down considerably on parallel sequential reads, so you
> already have pretty decent speed.
> Searching different segments in parallel (with BSMP) makes you several
> times faster.
> Searching in parallel within a segment requires some weird hacks, but
> has maybe a few percent advantage over previous solution.
> Sharding posting lists requires a great deal of weird hacks, makes
> index machine-bound, and boosts speed by another couple of percent.
> Sounds worthless.
>
> --
> Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
> Phone: +7 (495) 683-567-4
> ICQ: 104465785
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: strange problem of PForDelta decoder

Posted by Earwin Burrfoot <ea...@gmail.com>.
>>>until we fix Lucene to run a single search concurrently (which we
>>>badly need to do).
> I am interested in this idea.(I have posted it before) do you have some
> resources such as papers or tech articles about it?
> I have tried but it need to modify index format dramatically and we use
> solr distributed search to relieve the problem of response time. so finally
> give it up.
> lucene4's index format is more flexible that it supports customed codecs
> and it's now on development, I think it's good time to take it into
> consideration
> that let it support multithread searching for a single query.
> I have a naive solution. dividing docList into many groups
> e.g grouping docIds by it's even or odd
> term1 df1=4  docList =  0  4  8  10
> term1 df2=4  docList = 1  3  9  11
>
> term2 df1=4  docList = 0  6  8  12
> term2 df2=4  docList = 3  9  11 15
>   then we can use 2 threads to search topN docs on even group and odd group
> and finally merge their results into a single on just like solr
> distributed search.
> But it's better than solr distributed search.
>   First, it's in a single process and data communication between
> threads is much
> faster than network.
>   Second, each threads process the same number of documents.For solr
> distributed
> search, one shard may process 7 documents and another shard may 1 document
> Even if we can make each shard have the same document number. we can not
> make it uniformly for each term.
>    e.g. shard1 has doc1 doc2
>           shard2 has doc3 doc4
>    but term1 may only occur in doc1 and doc2
>    while term2 may only occur in doc3 and doc4
>    we may modify it
>           shard1 doc1 doc3
>           shard2 doc2 doc4
>    it's good for term1 and term2
>    but term3 may occur in doc1 and doc3...
>    So I think it's fine-grained distributed in index while solr
> distributed search is coarse-
> grained.
This is just crazy :)

The simple way is just to search different segments in parallel.
BalancedSegmentMergePolicy makes sure you have roughly even-sized
large segments (and small ones don't count, they're small!).
If you're bent on squeezing out that extra millisecond (and making
your life miserable along the way), you can search a single segment
with multiple threads (by dividing it in even chunks, and then doing
skipTo to position your iterators to the beginning of each chunk).

First approach is really easy to implement. Second one is harder, but
still doesn't require you to cook the number of CPU cores available
into your index!

It's the law of diminishing returns at play here. You're most likely
to search in parallel over mostly memory-resident index
(RAMDir/mmap/filesys cache - doesn't matter), as most IO subsystems
tend to slow down considerably on parallel sequential reads, so you
already have pretty decent speed.
Searching different segments in parallel (with BSMP) makes you several
times faster.
Searching in parallel within a segment requires some weird hacks, but
has maybe a few percent advantage over previous solution.
Sharding posting lists requires a great deal of weird hacks, makes
index machine-bound, and boosts speed by another couple of percent.
Sounds worthless.
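
A rough sketch of the first approach, assuming the 2.9/3.x-era IndexSearcher
API; the class name, executor handling and the naive merge are illustrative
only, not code from any branch.

  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.List;
  import java.util.concurrent.Callable;
  import java.util.concurrent.ExecutionException;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Future;

  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.ScoreDoc;
  import org.apache.lucene.search.TopDocs;

  // Illustrative only: one search task per segment, merged afterwards.
  public class PerSegmentSearchSketch {
    public static List<ScoreDoc> search(IndexReader topReader, final Query query,
                                        final int topN, ExecutorService pool)
        throws IOException, InterruptedException, ExecutionException {
      IndexReader[] segments = topReader.getSequentialSubReaders();
      if (segments == null) {
        segments = new IndexReader[] { topReader };   // already a single segment
      }
      List<Future<TopDocs>> futures = new ArrayList<Future<TopDocs>>();
      for (final IndexReader segment : segments) {
        futures.add(pool.submit(new Callable<TopDocs>() {
          public TopDocs call() throws IOException {
            IndexSearcher searcher = new IndexSearcher(segment);
            try {
              return searcher.search(query, topN);    // per-segment top N
            } finally {
              searcher.close();
            }
          }
        }));
      }
      // Naive merge: a real merge would rebase each hit by its segment's
      // docBase and keep only the global top N by score.
      List<ScoreDoc> merged = new ArrayList<ScoreDoc>();
      for (Future<TopDocs> future : futures) {
        for (ScoreDoc hit : future.get().scoreDocs) {
          merged.add(hit);
        }
      }
      return merged;
    }
  }

Note that each per-segment searcher scores with its own segment-local idf,
which is the idf concern raised elsewhere in this thread; sharing global
stats would have to be layered on top.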

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Phone: +7 (495) 683-567-4
ICQ: 104465785

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: strange problem of PForDelta decoder

Posted by Li Li <fa...@gmail.com>.
I did another test using lucene 4 trunk with default codecs. Its index
files are essentially the same as lucene 2.9's, and
the speed is almost the same as lucene 2.9.

>> I think it could be the
>>fact that AND query does block reads (64 doc/freqs at once) instead of
>>doc-at-once?  Ie, because of this, the query is efficitively scanning
>>the next block of 64 docs instead of skipping to them?  Our skipping
>>impl is unfortunately rather costly so if skip will not skip that many
>>docs it's better to scan.
I agree with this explanation. For high-frequency terms the skip list
cannot skip over many docs, so it seems something here needs optimization,
e.g. for high-frequency terms we scan, and for low-frequency terms we use
the skip list; and if we only care about the "bad case", we can focus on
the high-frequency terms only.
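
A rough sketch of that heuristic; the threshold and every name here are made
up for illustration and are not code from either branch.

  import java.io.IOException;

  // Illustrative scan-vs-skip decision: scan forward when the target is
  // close, otherwise jump through the skip list.
  abstract class ScanOrSkipEnum {
    static final int SCAN_THRESHOLD = 64;   // roughly one decoded block

    int currentDoc;

    abstract int nextDoc() throws IOException;           // advance one posting
    abstract int skipTo(int target) throws IOException;  // skip-list jump

    int advanceTo(int target) throws IOException {
      if (target - currentDoc < SCAN_THRESHOLD) {
        while (currentDoc < target) {
          currentDoc = nextDoc();      // cheap linear scan for nearby targets
        }
      } else {
        currentDoc = skipTo(target);   // far target: pay the skip-list cost
      }
      return currentDoc;
    }
  }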

>>You only use PFor for the very high freq terms in 2.9.x right?
I use PFor if df is greater than 128; if not, I use VINT.
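
In sketch form, that per-term choice at write time looks roughly like the
following; the method names are hypothetical and 128 is the block size
mentioned above.

  import java.io.IOException;

  // Hypothetical writer-side switch: block-encode frequent terms, keep vInt
  // for the rest.
  abstract class PostingsWriterSketch {
    static final int BLOCK_SIZE = 128;

    abstract void writePForBlocks(int[] docDeltas, int[] freqs) throws IOException;
    abstract void writeVIntPostings(int[] docDeltas, int[] freqs) throws IOException;

    void writeTerm(int df, int[] docDeltas, int[] freqs) throws IOException {
      if (df > BLOCK_SIZE) {
        writePForBlocks(docDeltas, freqs);    // at least one full block to decode
      } else {
        writeVIntPostings(docDeltas, freqs);  // short list: vInt is cheap enough
      }
    }
  }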

>>until we fix Lucene to run a single search concurrently (which we
>>badly need to do).
I am interested in this idea (I have posted about it before). Do you have
any resources such as papers or tech articles about it?
I have tried it, but it needs dramatic changes to the index format, and we
use solr distributed search to relieve the response-time problem, so I
finally gave it up.
lucene4's index format is more flexible in that it supports custom codecs,
and since it's still under development, I think it's a good time to
consider letting it support multithreaded searching for a single query.
I have a naive solution: divide each docList into several groups,
e.g. grouping docIds by whether they are even or odd:
term1 df1=4  docList =  0  4  8  10
term1 df2=4  docList = 1  3  9  11

term2 df1=4  docList = 0  6  8  12
term2 df2=4  docList = 3  9  11 15
   Then we can use 2 threads to search the top-N docs on the even group and
the odd group, and finally merge their results into a single list, just like
solr distributed search does (a toy illustration of the split follows below).
But it's better than solr distributed search.
   First, it runs in a single process, and data communication between
threads is much faster than over the network.
   Second, each thread processes the same number of documents. For solr
distributed search, one shard may process 7 documents while another shard
processes 1 document. Even if we make each shard hold the same number of
documents, we cannot make the distribution uniform for every term.
    e.g. shard1 has doc1 doc2
           shard2 has doc3 doc4
    but term1 may only occur in doc1 and doc2
    while term2 may only occur in doc3 and doc4
    we may rearrange it:
           shard1 doc1 doc3
           shard2 doc2 doc4
    which is good for term1 and term2,
    but then term3 may occur in doc1 and doc3...
    So I think this is fine-grained distribution inside the index, while
solr distributed search is coarse-grained.
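
Here is the toy illustration of that split; the class and method names are
invented for the example and are not tied to any Lucene API.

  import java.util.ArrayList;
  import java.util.Arrays;
  import java.util.List;

  // Toy even/odd partition of one posting list, matching the example above.
  public class ParitySplit {
    public static List<int[]> split(int[] docIds) {
      List<Integer> even = new ArrayList<Integer>();
      List<Integer> odd = new ArrayList<Integer>();
      for (int doc : docIds) {
        if ((doc & 1) == 0) {
          even.add(doc);
        } else {
          odd.add(doc);
        }
      }
      return Arrays.asList(toIntArray(even), toIntArray(odd));
    }

    private static int[] toIntArray(List<Integer> values) {
      int[] out = new int[values.size()];
      for (int i = 0; i < values.size(); i++) {
        out[i] = values.get(i);
      }
      return out;
    }

    public static void main(String[] args) {
      // term1's postings from the example: even {0, 4, 8, 10}, odd {1, 3, 9, 11}
      List<int[]> groups = split(new int[] {0, 1, 3, 4, 8, 9, 10, 11});
      System.out.println("even: " + Arrays.toString(groups.get(0)));
      System.out.println("odd:  " + Arrays.toString(groups.get(1)));
    }
  }

Each thread would then run the normal top-N collection over its own group,
and the two partial results are merged by score, as in solr distributed
search.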







2010/12/30 Michael McCandless <lu...@mikemccandless.com>:
> On Mon, Dec 27, 2010 at 5:08 AM, Li Li <fa...@gmail.com> wrote:
>> I integrated pfor codec into lucene 2.9.3 and the search time
>> comparsion is as follows:
>>                                   single term   and query   or query
>> VINT in lucene 2.9.3         11.2            36.5           38.6
>> PFor in lucene 2.9.3         8.7              27.6           33.4
>> VINT in lucene 4 branch   10.6             26.5           35.4
>> PFor in lcuene 4 branch    8.1              22.5           30.7
>>
>> My test terms are high frequncy terms because we are interested in "bad case"
>
> I agree it's the bad cases we should focus on in general.  If a super
> fast query gets somewhat slower it's "relatively harmless" (just a
> "capacity" question for high volume sites) but if the bad queries get
> slower it's awful (requires faster cutover to sharded architecture),
> until we fix Lucene to run a single search concurrently (which we
> badly need to do).
>
>> It seems lucene 4 branch's implementation of and query(conjuction
>> query) is well optimized that even for VINT codec, it's faster than
>> PFor in lucene 2.9.3. Could any one tell me what optimization is done?
>> is store docIDs and freqs separately making it faster? or anything
>> else?
>
> Actually vInt on the bulkpostings branch stores freq/doc together.  Ie
> the format is the same as 2.9.x's format.  I think it could be the
> fact that AND query does block reads (64 doc/freqs at once) instead of
> doc-at-once?  Ie, because of this, the query is efficitively scanning
> the next block of 64 docs instead of skipping to them?  Our skipping
> impl is unfortunately rather costly so if skip will not skip that many
> docs it's better to scan.
>
>> Another querstion, Is there anyone interested in integrating pfor
>> codec into lucene 2.9.3 as me( we have to use lucene 2.9 and solr
>> 1.4). And how do I contribute this patch?
>
> Realistically I don't think we can commit this to 2.9.x -- that branch
> is purely bug fixes at this point.
>
> Still it's possible others could make use of such a patch so if it's
> not too much work you may as well post it?  It can lead to
> improvements on the bulk postings branch too :)  The more patches the
> merrier!
>
> You only use PFor for the very high freq terms in 2.9.x right?  I've
> wondered if we should do the same on bulkpostings... problem is for eg
> range queries, that visit all docs for all terms b/w X and Y, you want
> the bulk decode even for low freq terms...
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: strange problem of PForDelta decoder

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Mon, Dec 27, 2010 at 5:08 AM, Li Li <fa...@gmail.com> wrote:
> I integrated pfor codec into lucene 2.9.3 and the search time
> comparsion is as follows:
>                                   single term   and query   or query
> VINT in lucene 2.9.3         11.2            36.5           38.6
> PFor in lucene 2.9.3         8.7              27.6           33.4
> VINT in lucene 4 branch   10.6             26.5           35.4
> PFor in lcuene 4 branch    8.1              22.5           30.7
>
> My test terms are high frequncy terms because we are interested in "bad case"

I agree it's the bad cases we should focus on in general.  If a super
fast query gets somewhat slower it's "relatively harmless" (just a
"capacity" question for high volume sites) but if the bad queries get
slower it's awful (requires faster cutover to sharded architecture),
until we fix Lucene to run a single search concurrently (which we
badly need to do).

> It seems lucene 4 branch's implementation of and query(conjuction
> query) is well optimized that even for VINT codec, it's faster than
> PFor in lucene 2.9.3. Could any one tell me what optimization is done?
> is store docIDs and freqs separately making it faster? or anything
> else?

Actually vInt on the bulkpostings branch stores freq/doc together.  Ie
the format is the same as 2.9.x's format.  I think it could be the
fact that AND query does block reads (64 doc/freqs at once) instead of
doc-at-once?  Ie, because of this, the query is effectively scanning
the next block of 64 docs instead of skipping to them?  Our skipping
impl is unfortunately rather costly so if skip will not skip that many
docs it's better to scan.

> Another querstion, Is there anyone interested in integrating pfor
> codec into lucene 2.9.3 as me( we have to use lucene 2.9 and solr
> 1.4). And how do I contribute this patch?

Realistically I don't think we can commit this to 2.9.x -- that branch
is purely bug fixes at this point.

Still it's possible others could make use of such a patch so if it's
not too much work you may as well post it?  It can lead to
improvements on the bulk postings branch too :)  The more patches the
merrier!

You only use PFor for the very high freq terms in 2.9.x right?  I've
wondered if we should do the same on bulkpostings... problem is for eg
range queries, that visit all docs for all terms b/w X and Y, you want
the bulk decode even for low freq terms...

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: strange problem of PForDelta decoder

Posted by Li Li <fa...@gmail.com>.
I integrated the pfor codec into lucene 2.9.3 and the search time
comparison (in ms) is as follows:
                              single term   and query   or query
VINT in lucene 2.9.3              11.2         36.5        38.6
PFor in lucene 2.9.3               8.7         27.6        33.4
VINT in lucene 4 branch           10.6         26.5        35.4
PFor in lucene 4 branch            8.1         22.5        30.7

My test terms are high-frequency terms because we are interested in the
"bad case".
It seems the lucene 4 branch's implementation of the and query (conjunction
query) is so well optimized that even with the VINT codec it's faster than
PFor in lucene 2.9.3. Could anyone tell me what optimization is done?
Is storing docIDs and freqs separately making it faster? Or anything
else?

Another question: is anyone else interested in integrating the pfor
codec into lucene 2.9.3 as I am (we have to use lucene 2.9 and solr
1.4)? And how do I contribute this patch?

2010/12/24 Michael McCandless <lu...@mikemccandless.com>:
> Well, an early patch somewhere was able to run PFor on trunk, but the
> performance wasn't great because the trunk bulk-read API is a
> bottleneck (this is why the bulk postings branch was created).
>
> Mike
>
> On Wed, Dec 22, 2010 at 9:45 PM, Li Li <fa...@gmail.com> wrote:
>> I used the bulkpostings
>> branch(https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings/lucene)
>> does trunk have PForDelta decoder/encoder ?
>>
>> 2010/12/23 Michael McCandless <lu...@mikemccandless.com>:
>>> Those are nice speedups!
>>>
>>> Did you use the 4.0 branch (ie trunk) or the bulkpostings branch for this test?
>>>
>>> Mike
>>>
>>> On Tue, Dec 21, 2010 at 9:59 PM, Li Li <fa...@gmail.com> wrote:
>>>> great improvement!
>>>> I did a test in our data set. doc count is about 2M+ and index size
>>>> after optimization is about 13.3GB(including fdt)
>>>> it seems lucene4's index format is better than lucene2.9.3. and PFor
>>>> give good results.
>>>> Besides BlockEncoder for frq and pos. is there any other modification
>>>> for lucene 4?
>>>>
>>>>       decoder    \ avg time     single word(ms)          and
>>>> query(ms)     or query(ms)
>>>>  VINT in lucene 2.9                   11.2
>>>> 36.5                 38.6
>>>>  VINT in lucene 4 branch           10.6
>>>> 26.5                 35.4
>>>>  PFor in lucene 4 branch             8.1
>>>> 22.5                 30.7
>>>> 2010/12/21 Li Li <fa...@gmail.com>:
>>>>>> OK we should have a look at that one still.  We need to converge on a
>>>>>> good default codec for 4.0.  Fortunately it's trivial to take any int
>>>>>> block encoder (fixed or variable block) and make a Lucene codec out of
>>>>>> it!
>>>>>
>>>>> I suggests you not to use this one, I fixed dozens of bugs but it
>>>>> still failed when with random tests. it's codes is hand coded rather
>>>>> than generated by program. But we may learn something from it.
>>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: strange problem of PForDelta decoder

Posted by Michael McCandless <lu...@mikemccandless.com>.
Well, an early patch somewhere was able to run PFor on trunk, but the
performance wasn't great because the trunk bulk-read API is a
bottleneck (this is why the bulk postings branch was created).

Mike

On Wed, Dec 22, 2010 at 9:45 PM, Li Li <fa...@gmail.com> wrote:
> I used the bulkpostings
> branch(https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings/lucene)
> does trunk have PForDelta decoder/encoder ?
>
> 2010/12/23 Michael McCandless <lu...@mikemccandless.com>:
>> Those are nice speedups!
>>
>> Did you use the 4.0 branch (ie trunk) or the bulkpostings branch for this test?
>>
>> Mike
>>
>> On Tue, Dec 21, 2010 at 9:59 PM, Li Li <fa...@gmail.com> wrote:
>>> great improvement!
>>> I did a test in our data set. doc count is about 2M+ and index size
>>> after optimization is about 13.3GB(including fdt)
>>> it seems lucene4's index format is better than lucene2.9.3. and PFor
>>> give good results.
>>> Besides BlockEncoder for frq and pos. is there any other modification
>>> for lucene 4?
>>>
>>>       decoder    \ avg time     single word(ms)          and
>>> query(ms)     or query(ms)
>>>  VINT in lucene 2.9                   11.2
>>> 36.5                 38.6
>>>  VINT in lucene 4 branch           10.6
>>> 26.5                 35.4
>>>  PFor in lucene 4 branch             8.1
>>> 22.5                 30.7
>>> 2010/12/21 Li Li <fa...@gmail.com>:
>>>>> OK we should have a look at that one still.  We need to converge on a
>>>>> good default codec for 4.0.  Fortunately it's trivial to take any int
>>>>> block encoder (fixed or variable block) and make a Lucene codec out of
>>>>> it!
>>>>
>>>> I suggests you not to use this one, I fixed dozens of bugs but it
>>>> still failed when with random tests. it's codes is hand coded rather
>>>> than generated by program. But we may learn something from it.
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: strange problem of PForDelta decoder

Posted by Li Li <fa...@gmail.com>.
I used the bulkpostings
branch (https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings/lucene).
Does trunk have a PForDelta decoder/encoder?

2010/12/23 Michael McCandless <lu...@mikemccandless.com>:
> Those are nice speedups!
>
> Did you use the 4.0 branch (ie trunk) or the bulkpostings branch for this test?
>
> Mike
>
> On Tue, Dec 21, 2010 at 9:59 PM, Li Li <fa...@gmail.com> wrote:
>> great improvement!
>> I did a test in our data set. doc count is about 2M+ and index size
>> after optimization is about 13.3GB(including fdt)
>> it seems lucene4's index format is better than lucene2.9.3. and PFor
>> give good results.
>> Besides BlockEncoder for frq and pos. is there any other modification
>> for lucene 4?
>>
>>       decoder    \ avg time     single word(ms)          and
>> query(ms)     or query(ms)
>>  VINT in lucene 2.9                   11.2
>> 36.5                 38.6
>>  VINT in lucene 4 branch           10.6
>> 26.5                 35.4
>>  PFor in lucene 4 branch             8.1
>> 22.5                 30.7
>> 2010/12/21 Li Li <fa...@gmail.com>:
>>>> OK we should have a look at that one still.  We need to converge on a
>>>> good default codec for 4.0.  Fortunately it's trivial to take any int
>>>> block encoder (fixed or variable block) and make a Lucene codec out of
>>>> it!
>>>
>>> I suggests you not to use this one, I fixed dozens of bugs but it
>>> still failed when with random tests. it's codes is hand coded rather
>>> than generated by program. But we may learn something from it.
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: strange problem of PForDelta decoder

Posted by Michael McCandless <lu...@mikemccandless.com>.
Those are nice speedups!

Did you use the 4.0 branch (ie trunk) or the bulkpostings branch for this test?

Mike

On Tue, Dec 21, 2010 at 9:59 PM, Li Li <fa...@gmail.com> wrote:
> great improvement!
> I did a test in our data set. doc count is about 2M+ and index size
> after optimization is about 13.3GB(including fdt)
> it seems lucene4's index format is better than lucene2.9.3. and PFor
> give good results.
> Besides BlockEncoder for frq and pos. is there any other modification
> for lucene 4?
>
>       decoder    \ avg time     single word(ms)          and
> query(ms)     or query(ms)
>  VINT in lucene 2.9                   11.2
> 36.5                 38.6
>  VINT in lucene 4 branch           10.6
> 26.5                 35.4
>  PFor in lucene 4 branch             8.1
> 22.5                 30.7
> 2010/12/21 Li Li <fa...@gmail.com>:
>>> OK we should have a look at that one still.  We need to converge on a
>>> good default codec for 4.0.  Fortunately it's trivial to take any int
>>> block encoder (fixed or variable block) and make a Lucene codec out of
>>> it!
>>
>> I suggests you not to use this one, I fixed dozens of bugs but it
>> still failed when with random tests. it's codes is hand coded rather
>> than generated by program. But we may learn something from it.
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: strange problem of PForDelta decoder

Posted by Li Li <fa...@gmail.com>.
great improvement!
I did a test on our data set. Doc count is about 2M+ and index size
after optimization is about 13.3GB (including fdt).
It seems lucene4's index format is better than lucene2.9.3's, and PFor
gives good results.
Besides the BlockEncoder for frq and pos, is there any other modification
in lucene 4?

   decoder \ avg time        single word(ms)   and query(ms)   or query(ms)
   VINT in lucene 2.9             11.2             36.5            38.6
   VINT in lucene 4 branch        10.6             26.5            35.4
   PFor in lucene 4 branch         8.1             22.5            30.7
2010/12/21 Li Li <fa...@gmail.com>:
>> OK we should have a look at that one still.  We need to converge on a
>> good default codec for 4.0.  Fortunately it's trivial to take any int
>> block encoder (fixed or variable block) and make a Lucene codec out of
>> it!
>
> I suggests you not to use this one, I fixed dozens of bugs but it
> still failed when with random tests. it's codes is hand coded rather
> than generated by program. But we may learn something from it.
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: strange problem of PForDelta decoder

Posted by Li Li <fa...@gmail.com>.
> OK we should have a look at that one still.  We need to converge on a
> good default codec for 4.0.  Fortunately it's trivial to take any int
> block encoder (fixed or variable block) and make a Lucene codec out of
> it!

I suggest you not use this one; I fixed dozens of bugs but it
still failed random tests. Its code is hand-written rather
than generated by a program. But we may learn something from it.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: strange problem of PForDelta decoder

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Mon, Dec 20, 2010 at 5:49 AM, Li Li <fa...@gmail.com> wrote:
>   I think random test is not sufficient.
>   for normal situation, some branches are not executed. I tested
> http://code.google.com/p/integer-array-compress-kit/ with many random
> int arrays and it works. But when I use it in real indexing, when in
> optimize stage, it corrupted.
>  Because PForDelta will choose best numFrameBits and some bit such as
> 31 is hardly generated by random arrays. So I "force" the encoder to
> choose all possible numFrameBits to test all the decode1 ...decode32
> and find some bugs in it.

Good point -- we need to make sure we cover all numFrameBits.  And a
series of 128 random ints in a row will heavily bias for the high num
bits cases.  Maybe we could do a better job w/ the random source to
try to target all numBits, w/ varying numbers of exceptions, etc...
I'll put a nocommit for this.
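
A sketch of the kind of targeted round-trip test this suggests; encode() and
decode() are placeholders for whichever codec is under test, and forcing
varying numbers of exceptions per block would extend the same idea.

  import java.util.Arrays;
  import java.util.Random;

  // Sketch only: round-trip a 128-int block for every frame bit width 1..31.
  public class AllBitWidthsRoundTrip {
    static final int BLOCK_SIZE = 128;

    public static void main(String[] args) {
      Random random = new Random(42);
      for (int bits = 1; bits <= 31; bits++) {
        int max = (1 << bits) - 1;          // largest value needing 'bits' bits
        int[] block = new int[BLOCK_SIZE];
        for (int i = 0; i < BLOCK_SIZE; i++) {
          block[i] = random.nextInt(max);   // smaller values are fine too
        }
        block[0] = max;                     // force the full width at least once
        int[] roundTripped = decode(encode(block));
        if (!Arrays.equals(block, roundTripped)) {
          throw new AssertionError("round-trip failed at " + bits + " bits");
        }
      }
      System.out.println("all bit widths round-tripped");
    }

    // Identity placeholders; swap in the real encoder/decoder to use this.
    static int[] encode(int[] block) { return block.clone(); }
    static int[] decode(int[] encoded) { return encoded.clone(); }
  }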

>    what's pfor2? using s9/s16 to encode exception and offset?

Yeah I just committed pfor2 this morning on the bulk branch.  You can
check it out from
https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings

pfor2 came from the patch attached on
https://issues.apache.org/jira/browse/LUCENE-1410 by Hao Yan
(thanks!). It uses s16 for the exceptions (though, there's a bug
somewhere, because it fails the random test), and it takes a different
approach for encoding exceptions.

>    In http://code.google.com/p/integer-array-compress-kit/ it's s9
> for NewPForDelta also have many bugs and also need test each branch to
> ensure it works well.

OK we should have a look at that one still.  We need to converge on a
good default codec for 4.0.  Fortunately it's trivial to take any int
block encoder (fixed or variable block) and make a Lucene codec out of
it!

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: strange problem of PForDelta decoder

Posted by Li Li <fa...@gmail.com>.
   I think a random test is not sufficient.
   In normal situations, some branches are never executed. I tested
http://code.google.com/p/integer-array-compress-kit/ with many random
int arrays and it works, but when I used it in real indexing it produced
a corrupted index during the optimize stage.
  That is because PForDelta chooses the best numFrameBits, and some bit
widths such as 31 are hardly ever produced by random arrays. So I "forced"
the encoder to choose every possible numFrameBits in order to test all of
decode1 ... decode32, and found some bugs that way.
    What's pfor2? Does it use s9/s16 to encode the exceptions and offsets?
    In http://code.google.com/p/integer-array-compress-kit/ the s9 used
by NewPForDelta also has many bugs and also needs each branch tested to
ensure it works well.

2010/12/20 Michael McCandless <lu...@mikemccandless.com>:
> It's autogen'd from the Python script in that dir (gendecompress),
> but, we are actively experimenting with the numerous ways to feed it
> data from the file, to see what the JVM can most efficiently execute.
>
> For testing, we need better coverage here.  But I have an initial
> "encode random ints" test that I'm about to commit to the bulkpostings
> branch... the pfor1 impl passes it, but pfor2 doesn't yet (I think
> maybe because Simple16 can't handle ints >= 2^28?).
>
> Mike
>
> On Sun, Dec 19, 2010 at 10:06 PM, Li Li <fa...@gmail.com> wrote:
>> is ForDecompressImpl generated by codes or manully coded?
>> I am frustrated by
>> http://code.google.com/p/integer-array-compress-kit/ which contains
>> too many bugs( I fixed more than 20 but there still existed bugs)
>> Because decoder has too many branches and in normal situation, some
>> branches seldom  occurs .
>>
>> these decoder implemented in branch assume blockSize==128, it has less
>> branches than decoder which support arbitrary blockSize.
>> How do you test these decoder to ensure every branch is tested?
>>
>> 2010/12/16 Michael McCandless <lu...@mikemccandless.com>:
>>> On the bulkpostings branch you can do something like this:
>>>
>>>  CodecProvider cp = new CodecProvider();
>>>  cp.register(new PatchedFrameOfRefCodec());
>>>  cp.setDefaultFieldCodec("PatchedFrameOfRef");
>>>
>>> Then whenever you create an IW or IR, use the advanced method that
>>> accepts a CodecProvider.  Then the index will always use PForDelta to
>>> write/read.
>>>
>>> I suspect conjunction queries got faster because we no longer skip if
>>> the docID we seek is already in the current buffer (currently sized
>>> 64).  Ie, skip is very costly when the target isn't far.  This was
>>> sort of an accidental byproduct of forcing even conjunction queries
>>> using Standard (vInt) codec to work on block buffers, but I think it's
>>> an important opto that we should more generally apply.
>>>
>>> Skipping for block codecs and Standard/vInt are done w/ the same class
>>> now.  It's just that the block codec must store the long filePointer
>>> where the block starts *and* the int offset into the block, vs
>>> Standard codec that just stores a filePointer.
>>>
>>> On "how do we implement bulk read" this is the core change on the
>>> bulkpostings branch -- we have a new API to separately bulk-read
>>> docDeltas, freqs, positionDeltas.  But we are rapidly iterating on
>>> improving this (and getting to a clean PFor/For impl) now...
>>>
>>> Mike
>>>
>>> On Thu, Dec 16, 2010 at 4:29 AM, Li Li <fa...@gmail.com> wrote:
>>>> hi Michael,
>>>>   lucene 4 has so much changes that I don't know how to index and
>>>> search with specified codec. could you please give me some code
>>>> snipplets that using PFor codec so I can trace the codes.
>>>>   in you blog http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html
>>>>   you said "The AND query, curiously, got faster; I think this is
>>>> because I modified its scoring to first try seeking within the block
>>>> of decoded ints".
>>>>   I am also curious about the result because VINT only need decode
>>>> part of the doclist while PFor need decode the whole block. But I
>>>> think with conjuction queries, the main time is used for searching in
>>>> skiplist. I haven't read your codes yet. But I guess the skiplist for
>>>> VINT and the skiplist for PFor is different.
>>>>   e.g.   lucene 2.9's default skipInterval is 16, so it like
>>>>   level 1                                               256
>>>>   level 0  16 32 48 64 80 96 112 128 ...   256
>>>>   when need skipTo(60) we need read 0 16 32 48 64 in level0
>>>>   but when use block, e.g. block size is 128, my implementation of skiplist is
>>>>   level 1       256
>>>>   level 0 128 256
>>>>   when skipTo(60) we only read 2 item in level0 and decode the first
>>>> block which contains 128 docIDs
>>>>
>>>>   How do you implement bulk read?
>>>>   I did like this: I decode a block and cache it in a int array. I
>>>> think I can buffer up to 100K docIDs and tfs for disjuction queries(it
>>>> cost less than 1MB memory for each term)
>>>>   SegmentTermDocs.read(final int[] docs, final int[] freqs)
>>>>           ...........
>>>>                        while (i < length && count < df) {
>>>>                                if (curBlockIdx >= curBlockSize) { //this condition is often
>>>> false, we may optimize it. but JVM hotspots will cache "hot" codes. So
>>>> ...
>>>>                                        int idBlockBytes = freqStream.readVInt();
>>>>                                        curBlockIdx = 0;
>>>>                                        for (int k = 0; k < idBlockBytes; k++) {
>>>>                                                buffer[k] = freqStream.readInt();
>>>>                                        }
>>>>
>>>>                                        blockIds = code.decode(buffer,idBlockBytes);
>>>>                                        curBlockSize = blockIds.length;
>>>>
>>>>                                        int tfBlockBytes = freqStream.readVInt();
>>>>                                        for (int k = 0; k < tfBlockBytes; k++) {
>>>>                                                buffer[k] = freqStream.readInt();
>>>>                                        }
>>>>                                        blockTfs = code.decode(buffer, tfBlockBytes);
>>>>                                        assert curBlockSize == decoded.length;
>>>>
>>>>                                }
>>>>                                freq = blockTfs[curBlockIdx];
>>>>                                doc += blockIds[curBlockIdx++];
>>>>
>>>>                                count++;
>>>>
>>>>                                if (deletedDocs == null || !deletedDocs.get(doc)) {
>>>>                                        docs[i] = doc;
>>>>                                        freqs[i] = freq;
>>>>                                        ++i;
>>>>                                }
>>>>                        }
>>>>
>>>>
>>>>
>>>> 2010/12/15 Michael McCandless <lu...@mikemccandless.com>:
>>>>> Hi Li Li,
>>>>>
>>>>> That issue has such a big patch, and enough of us are now iterating on
>>>>> it, that we cut a dedicated branch for it.
>>>>>
>>>>> But note that this branch is off of trunk (to be 4.0).
>>>>>
>>>>> You should be able to do this:
>>>>>
>>>>>  svn checkout https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings
>>>>>
>>>>> And then run things in there.  I just committed FOR/PFOR prototype
>>>>> codecs from LUCENE-1410 onto that branch, so eg you can run unit tests
>>>>> using those codecs by running "ant test
>>>>> -Dtests.codec=PatchedFrameOfRef".
>>>>>
>>>>> Please post patches back if you improve things!  We need all the help
>>>>> we can get :)
>>>>>
>>>>> Mike
>>>>>
>>>>> On Wed, Dec 15, 2010 at 5:54 AM, Li Li <fa...@gmail.com> wrote:
>>>>>> hi Michael
>>>>>>    you posted a patch here https://issues.apache.org/jira/browse/LUCENE-2723
>>>>>>    I am not familiar with patch. do I need download
>>>>>> LUCENE-2723.patch(there are many patches after this name, do I need
>>>>>> the latest one?) and LUCENE-2723_termscorer.patch and patch them
>>>>>> (patch -p1 <LUCENE-2723.patch)? I just check out the latest source
>>>>>> code from http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene
>>>>>>
>>>>>>
>>>>>> 2010/12/14 Michael McCandless <lu...@mikemccandless.com>:
>>>>>>> Likely you are seeing the startup cost of hotspot compiling the PFOR code?
>>>>>>>
>>>>>>> Ie, does your test first "warmup" the JRE and then do the real test?
>>>>>>>
>>>>>>> I've also found that running -Xbatch produces more consistent results
>>>>>>> from run to run, however, those results may not be as fast as running
>>>>>>> w/o -Xbatch.
>>>>>>>
>>>>>>> Also, it's better to test on actual data (ie a Lucene index's
>>>>>>> postings), and in the full context of searching, because then we get a
>>>>>>> sense of what speedups a real app will see... micro-benching is nearly
>>>>>>> impossible in Java since Hotspot acts very differently vs the "real"
>>>>>>> test.
>>>>>>>
>>>>>>> Mike
>>>>>>>
>>>>>>> On Tue, Dec 14, 2010 at 2:50 AM, Li Li <fa...@gmail.com> wrote:
>>>>>>>> Hi
>>>>>>>>   I tried to integrate PForDelta into lucene 2.9 but confronted a problem.
>>>>>>>>   I use the implementation in
>>>>>>>> http://code.google.com/p/integer-array-compress-kit/
>>>>>>>>   it implements a basic PForDelta algorithm and an improved one(which
>>>>>>>> called NewPForDelta, but there are many bugs and I have fixed them),
>>>>>>>>   But compare it with VInt and S9, it's speed is very slow when only
>>>>>>>> decode small number of integer arrays.
>>>>>>>>   e.g. when I decoded int[256] arrays which values are randomly
>>>>>>>> generated between 0 and 100, if decode just one array. PFor(or
>>>>>>>> NewPFor) is very slow. when it continuously decodes many arrays such
>>>>>>>> as 10000, it's faster than s9 and vint.
>>>>>>>>   Another strange phenomena is that when call PFor decoder twice, the
>>>>>>>> 2nd times it's faster. Or I call PFor first then NewPFor, the NewPFor
>>>>>>>> is faster. reverse the call sequcence, the 2nd called decoder is
>>>>>>>> faster
>>>>>>>>   e.g.
>>>>>>>>                ct.testNewPFDCodes(list);
>>>>>>>>                ct.testPFor(list);
>>>>>>>>                ct.testVInt(list);
>>>>>>>>                ct.testS9(list);
>>>>>>>>
>>>>>>>> NewPFD decode: 3614705
>>>>>>>> PForDelta decode: 17320
>>>>>>>> VINT decode: 16483
>>>>>>>> S9 decode: 19835
>>>>>>>> when I call by the following sequence
>>>>>>>>
>>>>>>>>                ct.testPFor(list);
>>>>>>>>                ct.testNewPFDCodes(list);
>>>>>>>>                ct.testVInt(list);
>>>>>>>>                ct.testS9(list);
>>>>>>>>
>>>>>>>> PForDelta decode: 3212140
>>>>>>>> NewPFD decode: 19556
>>>>>>>> VINT decode: 16762
>>>>>>>> S9 decode: 16483
>>>>>>>>
>>>>>>>>   My implementation is -- group docIDs and termDocFreqs into block
>>>>>>>> which contains 128 integers. when SegmentTermDocs's next method
>>>>>>>> called(or read readNoTf).it decodes a block and save it to a cache.
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>>
>>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>
>>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: strange problem of PForDelta decoder

Posted by Michael McCandless <lu...@mikemccandless.com>.
It's autogen'd from the Python script in that dir (gendecompress),
but we are actively experimenting with numerous ways to feed it
data from the file, to see what the JVM can most efficiently execute.

For testing, we need better coverage here.  But I have an initial
"encode random ints" test that I'm about to commit to the bulkpostings
branch... the pfor1 impl passes it, but pfor2 doesn't yet (I think
maybe because Simple16 can't handle ints >= 2^28?).
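
If that guess about Simple16 is right (like Simple9 it spends 4 bits of each
32-bit word on the selector, leaving at most 28 data bits, so no single value
above 2^28 - 1 fits), a simple guard can route such blocks to another
encoder; this helper is illustrative only, not code from the branch.

  // Illustrative guard: Simple16 has at most 28 data bits per value, so
  // anything negative or >= 2^28 has to take a different path.
  public class Simple16Guard {
    public static boolean encodableWithSimple16(int[] values) {
      for (int v : values) {
        if (v < 0 || v >= (1 << 28)) {
          return false;   // fall back to vInt, or handle as a PFor exception
        }
      }
      return true;
    }

    public static void main(String[] args) {
      System.out.println(encodableWithSimple16(new int[] {1, 2, 3}));   // true
      System.out.println(encodableWithSimple16(new int[] {1 << 28}));   // false
    }
  }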

Mike

On Sun, Dec 19, 2010 at 10:06 PM, Li Li <fa...@gmail.com> wrote:
> is ForDecompressImpl generated by codes or manully coded?
> I am frustrated by
> http://code.google.com/p/integer-array-compress-kit/ which contains
> too many bugs( I fixed more than 20 but there still existed bugs)
> Because decoder has too many branches and in normal situation, some
> branches seldom  occurs .
>
> these decoder implemented in branch assume blockSize==128, it has less
> branches than decoder which support arbitrary blockSize.
> How do you test these decoder to ensure every branch is tested?
>
> 2010/12/16 Michael McCandless <lu...@mikemccandless.com>:
>> On the bulkpostings branch you can do something like this:
>>
>>  CodecProvider cp = new CodecProvider();
>>  cp.register(new PatchedFrameOfRefCodec());
>>  cp.setDefaultFieldCodec("PatchedFrameOfRef");
>>
>> Then whenever you create an IW or IR, use the advanced method that
>> accepts a CodecProvider.  Then the index will always use PForDelta to
>> write/read.
>>
>> I suspect conjunction queries got faster because we no longer skip if
>> the docID we seek is already in the current buffer (currently sized
>> 64).  Ie, skip is very costly when the target isn't far.  This was
>> sort of an accidental byproduct of forcing even conjunction queries
>> using Standard (vInt) codec to work on block buffers, but I think it's
>> an important opto that we should more generally apply.
>>
>> Skipping for block codecs and Standard/vInt are done w/ the same class
>> now.  It's just that the block codec must store the long filePointer
>> where the block starts *and* the int offset into the block, vs
>> Standard codec that just stores a filePointer.
>>
>> On "how do we implement bulk read" this is the core change on the
>> bulkpostings branch -- we have a new API to separately bulk-read
>> docDeltas, freqs, positionDeltas.  But we are rapidly iterating on
>> improving this (and getting to a clean PFor/For impl) now...
>>
>> Mike
>>
>> On Thu, Dec 16, 2010 at 4:29 AM, Li Li <fa...@gmail.com> wrote:
>>> hi Michael,
>>>   lucene 4 has so much changes that I don't know how to index and
>>> search with specified codec. could you please give me some code
>>> snipplets that using PFor codec so I can trace the codes.
>>>   in you blog http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html
>>>   you said "The AND query, curiously, got faster; I think this is
>>> because I modified its scoring to first try seeking within the block
>>> of decoded ints".
>>>   I am also curious about the result because VINT only need decode
>>> part of the doclist while PFor need decode the whole block. But I
>>> think with conjuction queries, the main time is used for searching in
>>> skiplist. I haven't read your codes yet. But I guess the skiplist for
>>> VINT and the skiplist for PFor is different.
>>>   e.g.   lucene 2.9's default skipInterval is 16, so it like
>>>   level 1                                               256
>>>   level 0  16 32 48 64 80 96 112 128 ...   256
>>>   when need skipTo(60) we need read 0 16 32 48 64 in level0
>>>   but when use block, e.g. block size is 128, my implementation of skiplist is
>>>   level 1       256
>>>   level 0 128 256
>>>   when skipTo(60) we only read 2 item in level0 and decode the first
>>> block which contains 128 docIDs
>>>
>>>   How do you implement bulk read?
>>>   I did like this: I decode a block and cache it in a int array. I
>>> think I can buffer up to 100K docIDs and tfs for disjuction queries(it
>>> cost less than 1MB memory for each term)
>>>   SegmentTermDocs.read(final int[] docs, final int[] freqs)
>>>           ...........
>>>                        while (i < length && count < df) {
>>>                                if (curBlockIdx >= curBlockSize) { //this condition is often
>>> false, we may optimize it. but JVM hotspots will cache "hot" codes. So
>>> ...
>>>                                        int idBlockBytes = freqStream.readVInt();
>>>                                        curBlockIdx = 0;
>>>                                        for (int k = 0; k < idBlockBytes; k++) {
>>>                                                buffer[k] = freqStream.readInt();
>>>                                        }
>>>
>>>                                        blockIds = code.decode(buffer,idBlockBytes);
>>>                                        curBlockSize = blockIds.length;
>>>
>>>                                        int tfBlockBytes = freqStream.readVInt();
>>>                                        for (int k = 0; k < tfBlockBytes; k++) {
>>>                                                buffer[k] = freqStream.readInt();
>>>                                        }
>>>                                        blockTfs = code.decode(buffer, tfBlockBytes);
>>>                                        assert curBlockSize == decoded.length;
>>>
>>>                                }
>>>                                freq = blockTfs[curBlockIdx];
>>>                                doc += blockIds[curBlockIdx++];
>>>
>>>                                count++;
>>>
>>>                                if (deletedDocs == null || !deletedDocs.get(doc)) {
>>>                                        docs[i] = doc;
>>>                                        freqs[i] = freq;
>>>                                        ++i;
>>>                                }
>>>                        }


Re: strange problem of PForDelta decoder

Posted by Li Li <fa...@gmail.com>.
Is ForDecompressImpl generated by a code generator or written by hand?
I am frustrated by
http://code.google.com/p/integer-array-compress-kit/ which contains
too many bugs (I fixed more than 20, but bugs remain).
The decoder has many branches, and under normal conditions some
branches are rarely exercised.

The decoders implemented on the branch assume blockSize == 128, so they
have fewer branches than a decoder that supports an arbitrary blockSize.
How do you test these decoders to make sure every branch is covered?
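
One brute-force way I could imagine exercising every branch is a round-trip
test that sweeps all bit widths and several exception densities for a fixed
block size of 128. This is only a sketch; the IntCodec interface and method
names below are made up for illustration, not the actual API on the branch:

import java.util.Arrays;
import java.util.Random;

// Round-trip test sketch.  IntCodec is a stand-in for whatever
// encode/decode entry points the real codec exposes.
public class PForDeltaBranchCoverageTest {

  interface IntCodec {
    int[] encode(int[] values);
    int[] decode(int[] compressed);
  }

  static void roundTrip(IntCodec codec, Random rnd) {
    final int blockSize = 128;
    for (int bits = 1; bits <= 30; bits++) {                     // every "normal" bit width
      for (double excRatio : new double[] {0.0, 0.05, 0.25}) {   // exception densities
        int[] values = new int[blockSize];
        for (int i = 0; i < blockSize; i++) {
          if (rnd.nextDouble() < excRatio) {
            // needs more than 'bits' bits, so it must take the exception path
            values[i] = (1 << bits) + rnd.nextInt(256);
          } else {
            // fits in 'bits' bits, so it takes the normal path
            values[i] = rnd.nextInt(1 << bits);
          }
        }
        int[] decoded = codec.decode(codec.encode(values));
        if (!Arrays.equals(values, decoded)) {
          throw new AssertionError("round trip failed: bits=" + bits
              + " excRatio=" + excRatio);
        }
      }
    }
  }

  public static void main(String[] args) {
    // plug in the real codec here, e.g. roundTrip(new MyPForDelta(), new Random(42));
  }
}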



Re: strange problem of PForDelta decoder

Posted by Michael McCandless <lu...@mikemccandless.com>.
On the bulkpostings branch you can do something like this:

  CodecProvider cp = new CodecProvider();
  cp.register(new PatchedFrameOfRefCodec());
  cp.setDefaultFieldCodec("PatchedFrameOfRef");

Then whenever you create an IW or IR, use the advanced method that
accepts a CodecProvider.  Then the index will always use PForDelta to
write/read.

I suspect conjunction queries got faster because we no longer skip if
the docID we seek is already in the current buffer (currently sized
64).  Ie, skip is very costly when the target isn't far.  This was
sort of an accidental byproduct of forcing even conjunction queries
using Standard (vInt) codec to work on block buffers, but I think it's
an important opto that we should more generally apply.

Skipping for block codecs and Standard/vInt are done w/ the same class
now.  It's just that the block codec must store the long filePointer
where the block starts *and* the int offset into the block, vs
Standard codec that just stores a filePointer.
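
Just to illustrate the difference (made-up class names, not the branch's
actual skip-list code): a skip entry for the standard codec only needs the
file pointer, while a block codec's entry also records where inside the block
the target doc sits.

// Illustrative sketch of the two kinds of skip entries.
class VIntSkipEntry {
  int doc;            // last doc covered by this skip point
  long filePointer;   // seek here and resume vInt decoding
}

class BlockSkipEntry {
  int doc;            // last doc covered by this skip point
  long filePointer;   // start of the encoded block in the postings file
  int offsetInBlock;  // index of the doc inside the decoded block (0..blockSize-1)
}

With the extra offset, a skip whose target is still inside the already-decoded
block can be served straight from the int[] buffer without touching the file,
which is the "don't skip if the target is near" behavior described above.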

On "how do we implement bulk read" this is the core change on the
bulkpostings branch -- we have a new API to separately bulk-read
docDeltas, freqs, positionDeltas.  But we are rapidly iterating on
improving this (and getting to a clean PFor/For impl) now...
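
Roughly, the shape is something like the following; these names and
signatures are hypothetical and only meant to show the idea, since the real
interfaces on the branch are still changing:

// Hypothetical bulk-read sketch: the scorer pulls whole buffers of
// deltas and freqs instead of asking for one doc at a time.
interface BulkIntReader {
  // fills 'buffer' with up to buffer.length docID deltas, returns how many were written
  int readDocDeltas(int[] buffer) throws java.io.IOException;
  int readFreqs(int[] buffer) throws java.io.IOException;
  int readPositionDeltas(int[] buffer) throws java.io.IOException;
}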

Mike



Re: strange problem of PForDelta decoder

Posted by Li Li <fa...@gmail.com>.
hi Michael,
   lucene 4 has so many changes that I don't know how to index and
search with a specified codec. Could you please give me some code
snippets that use the PFor codec so I can trace the code?
   In your blog http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html
you said "The AND query, curiously, got faster; I think this is
because I modified its scoring to first try seeking within the block
of decoded ints".
   I am also curious about that result, because VINT only needs to decode
part of the doc list while PFor has to decode the whole block. But I
think that with conjunction queries most of the time is spent searching
the skip list. I haven't read your code yet, but I guess the skip list
for VINT and the skip list for PFor are different.
   e.g. lucene 2.9's default skipInterval is 16, so it looks like
   level 1                                               256
   level 0  16 32 48 64 80 96 112 128 ...   256
   When we need skipTo(60) we have to read 0 16 32 48 64 in level 0.
   But with blocks, e.g. a block size of 128, my implementation of the skip list is
   level 1       256
   level 0 128 256
   When we skipTo(60) we only read 2 items in level 0 and decode the first
block, which contains 128 docIDs.

   How do you implement bulk read?
   I did it like this: I decode a block and cache it in an int array. I
think I can buffer up to 100K docIDs and tfs for disjunction queries
(two int[] arrays of 100K entries each are about 800 KB, so less than
1MB of memory per term).
   SegmentTermDocs.read(final int[] docs, final int[] freqs)
           ...........
            while (i < length && count < df) {
                // Refill the cache when the current block is exhausted.  This
                // condition is usually false; we may optimize it, but the JVM
                // hotspot will cache the "hot" path anyway.
                if (curBlockIdx >= curBlockSize) {
                    // read and decode one block of docID deltas
                    int idBlockBytes = freqStream.readVInt();
                    curBlockIdx = 0;
                    for (int k = 0; k < idBlockBytes; k++) {
                        buffer[k] = freqStream.readInt();
                    }
                    blockIds = code.decode(buffer, idBlockBytes);
                    curBlockSize = blockIds.length;

                    // read and decode the matching block of term freqs
                    int tfBlockBytes = freqStream.readVInt();
                    for (int k = 0; k < tfBlockBytes; k++) {
                        buffer[k] = freqStream.readInt();
                    }
                    blockTfs = code.decode(buffer, tfBlockBytes);
                    assert curBlockSize == blockTfs.length;
                }
                freq = blockTfs[curBlockIdx];
                doc += blockIds[curBlockIdx++];

                count++;

                if (deletedDocs == null || !deletedDocs.get(doc)) {
                    docs[i] = doc;
                    freqs[i] = freq;
                    ++i;
                }
            }
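
As an aside on the comment about the refill branch: one way to avoid
re-testing it for every document is to drain the cached block in an inner
loop. A rough sketch with the same variables as above; refillBlock() is a
hypothetical helper standing for the decode logic in the if-branch:

            while (i < length && count < df) {
                if (curBlockIdx >= curBlockSize) {
                    refillBlock();   // the block-decoding code above, factored out
                }
                // number of docs we can consume without re-checking the refill condition
                int n = Math.min(Math.min(length - i, df - count),
                                 curBlockSize - curBlockIdx);
                for (int k = 0; k < n; k++) {
                    doc += blockIds[curBlockIdx];
                    freq = blockTfs[curBlockIdx++];
                    count++;
                    if (deletedDocs == null || !deletedDocs.get(doc)) {
                        docs[i] = doc;
                        freqs[i] = freq;
                        ++i;
                    }
                }
            }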





Re: strange problem of PForDelta decoder

Posted by Michael McCandless <lu...@mikemccandless.com>.
Hi Li Li,

That issue has such a big patch, and enough of us are now iterating on
it, that we cut a dedicated branch for it.

But note that this branch is off of trunk (to be 4.0).

You should be able to do this:

  svn checkout https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings

And then run things in there.  I just committed FOR/PFOR prototype
codecs from LUCENE-1410 onto that branch, so eg you can run unit tests
using those codecs by running "ant test
-Dtests.codec=PatchedFrameOfRef".

Please post patches back if you improve things!  We need all the help
we can get :)

Mike



Re: strange problem of PForDelta decoder

Posted by Li Li <fa...@gmail.com>.
hi Michael
    you posted a patch here https://issues.apache.org/jira/browse/LUCENE-2723
    I am not familiar with patches. Do I need to download
LUCENE-2723.patch (there are several patches under this name; do I need
the latest one?) and LUCENE-2723_termscorer.patch, and then apply them
with (patch -p1 < LUCENE-2723.patch)? I just checked out the latest
source code from http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene




Re: strange problem of PForDelta decoder

Posted by Michael McCandless <lu...@mikemccandless.com>.
Likely you are seeing the startup cost of hotspot compiling the PFOR code?

Ie, does your test first "warmup" the JRE and then do the real test?

I've also found that running -Xbatch produces more consistent results
from run to run, however, those results may not be as fast as running
w/o -Xbatch.

Also, it's better to test on actual data (ie a Lucene index's
postings), and in the full context of searching, because then we get a
sense of what speedups a real app will see... micro-benching is nearly
impossible in Java since Hotspot acts very differently vs the "real"
test.
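
For reference, a warmed-up micro-benchmark loop might look like the sketch
below; Decoder and decodeAll() are placeholders for whatever decode call is
being measured:

// Micro-benchmark sketch with an explicit warmup phase so that hotspot
// has already compiled the decode path before timing starts.
public class DecodeBench {

  interface Decoder {
    void decodeAll();   // stand-in for the work being measured
  }

  static long bench(Decoder d) {
    // warmup: give the JIT a chance to compile and optimize the hot methods
    for (int i = 0; i < 20000; i++) {
      d.decodeAll();
    }
    // timed runs
    long start = System.nanoTime();
    for (int i = 0; i < 20000; i++) {
      d.decodeAll();
    }
    return System.nanoTime() - start;
  }
}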

Mike
