Posted to dev@lucene.apache.org by "Jason Rutherglen (JIRA)" <ji...@apache.org> on 2010/07/29 07:05:19 UTC

[jira] Created: (LUCENE-2575) Concurrent byte and int block implementations

Concurrent byte and int block implementations
---------------------------------------------

                 Key: LUCENE-2575
                 URL: https://issues.apache.org/jira/browse/LUCENE-2575
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Index
    Affects Versions: Realtime Branch
            Reporter: Jason Rutherglen
             Fix For: Realtime Branch


The current *BlockPool implementations aren't quite concurrent.
We really need something that has a locking flush method, where
flush is called at the end of adding a document. Once flushed,
the newly written data would be available to all other reading
threads (ie, postings etc). I'm not sure I understand the slices
concept; it seems like it'd be easier to implement a seekable,
random-access-file-like API. One would seek to a given position,
then read or write from there. The underlying management of byte
arrays could then be hidden?



[jira] Updated: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-2575:
-------------------------------------

    Attachment: LUCENE-2575.patch

As per discussion, this patch removes byte block pool forwarding-address rewrites by always allocating 4 bytes at the end of each slice.  newSlice has been replaced with newSliceByLevel because we were always calling it with the first-level size.  TestByteSlices passes.

With this working, we will not need to implement byte block copy-on-write.  Instead, a posting-upto per reader will be used.
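
A minimal sketch of the write-once slice layout this describes, assuming a hypothetical pool class and constant names (nothing here is taken from the patch itself):

{code}
// Sketch: reserve 4 bytes at the end of every slice for the forwarding
// address, so payload bytes are never rewritten once a slice is full.
// No bounds/overflow handling; illustrative only.
class WriteOnceSlicePool {
  static final int FORWARD_BYTES = 4; // reserved tail per slice

  final byte[] buffer = new byte[32768];
  int byteUpto;

  /** Allocates a slice of the given total size; only the first
   *  size - FORWARD_BYTES bytes are writable payload. */
  int newSliceByLevel(int size) {
    int start = byteUpto;
    byteUpto += size;
    return start;
  }

  /** Links a full slice to its successor without touching payload bytes. */
  void writeForwardingAddress(int sliceEnd, int nextSliceStart) {
    buffer[sliceEnd - 4] = (byte) (nextSliceStart >>> 24);
    buffer[sliceEnd - 3] = (byte) (nextSliceStart >>> 16);
    buffer[sliceEnd - 2] = (byte) (nextSliceStart >>> 8);
    buffer[sliceEnd - 1] = (byte) nextSliceStart;
  }
}
{code}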



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916001#action_12916001 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

I guess another possible solution is to do away with interleaved slices altogether and simply allocate byte[]s per term and chain them together.  Then we would not need to worry about concurrency with slicing.  This would certainly make debugging easier; however, it'd add 8 bytes (for the object pointer) per term, somewhat negating the parallel array cutover.  Perhaps it's just a price we'd want to pay.  That, and we'd probably still need a unique posting-upto array per reader.
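
As a sketch, the chained per-term buffers could look like the following (all names are illustrative; the 8 bytes is the object pointer on a 64-bit JVM):

{code}
// Sketch: one small byte[] per term, linked to the next buffer when full.
// No interleaved slices, so a reader never sees payload bytes turn into
// a forwarding address.
class TermBuffer {
  byte[] bytes;
  int upto;        // write position within this buffer
  TermBuffer next; // the per-term pointer cost discussed above

  TermBuffer(int size) {
    bytes = new byte[size];
  }

  /** Writes one byte, chaining a new buffer if this one is full;
   *  returns the buffer now holding the write position. */
  TermBuffer write(byte b) {
    if (upto == bytes.length) {
      next = new TermBuffer(bytes.length * 2);
      return next.write(b);
    }
    bytes[upto++] = b;
    return this;
  }
}
{code}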



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916853#action_12916853 ] 

Michael McCandless commented on LUCENE-2575:
--------------------------------------------

bq. The posting-upto should stop the reader prior to reaching a byte element whose value is 0, ie, it should never happen.

OK but then we are making a full copy of postings upto (an int per term) on every reopen?  Or will we try to make this copy-on-write as well?

So now we need a copy-on-write per-term int for tf and for posting upto?  Anything else?

I fear a copy-on-write check per term is going to be a sizable perf hit.



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914631#action_12914631 ] 

Michael McCandless commented on LUCENE-2575:
--------------------------------------------

Since we now have parallel arrays, we could also store the level byte in a separate parallel array, ie outside of the pool?



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914630#action_12914630 ] 

Michael McCandless commented on LUCENE-2575:
--------------------------------------------

bq. This issue is blocked because the change made to ByteBlockPool to write the slice level at the beginning of the slice moves all of the positions forward by one.

Hmm so does this waste that byte?  Ie when the next slice is allocated, today, we overwrite the level byte with the 4-byte forwarding address.  Ie the level byte is only needed while the slice isn't full yet.

But if you move the level byte to the start, does that mean it's never re-used?



[jira] Updated: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-2575:
-------------------------------------

    Attachment: LUCENE-2575.patch

Added a unit test for payloads, term vectors, and doc stores.  The reader flushes term vectors and doc stores on demand, once per reader.  Also, little things are getting cleaned up in the realtime branch.



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909839#action_12909839 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

Is there a way to know the level of a slice given only the forwarding address/position?  It doesn't look like it.  Hmm... This could mean encoding the level or the size of the slice into the slice, which would elongate slices in general; I suppose, though, that the level index would only add one byte, and that would be okay.
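
A sketch of the one-byte idea: if each slice began with its level, a reader seeking to a slice start could derive the slice's length from a lookup table. The class and the notion of reading the level from the first byte are hypothetical; the sizes mirror the level-size table used for byte slices but should be treated as illustrative:

{code}
// Sketch: slice length recovered from a level byte at the slice start.
class SliceLevels {
  // Slice size per level (illustrative values).
  static final int[] LEVEL_SIZE = {5, 14, 20, 30, 40, 40, 80, 80, 120, 200};

  /** Length of the slice starting at addr, assuming its first byte
   *  stores the level in its low bits. */
  static int sliceLength(byte[] buffer, int addr) {
    int level = buffer[addr] & 15;
    return LEVEL_SIZE[level];
  }
}
{code}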



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907069#action_12907069 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

bq. every term has its own open IndexOutput

I'm not seeing IndexOutput in use with the RAM buffer; do you
mean the write* (writeVInt, writeBytes, writeByte) methods
of TermsHashPerField?

Included in this patch will need to be a way to concurrently
grow other arrays such as ParallelPostingsArray. PPA is used to
store pointers to data stored in the block pools. Maybe we need
a class that concurrently manages growing arrays and block
pools.

Or we may need to slightly re-architect how we're storing the
RAM buffer data so that concurrency can be guaranteed, ie, I
think we'll need to write to temporary arrays, which are then
flushed to primary readable arrays. The flush would occur after
adding a document, or, probably for better efficiency, only when
getReader is called.
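
A minimal sketch of that write-then-publish idea, assuming a hypothetical holder class (none of these names are from the patch): the writer mutates a private array and publishes an immutable snapshot for readers.

{code}
// Sketch: writer-private array published as a read-only snapshot.
// Readers never lock; the writer copies on publish (eg from getReader).
class PublishedIntArray {
  private volatile int[] published = new int[0]; // what readers see
  private int[] writable = new int[16];          // guarded by this
  private int length;

  synchronized void set(int index, int value) {  // indexing thread
    if (index >= writable.length) {
      writable = java.util.Arrays.copyOf(writable, Math.max(index + 1, 2 * writable.length));
    }
    writable[index] = value;
    if (index + 1 > length) {
      length = index + 1;
    }
  }

  synchronized void publish() {                  // flush / getReader
    published = java.util.Arrays.copyOf(writable, length);
  }

  int[] snapshot() {                             // reader threads
    return published;
  }
}
{code}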



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913383#action_12913383 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

This issue is blocked because the change made to ByteBlockPool to write the slice level at the beginning of the slice moves all of the positions forward by one.  This has caused TestByteSlices to fail an assertion.  I'm not sure whether the test needs to be changed or there's a bug in the new BBP implementation.  Either way it's a bit of a challenge to debug.



[jira] Updated: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-2575:
-------------------------------------

    Attachment: LUCENE-2575.patch

This includes a basic implementation of the sorted-term-id based
term enum. We'll want to over-allocate the sorted term id array
so that future merges of new term ids will not require
allocating a new array for growth. I think overall the RAM
buffer based searching will not require too much more of a RAM
outlay. The merging of new term ids could occur in a background
thread if we think it's expensive; however, for now we can simply
merge them in on demand as new RAM readers are created.

Seek is implemented as a binary search of the sorted term ids.
If this is not efficient enough, we can implement a terms index
ala the current system.

For now the conversion from CSLM to sorted term id array can be
triggered at a percentage of the total number of terms, which
I'll default to 10%. We may want to make this a function (eg,
percentage) of RAM consumption in the future.
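
A sketch of the binary-search seek over sorted term ids (the termTexts lookup standing in for the real pool-backed term bytes is a placeholder, not part of the patch):

{code}
// Sketch: seek(term) as a binary search over sorted term ids.
class SortedTermIdEnum {
  final int[] sortedTermIds;
  final String[] termTexts; // placeholder for the pool-backed lookup

  SortedTermIdEnum(int[] sortedTermIds, String[] termTexts) {
    this.sortedTermIds = sortedTermIds;
    this.termTexts = termTexts;
  }

  /** Returns the index of the first term >= target; may return
   *  sortedTermIds.length if target sorts past the last term. */
  int seek(String target) {
    int lo = 0, hi = sortedTermIds.length - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      int cmp = termTexts[sortedTermIds[mid]].compareTo(target);
      if (cmp < 0) {
        lo = mid + 1;
      } else if (cmp > 0) {
        hi = mid - 1;
      } else {
        return mid; // exact match
      }
    }
    return lo;
  }
}
{code}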



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909580#action_12909580 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

For the posting skip list we need to implement seek on the
ByteSliceReader. However, if we're rewriting a portion of a
slice, then I guess we could have a problem... Meaning we'd be
storing an absolute position in the skip list, and when we go
to look up the value, those byte(s) could have been altered so
they're no longer delta-encoded doc ids but instead the
forwarding address to the next slice.

Do we need an intelligent mechanism that interacts with the byte
slice writer so it does not point at byte array elements (ie the
end of slices) that could later be converted into forwarding
addresses?



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915130#action_12915130 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

Here are the new parallel arrays.  It seems like something went wrong because there are so many; however, I think each is required.

{code}
final int[] skipStarts; // address where the term's skip list starts (for reading)
final int[] skipAddrs; // where writing left off
final int[] sliceAddrs; // the start addr of the last posting slice
final byte[] sliceLevels; // posting slice levels
final int[] skipLastDoc; // last skip doc written
final int[] skipLastAddr; // last skip addr written
{code}

In regards to writing into the skip list the start address of
the first level 9 posting slice: because we're writing vints
into the posting slices, and vints may span more than one byte,
we may (and this has happened in testing) write a vint that
spans slices. So if we record the last slice address and read a
vint from that point, we'll get an incorrect vint. If we start
one or more bytes into a slice, we will not know where the slice
ends (because we are assuming they're 200 bytes in length).
Perhaps in the slice address parallel array we can somehow
encode the first slice's length, or add yet another parallel
array for the length of the first slice.  Something to think about.

We can't simply read
ahead 200 bytes (ie, level 9), nor can



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914860#action_12914860 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

Further thoughts on ref counting the byte[]s.  If we add a BytesRefCount (or some other similarly named class; I want to call it BytesRef, though I can't because that's taken), is adding 4 bytes for the int count variable plus 8 bytes for the byte[] pointer, 12 bytes total, to a 32k (ie, 32768-length) byte[] really too much?  I don't think so.
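
A sketch of such a wrapper, using the BytesRefCount name floated above (the class itself is hypothetical; the accounting matches the 4 + 8 bytes just described):

{code}
// Sketch: a byte[] plus a reference count, so a block is only recycled
// once no reader still holds it.
class BytesRefCount {
  final byte[] bytes;       // the 8-byte pointer on a 64-bit JVM
  private int refCount = 1; // the 4-byte count; the writer holds the first ref

  BytesRefCount(int size) {
    bytes = new byte[size];
  }

  synchronized void incRef() { // a reader takes a reference
    refCount++;
  }

  /** Releases one reference; returns true when the underlying
   *  byte[] may safely be returned to the pool. */
  synchronized boolean decRef() {
    return --refCount == 0;
  }
}
{code}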



[jira] Updated: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-2575:
-------------------------------------

    Attachment: LUCENE-2575.patch

Term frequency is recorded and returned.  There are Terms, TermsEnum, DocsEnum implementations.  Needs the term vectors, doc stores exposed via the RAM reader, concurrency unit tests, and a payload unit test.  Still quite rough.



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915685#action_12915685 ] 

Michael McCandless commented on LUCENE-2575:
--------------------------------------------

bq. A copy of the byte[][] refs is made when getReader is called.

Hmm why can't the reader just use the current byte[][]?  The writer only adds in new blocks to this array (doesn't overwrite the already written blocks, until flush)?  (And then allocates a new byte[][] once that array is full).

{quote}
I think the issue at the
moment is I'm using a boolean[] to signify if a byte[] needs to
be copied before being written to
{quote}
Hmm so we also copy-on-write a given byte[] block?  Is this because JMM can't make the guarantees we need about other threads reading the bytes written?

{quote}
I have a suspicion we'll change our minds about pooling byte[]s.
We may end up implementing ref counting anyways (as described
above), and the sudden garbage generated could be a massive
change for users?
{quote}

But even if we do reuse, we will cause tons of garbage until the still-open readers are closed?  Ie we cannot re-use the byte[] being "held open" by any NRT reader that's still referencing the in-RAM segment after that segment has been flushed to disk.

Also the garbage shouldn't be that bad since each object is large.  It's not like 3.x's situation with FieldCache or terms dict index, for example....

I would start simple by dropping reuse.  We can then add it back if we see perf issues?

{quote}
Both very common types of queries, so we probably need some type
of skipping, which we will, it'll just be single-level.
{quote}
I would start simple here, and make skipping stupid, ie just scan.  You can get everything working, all tests passing, etc., and then adding in skipping is a much more isolated change.  You need all the isolation you can get here!  This stuff is *hairy*.

{quote}
As a side note, there is still an issue in my mind around the
term frequencies parallel array (introduced in these patches),
in that we'd need to make a copy of it for each reader (because
if it changes, the scoring model becomes inaccurate?).
{quote}

Hmm, you're right that each reader needs a private copy, to remain truly "point in time".  This (4 bytes per unique term X number of readers reading that term) is a non-trivial addition of RAM.

BTW I'm assuming IW will now be modal?  Ie caller must tell IW up front if NRT readers will be used?  Because non-NRT users shouldn't have to pay all this added RAM cost?




[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913403#action_12913403 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

A further question for this issue, in regards to copy-on-write
of the 1st dimension of the byte[][] array: will we want to keep
a count of references to each individual byte array (the one
with the bytes data), in the case of, let's say, multiple
readers keeping references to it?  Assuming we will want to
continue to pool the byte[]s, I think we'll need to use
reference counting, or simply not pool the byte[]s after
flushing, in order to avoid overwriting of arrays.



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916914#action_12916914 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

bq. I fear a copy-on-write check per term is going to be a sizable perf hit.

For indexing?  The byte[] buffers are also using a page-based system.  I think we'll need to measure the performance difference.  We can always shift the cost to getReader by copying from a writable (indexing-based) tf array into a per-reader tf of paged ints.  While this'd be a complete iteration over the length of the terms, the CPU cache could make it extremely fast (because each page would be cached, and we'd be iterating sequentially over an array, methinks).

The other cost is the lookup of the upto when iterating the postings; however, that'd be a one-time cost per term-docs instantiation, ie, negligible.
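
A sketch of that paged copy at getReader time (the page size and class are illustrative; a fuller copy-on-write version would share unchanged pages between readers rather than copying everything):

{code}
// Sketch: copy a writable tf array into per-reader pages, sequentially,
// which is the cache-friendly iteration described above.
class PagedTfSnapshot {
  static final int PAGE_SIZE = 1024; // illustrative
  final int[][] pages;

  PagedTfSnapshot(int[] writableTf, int numTerms) {
    int numPages = (numTerms + PAGE_SIZE - 1) / PAGE_SIZE;
    pages = new int[numPages][];
    for (int p = 0; p < numPages; p++) {
      int start = p * PAGE_SIZE;
      int len = Math.min(PAGE_SIZE, numTerms - start);
      pages[p] = new int[len];
      System.arraycopy(writableTf, start, pages[p], 0, len);
    }
  }

  int tf(int termId) {
    return pages[termId / PAGE_SIZE][termId % PAGE_SIZE];
  }
}
{code}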





[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914607#action_12914607 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

The current MultiLevelSkipList* system relies on writing out
fixed-length skip list buffers before they are readable. This
obviously will not work for RT, so I'm working on modifying MLSL
into new class(es) that write and read from the concurrent-ish
BBP.

In trunk, each level is a RAMOutputStream; that'll need to
change, and each level will likely be a stream keyed into the
BBP. A question is whether we will statically assign the number
of levels prior to the creation of the MLSL, or whether we need
to somehow make the number of levels dynamic, in which case
using streams becomes slightly more complicated.
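
As a sketch of the statically-assigned variant, "each level as a stream keyed into the BBP" could be as simple as per-level addresses into the shared pool (everything here is illustrative, not the eventual design):

{code}
// Sketch: a fixed number of skip levels, each an append stream whose
// start and current write positions point into the byte block pool.
class SkipLevelStreams {
  final int[] levelStartAddr; // where each level's first slice begins
  final int[] levelWriteAddr; // where each level's writer left off

  SkipLevelStreams(int numLevels) { // level count fixed up front
    levelStartAddr = new int[numLevels];
    levelWriteAddr = new int[numLevels];
  }

  /** Opens a level's stream at the given pool address. */
  void openLevel(int level, int poolAddr) {
    levelStartAddr[level] = poolAddr;
    levelWriteAddr[level] = poolAddr;
  }
}
{code}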





[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922243#action_12922243 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

The other unique thing implemented in the Twitter search, as described by the shared slides, is that each posting is a single int.  This makes it fairly simple to detect whether a posting has been written, because if it hasn't, it'll be 0 or some other pre-init'd value.  However, given that our postings contain multiple vints and payloads, and we have both freq and prox streams, I don't think we can properly detect while reading whether a given posting has in fact been completely written.  We'd maybe need a posting verification check, like writing the posting to a buffer first, then writing the buffer with its length at the beginning.  That's unnecessarily complex if System.arraycopy is fast enough for copying between a write and a read upto array.
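
The buffer-then-publish check could look like this sketch (scratch sizing, names, and the single length byte are all assumptions; memory-ordering concerns are ignored here):

{code}
// Sketch: stage a posting's vints in a scratch buffer, then publish it
// as length prefix + payload, so a reader can distinguish a completely
// written posting from a partial one.
class VerifiedPostingWriter {
  final byte[] scratch = new byte[64]; // no overflow handling; sketch only
  int scratchUpto;

  void writeVInt(int i) { // stage into the scratch buffer
    while ((i & ~0x7F) != 0) {
      scratch[scratchUpto++] = (byte) ((i & 0x7F) | 0x80);
      i >>>= 7;
    }
    scratch[scratchUpto++] = (byte) i;
  }

  /** Publishes the staged posting into the pool; returns the next
   *  write address. */
  int flushTo(byte[] pool, int addr) {
    int len = scratchUpto;
    pool[addr] = (byte) len; // length prefix
    System.arraycopy(scratch, 0, pool, addr + 1, len);
    scratchUpto = 0;
    return addr + 1 + len;
  }
}
{code}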



[jira] Updated: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-2575:
-------------------------------------

    Attachment: LUCENE-2575.patch

Here's a start at concurrency, the terms dictionary, and
iterating over doc ids. 

* It needs concurrency unit tests

* At an as yet undetermined interval, we need to conglomerate
the existing terms into a sorted int[] rather than continue to
use the ConcurrentSkipListMap, which consumes a far greater
amount of RAM. The tradeoff and reason for using the CSLM is the
level of concurrency gained by using it at the cost of greater
memory consumption when compared with the sorted int[] of term
ids.

* An int[]-based term enum needs to be implemented. In addition,
a multi-term enum; maybe there's one we can use, I'm not
familiar enough with the new flex code base.

* Copy on write is used to obtain a read-only version of the
ByteBlockPool and IntBlockPool. In the case of the byte blocks,
a boolean[] marks which elements need to be copied prior to
writing by the DocumentsWriterPerThread on byte slice forwarding
address rewrite.

* A write lock on each DWPT guarantees that as reference copies
are made, arrays being copied will not be altered in flight.
There shouldn't be an issue: even though, to get a complete
IndexReader[], we need to wait for each document to finish
flushing, we're not blocking indexing, only the obtaining of the
IRs. I can't see this being an issue for most use cases.

* Similarly, a reference copy is made of the ParallelPostingsArray
(rather than a full copy) for use by the RAM buffer based
IndexReader. It is OK for the PPA to be changed during future doc
adds, as only the elements greater than the IR's max term id
will be altered, ie, we're not going to run into JMM thread
issues because the writing and read-only array reference copies
occur in a reentrant lock.

* Recycling of byte[]s becomes a bit more complex as RAM IRs will
likely hold references to them. When the RAM IR is closed, however,
the byte[]s can be recycled. The user could experience unusual
RAM usage spikes if IRs are not closed properly.





[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909852#action_12909852 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

In the following line of ByteBlockPool.allocSlice we're recording the slice level; however, it's at the end of the slice rather than the beginning, which is where we'll need to write the level in order to implement slice seek.  I'm not immediately sure what reads the level at this end position of the byte[].

{code}
buffer[byteUpto-1] = (byte) (16|newLevel); // tag the last byte of the newly allocated slice with its level; the 16 bit keeps it non-zero
{code}



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910123#action_12910123 ] 

Michael McCandless commented on LUCENE-2575:
--------------------------------------------

bq. I'm not immediately sure what reads the level at this end position of the byte[].

This is so that, once we exhaust the slice and must allocate the next one, we know what size (level + 1, ceiling'd) to make the next slice.
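
In code, that sizing step looks roughly like the following sketch (the two tables mirror the nextLevelArray/levelSizeArray used by ByteBlockPool.allocSlice; the values are quoted from memory, so treat them as illustrative):

{code}
// Sketch: the level byte at the end of a full slice determines the size
// of the next slice (level + 1, capped at the top level).
class SliceSizing {
  static final int[] NEXT_LEVEL = {1, 2, 3, 4, 5, 6, 7, 8, 9, 9};
  static final int[] LEVEL_SIZE = {5, 14, 20, 30, 40, 40, 80, 80, 120, 200};

  /** Size to allocate once the current slice is exhausted. */
  static int nextSliceSize(byte lastByteOfFullSlice) {
    int level = lastByteOfFullSlice & 15; // low bits of (16|level)
    int newLevel = NEXT_LEVEL[level];     // level + 1, ceiling'd
    return LEVEL_SIZE[newLevel];
  }
}
{code}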



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909771#action_12909771 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

Because of the way byte slices work (eg, they need to pre-know
the size of the slice before iterating on it), we can't simply
point to the middle of a slice and read without eventually
running into the forwarding address.

It seems the skip list will need to point to the beginning of a
slice. This'll make the interval iteration in the RAM buffer
skip list writer a little more complicated than today, in that
it'll need to store positions that are the start of byte slices.
In other words, the intervals will be slightly uneven at times.



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916086#action_12916086 ] 

Michael McCandless commented on LUCENE-2575:
--------------------------------------------

{quote}
Correct. The example of where everything could go wrong is the
rewriting of a byte slice forwarding address while a reader is
traversing the same slice.
{quote}

Ahh right that's a real issue.

{quote}
bq. It's not like 3.x's situation with FieldCache or terms dict index, for example....

What's the GC issue with FieldCache and terms dict?
{quote}

In 3.x, the string index FieldCache and the terms index generate tons
of garbage, ie allocate zillions of tiny objects.  (This is fixed in
4.0).

My only point was that having 32 KB arrays as garbage is much less GC
load than having the same net KB across zillions of tiny objects...

{quote}
There's
the term-freq parallel array, however if getReader is never
called, it's a single additional array that's essentially
innocuous, if useful.
{quote}

Hmm the full copy of the tf parallel array is going to put a highish
cost on reopen?  So some sort of transactional (incremental
copy-on-write) data structure is needed (eg PagedInts)...

We don't store tf now do we?  Adding 4 bytes per unique term isn't
innocuous!

{quote}
OK, I think there's a solution to copying the actual byte[],
we'd need to alter the behavior of BBPs. It would require always
allocating 3 empty bytes at the end of a slice for the
forwarding address,
{quote}

Good idea -- this'd make the byte[] truly write-once.

This would really decrease RAM efficiency for low-doc-freq (eg 1)
terms, though, because today they make use of those 3 bytes.  We'd
need to increase the level 0 slice size...
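
Back-of-the-envelope, assuming today's 5-byte level 0 slice:

{code}
int level0Size = 5;  // current smallest slice
int reserved   = 4;  // always-reserved forwarding address
int payload    = level0Size - reserved; // = 1 usable byte, hence the
                                        // need to grow the level 0 size
{code}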

{quote}
The reason this would work is, past readers that are
iterating their term docs concurrently with the change to the
posting-upto array, will stop at the maxdoc anyways. This'll
be fun to implement.
{quote}

Hmm... but the reader needs to read 'beyond' the end of a given slice,
still?  Ie say global maxDoc is 42, and a given posting just read doc
27 (which in fact is its last doc).  It would then try to read the
next doc?

Oh, except, the next byte would be a 0 (because we always clear the
byte[]), which [I think] is never a valid byte value in the postings
stream, except as a first byte, which we would not hit here (since we
know we always have at least a first byte).  So maybe we can get by
w/o a full copy of postingUpto?
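
A sketch of what that reader-side check could look like (a standalone
vint decode; it ignores slice forwarding for brevity, and assumes pos
is already past the first posting, so a 0 first byte can only mean
unwritten, zero-filled bytes):

{code}
int lastDoc(byte[] block, int pos, int doc) {
  while (block[pos] != 0) {       // 0 first byte => zero-filled tail
    int b = block[pos++] & 0xFF;
    int delta = b & 0x7F;         // doc deltas encoded as vints
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = block[pos++] & 0xFF;
      delta |= (b & 0x7F) << shift;
    }
    doc += delta;
  }
  return doc;
}
{code}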




[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914802#action_12914802 ] 

Michael McCandless commented on LUCENE-2575:
--------------------------------------------

{quote}
bq. Can we just have IW allocate a new byte[][] after flush?  So then any open readers can keep using the one they have?

This means the prior byte[]s will still be recycled after all
active previous flush readers are closed?
{quote}

Probably we should stop reusing the byte[] with this change?  So when all readers using a given byte[] are finally GCd, is when that byte[] is reclaimed.

{quote}
bq. it's possible single level skipping, with a larger skip interval, is fine for even large RAM buffers.

True, I'll implement a default of one level, and a default
large-ish skip interval.
{quote}

Well, I was thinking only implement the single-level skip case (since it ought to be a lot simpler than the MLSLW/R)....

{quote}
How many scorers, or how often is skipping used? It's mostly for
disjunction queries?
{quote}

Actually, conjunction (AND) queries, and also PhraseQuery (which is really an AND query followed by positions checking).  One thing to remember is that skipping is *costly* (especially, the first time you use it) -- I think we over-use it today, ie, in many cases we should do a spin loop (.next()) instead, if your target "is not that far away".  PhraseQuery (the exact case) has a heuristic to do this, but really this ought to be implemented in the codec.

bq. get deletes working in the RT branch,

Do we have a design thought out for this?  The challenge is because every doc state now has its own private docID stream, we need a global sequence ID to track "when" a deletion arrived, to know whether or not that deletion applies to each docID, right?  (And, each added doc must also record the sequenceID when it was added).
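
A tiny sketch of the sequence-ID check being described (all names assumed):

{code}
// A delete applies to a doc only if the doc was added before the
// delete arrived.
boolean isDeleted(long docSeqId, long deleteSeqId) {
  return docSeqId < deleteSeqId;
}
{code}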




[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915797#action_12915797 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

{quote}Hmm so we also copy-on-write a given byte[] block? Is
this because JMM can't make the guarantees we need about other
threads reading the bytes written?{quote}

Correct. The example of where everything could go wrong is the
rewriting of a byte slice forwarding address while a reader is
traversing the same slice. The forwarding address could be
half-written, and suddenly we're bowling in lane 6 when we
should be in lane 9. By making a [read-only] ref copy of the
byte[]s we're ensuring that the byte[]s are in a consistent
state while being read.

So I'm using a boolean[] to tell the writer whether it needs to
make a copy of the byte[]. The boolean[] also tells the writer
if it's already made a copy. Whereas in IndexReader.clone we're
keeping ref counts of the norms byte[], and decrementing each
time we make a copy until finally it's 0, and then we give it to
the GC (here we'd do the same or give it back to the allocator). 
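
A minimal sketch of that write path, under the stated assumptions
(single writer, a boolean[] marking blocks shared with a reader
snapshot; names are made up):

{code}
void writeByte(byte[][] blocks, boolean[] shared,
               int blockIdx, int offset, byte b) {
  if (shared[blockIdx]) {
    blocks[blockIdx] = blocks[blockIdx].clone(); // readers keep old byte[]
    shared[blockIdx] = false;                    // private until next getReader
  }
  blocks[blockIdx][offset] = b;
}
{code}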

{quote}But even if we do reuse, we will cause tons of garbage,
until the still-open readers are closed? Ie we cannot re-use the
byte[] being "held open" by any NRT reader that's still
referencing the in-RAM segment after that segment had been
flushed to disk.{quote}

If we do pool, it won't be very difficult to implement, we have
a single point of check-in/out of the byte[]s in the allocator
class.

In terms of the first implementation, by all means we should
minimize "tricky" areas of the code by not implementing skip
lists and byte[] pooling.

{quote}It's not like 3.x's situation with FieldCache or terms
dict index, for example....{quote}

What's the GC issue with FieldCache and terms dict?

{quote}BTW I'm assuming IW will now be modal? Ie caller must
tell IW up front if NRT readers will be used? Because non-NRT
users shouldn't have to pay all this added RAM cost?{quote}

At present it's still all on demand. Skip lists will require
going modal because we need to build those upfront (well we
could go back and build them on demand, that'd be fun). There's
the term-freq parallel array, however if getReader is never
called, it's a single additional array that's essentially
innocuous, if useful.

{quote}Hmm your'e right that each reader needs a private copy,
to remain truly "point in time". This (4 bytes per unique term X
number of readers reading that term) is a non-trivial addition
of RAM.{quote}

PagedInt time? However even that's not going to help much if in
between getReader calls, 10,000s of terms were seen, we could
have updated 1000s of pages. AtomicIntArray does not help
because concurrency isn't the issue, it's point-in-timeness
that's required. Still I guess PagedInt won't hurt, and in the
case of minimal term freq changes, we'd still be potentially
saving RAM. Is there some other data structure we could pull out
of a hat and use?
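
One possible shape for such a PagedInts (this layout is an assumption,
not an existing class): readers snapshot the page table cheaply, and
the writer copies a page only the first time it touches it after a
snapshot, so unchanged pages are shared across readers.

{code}
class PagedInts {
  final int pageSize;
  final int[][] pages;
  final boolean[] shared;

  PagedInts(int pageSize, int numPages) {
    this.pageSize = pageSize;
    this.pages = new int[numPages][pageSize];
    this.shared = new boolean[numPages];
  }

  // getReader path: cheap, copies page refs only
  int[][] snapshot() {
    java.util.Arrays.fill(shared, true);
    return pages.clone();
  }

  // writer path: copy-on-write per page
  void set(int index, int value) {
    int p = index / pageSize;
    if (shared[p]) {
      pages[p] = pages[p].clone();
      shared[p] = false;
    }
    pages[p][index % pageSize] = value;
  }
}
{code}

A reader then reads snapshot[index / pageSize][index % pageSize], and only the pages touched since its snapshot cost extra RAM.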





[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907628#action_12907628 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

Actually, having said that, we don't need to pre-allocate the forwarding address because we're copy-on-writing the byte[]s.  So I guess we're good!



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914634#action_12914634 ] 

Michael McCandless commented on LUCENE-2575:
--------------------------------------------

Maybe we can not skip until we've hit the max slice?  This way skipping would always know it's on the max slice.  This works out to 429 bytes into the stream... likely this is fine.
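
Where 429 comes from, using the slice level sizes from earlier (assumed values):

{code}
int[] levelSize = {5, 14, 20, 30, 40, 40, 80, 80, 120, 200};
int beforeMax = 0;
for (int i = 0; i < levelSize.length - 1; i++) {
  beforeMax += levelSize[i];  // 5+14+20+30+40+40+80+80+120
}
// beforeMax == 429: past this offset every slice in the stream is a
// max-level (200 byte) slice, so skip targets are predictable
{code}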



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907493#action_12907493 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

I'm finally understanding the slice concept: basically we're
over-allocating space within the ByteBlockPool byte[]s for more
postings for a particular term, hence the levelSizeArray, which
determines the length of each "slice" of a byte[] the postings
will use. They're probably not always filled in completely?

It's a bit tricky to follow by reading the code, which makes
figuring out how to make the RAM buffer concurrent challenging,
especially in the newSlice method, which rewrites the end of the
last slice with the forwarding index/address of the next slice.
It's very clever, however maybe we can encapsulate it better with
methods delineating the various operations, which right now act
directly on the assortment of arrays. In general we can possibly
get away with using copy-on-write to achieve performant
single-threaded write and multi-threaded read concurrency.





[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893615#action_12893615 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

{quote}Where is there a concurrency problem? Is it a JMM
visibility issue of writes from one thread vs reads, in a shared
byte[]?{quote}

For example if the 1st dimension of the byte array increases,
how is that made visible while it's also being read? In
addition, there's the JMM issue of the actual byte[]s that hold
the data (ie, if we're writing to a byte[] there's no guarantee
as to when those bytes will be available to reading threads).

We're writing segments to the Directory API today, so it seems we
should be able to also write to a virtual RAM filesystem (a
concurrent RAMDir?) with some basic block level locking
semantics (encapsulated in a flush method). Then we could safely
and concurrently read from this new RAMDir. Couldn't we also use
the RAMDir to interleave IndexOutputs, or provide an abstraction
that performs the actual interleaving?
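
To make the JMM point concrete, here is one minimal (hypothetical) way
to get the needed happens-before edge: publish a read-only snapshot
through a volatile reference at flush. Bytes written after the flush
are deliberately not covered.

{code}
class ConcurrentBlocks {
  private volatile byte[][] published;  // readers go through this only

  void flush(byte[][] writerBlocks) {
    published = writerBlocks.clone();   // volatile write publishes all
  }                                     // prior byte[] writes

  byte[][] readerView() {
    return published;                   // volatile read pairs with flush()
  }
}
{code}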



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914863#action_12914863 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

In regards to the performance effect on writes of obtaining the reader from each DWPT, there should not be any, because it is the thread calling getReader that waits for the lock on the DWPT in between doc adds.  The copy-on-write, in its most primitive form, is a copy of object references, eg, the cost is extremely low.  So I do not think indexing performance will be affected whatsoever by the copy-on-write approach.  Of course we'll need to benchmark to verify.



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914902#action_12914902 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

The RAM buffer single-level skip list writer probably requires two additional parallel arrays.  One for the beginning address into the skip list BBP.  The second for the address upto, where the last skip list entry that was written left off.



[jira] Issue Comment Edited: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915130#action_12915130 ] 

Jason Rutherglen edited comment on LUCENE-2575 at 9/27/10 1:44 AM:
-------------------------------------------------------------------

Here are the new parallel arrays.  It seems like something went wrong and there are too many, however I think each is required.

{code}
final int[] skipStarts; // address where the term's skip list starts (for reading)
final int[] skipAddrs; // where writing left off
final int[] sliceAddrs; // the start addr of the last posting slice
final byte[] sliceLevels; // posting slice levels
final int[] skipLastDoc; // last skip doc written
final int[] skipLastAddr; // last skip addr written
{code}

In regards to writing into the skip list the start address of
the first level 9 posting slice: Because we're writing vints
into the posting slices, and vints may span more than 1 byte, we
may (and this has happened in testing) write a vint that spans
slices, so if we record the last slice address and read a vint
from that point, we'll get an incorrect vint. If we start 1+
bytes into a slice, we will not know where the slice ends
(because we are assuming they're 200 bytes in length). Perhaps
in the slice address parallel array we can somehow encode the
first slice's length, or add yet another parallel array for the
length of the first slice.  Something to think about.
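
A standalone illustration of why a recorded address can land mid-vint
(the value 300 is arbitrary): a delta of 300 needs two vint bytes, so
with one byte left in a slice the encoding straddles the boundary.

{code}
static int writeVInt(byte[] buf, int pos, int value) {
  while ((value & ~0x7F) != 0) {
    buf[pos++] = (byte) ((value & 0x7F) | 0x80); // continuation bit set
    value >>>= 7;
  }
  buf[pos++] = (byte) value;
  return pos; // 300 -> 0xAC 0x02: two bytes, possibly in two slices
}
{code}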



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914626#action_12914626 ] 

Michael McCandless commented on LUCENE-2575:
--------------------------------------------

{quote}
I think we'll need to use
reference counting, or simply not pool the byte[]s after
flushing, in order to avoid overwriting of arrays.
{quote}

Can we just have IW allocate a new byte[][] after flush?  So then any open readers can keep using the one they have?



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914838#action_12914838 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

{quote}Can you explain what's the "copy on write ByteBlockPool"?
Exactly when do we make a copy....? {quote}

A copy of the byte[][] refs is made when getReader is called.
Each DWPT is locked, eg, writes stop, a copy of the byte[][] is
made (just the refs) for that reader. I think the issue at the
moment is I'm using a boolean[] to signify if a byte[] needs to
be copied before being written to. As with BV and norms cloning,
read-only references are carried forward, which would imply
making copies of the boolean[] as well. In other words, as with
BV and norms, I think we need ref counts to the individual
byte[]s so that read-only references to byte[]s are carried
forward properly. However this implies creating a BytesRefCount
object because a parallel array cannot point back to the same
underlying byte[] if the byte[] in the byte[][] can be replaced
when a copy is made. 
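
A sketch of what that BytesRefCount holder might look like (layout
assumed, analogous to the norms ref counting in SegmentReader):

{code}
class BytesRefCount {
  final byte[] bytes;
  private int refCount = 1;   // the writer holds the initial ref

  BytesRefCount(byte[] bytes) { this.bytes = bytes; }

  synchronized void incRef() { refCount++; }

  synchronized void decRef() {
    if (--refCount == 0) {
      // last reference dropped: recycle via the allocator or leave to GC
    }
  }
}
{code}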

{quote}Do we have a design thought out for this? The challenge
is because every doc state now has its own private docID
stream{quote}

It sounded easy when I first heard it, however, I needed to
write it down to fully understand and work through what's going
on. That process is located in LUCENE-2558. 

{quote}Well, I was thinking only implement the single-level skip
case (since it ought to be alot simpler than the
MLSLW/R)....{quote}

I started on this, eg, implementing a single-level skip list
that reads and writes from the BBP. It's a good lesson in how to
use the BBP.

{quote}Actually, conjunction (AND) queries, and also
PhraseQuery{quote}

Both very common types of queries, so we probably need some type
of skipping, which we will, it'll just be single-level.

{quote}Probably we should stop reusing the byte[] with this
change? So when all readers using a given byte[] are finally
GCd, is when that byte[] is reclaimed.{quote}

I have a suspicion we'll change our minds about pooling byte[]s.
We may end up implementing ref counting anyways (as described
above), and the sudden garbage generated *could* be a massive
change for users? Of course ref counting was difficult to
implement the first time around in LUCENE-1314; perhaps, however,
it'll be easier the 2nd time.

As a side note, there is still an issue in my mind around the
term frequencies parallel array (introduced in these patches),
in that we'd need to make a copy of it for each reader (because
if it changes, the scoring model becomes inaccurate?). However,
we could in fact use a 2 dimensional PagedBytes (in this case,
PagedInts) for this purpose. Or is the garbage of an int[] the
size of the number of docs OK per reader? There is also the
lookup cost to consider.



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910201#action_12910201 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

bq. we know what size (level + 1, ceiling'd) to make the next slice.

Thanks.  In the midst of debugging last night I realized this.  The next question is whether to remove it.



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914684#action_12914684 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

{quote}Maybe we can not skip until we've hit the max slice? This
way skipping would always know it's on the max slice. This works
out to 429 bytes into the stream... likely this is fine. {quote}

Me like-y. I'll implement the skip list to point to the largest
level slices.

{quote}Can we just have IW allocate a new byte[][] after flush?
So then any open readers can keep using the one they have?{quote}

This means the prior byte[]s will still be recycled after all
active previous flush readers are closed? If there are multiple
readers from the previous flush, we'd probably still need
reference counting (ala bitvector and norms)? Unfortunately a
reference count parallel array will not quite work because we're
copy-on-writing the byte[]s, eg, there's nothing consistent for
the index numeral to point to. A hash map of byte[]s would
likely be too heavyweight? We may need to implement a ByteArray
object composed of a byte[] and a refcount. This is somewhat
counter to our parallel array memory savings strategy, though it
is directly analogous to the way norms are implemented in
SegmentReader.

{quote}it's possible single level skipping, with a larger skip
interval, is fine for even large RAM buffers.{quote}

True, I'll implement a default of one level, and a default
large-ish skip interval.

{quote}Maybe we can get an initial version of this working,
without the skipping? Ie skipping is implemented as scanning.
{quote}

How many scorers, or how often is skipping used? It's mostly for
disjunction queries? If we limit the skip level to one, and not
implement the BBP level byte at the beginning of the slice, the
MLSL will be a lot easier (ie faster) to implement and test. 

I'd like to see BytesHash get out of THPF (eg, LUCENE-2662), get
deletes working in the RT branch, and merge the flush by DWPT to
trunk. Concurrently I'll work on the search on the RAM buffer
which is most of the way completed. I'd prefer to test a more
complete version of LUCENE-2312 with skip lists (which can
easily be turned off), so that when we do take it through the
laundromat of testing, we won't need to retrofit anything back
in, re-test, and possibly re-design. 

On a side note related to testing: One naive way I've tested is
to do the copy-on-write of the BBP when the segment needs to be
flushed to disk, and write the segment from the read-only copy
of the BBP. If the segment is correct, then at least we know the
copy worked properly and nothing's missing.



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908849#action_12908849 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

One thing I noticed, correct me if I'm wrong, is that the term doc
frequency (the one stored per term, ie, TermsEnum.docFreq)
doesn't seem to be recorded in the RAM buffer code tree
currently. It will be easy to add, though if we make it accurate
per RAM index reader then we could be allocating a unique array,
the length of the number of terms, per reader. I'll implement it
this way to start and we can change it later if necessary.
Actually, to save RAM this could be another use case where a 2
dimensional copy-on-write array is practical.



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907587#action_12907587 ] 

Michael McCandless commented on LUCENE-2575:
--------------------------------------------

Right, we over-allocate each slice according to the sizes in levelSizeArray.

For a given "stream", all of its slices but the last one will be filled in.  The interleaved slices "logically" encode many streams (one per unique term), and all of these streams will have a "tip" slice, where any bytes written to that stream will go.

How to make this properly concurrent is a challenge.  Each reader knows its max docID, so it can stop reading a given stream when it hits a docID over the max.  But, because the writer can go back and overwrite the last 4 bytes of a slice (w/ the address of the next slice), we have to guard against that case (when a reader is trying to read that slice at the same time).
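
A sketch of that reader-side guard (readNextDoc() here is a stand-in
for the vint decoding over the slice chain; names are hypothetical):

{code}
abstract class SliceScanner {
  abstract int readNextDoc();   // hypothetical: next docID, or -1 at end
  abstract void collect(int doc);

  void scan(int snapshotMaxDoc) {
    for (int doc = readNextDoc(); doc != -1; doc = readNextDoc()) {
      if (doc > snapshotMaxDoc) {
        break;  // past our point-in-time snapshot: bytes beyond here
      }         // may still be in flight from the writer
      collect(doc);
    }
  }
}
{code}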



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916005#action_12916005 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

The last comment shows the brain is tired, ie, ignore it because
there would be too many pointers for the byte[]s. 

The comment prior however will probably work, and I think
there's a solution to excessive posting-upto int[] per reader
generation. If when getReader is called, we copy a writable
posting-upto array to a single master posting-upto parallel
array, then we will not need to create a unique int[] per
reader. The reason this would work is, past readers that are
iterating their term docs concurrently with the change to the
posting-upto array, will stop at the maxdoc anyways. This'll
be fun to implement.
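
A sketch of that publish step (field names assumed); the point is that
getReader copies ints into one shared master array rather than
allocating a new int[] per reader:

{code}
synchronized void publishUptos(int[] writerUpto, int[] masterUpto,
                               int numTerms) {
  System.arraycopy(writerUpto, 0, masterUpto, 0, numTerms);
  // older readers may now observe newer uptos, but they already stop
  // at their own snapshot's maxDoc, so the extra bytes are never read
}
{code}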



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915137#action_12915137 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

There's a little error in the thinking of the last comment.  Also, the best solution is probably to store the length of the posting slice in the skip list byte pool.  This'll mean a slight modification to the byte slice reader, however I think it'll work.



[jira] Issue Comment Edited: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914607#action_12914607 ] 

Jason Rutherglen edited comment on LUCENE-2575 at 9/24/10 3:28 PM:
-------------------------------------------------------------------

The current MultiLevelSkipList* system relies on writing out
fixed length skip list buffers before they are readable. This
obviously will not work for RT, so I'm working on modifying MLSL
into new class(es) that write and read from the concurrent-ish
BBP.

In trunk, each level is a RAMOutputStream; that'll need to
change, and each level will likely become a stream keyed into
the BBP. A question is whether we will statically assign the
number of levels prior to the creation of the MLSL, or whether
we'll need to somehow make the number of levels dynamic, in which
case using streams becomes slightly more complicated.





[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914624#action_12914624 ] 

Michael McCandless commented on LUCENE-2575:
--------------------------------------------

Maybe we can get an initial version of this working, without the skipping?  Ie skipping is implemented as scanning.

My guess is for in-RAM postings we don't need as aggressive skipping as we do on-disk, and it's possible single level skipping, with a larger skip interval, is fine for even large RAM buffers.



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907624#action_12907624 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

I see the stream as an argument to the writeByte method; however, I only see 0 or 1 being passed in, for freq and prox respectively.

I'm not sure how we can implement the linked slices concept concurrently without pre-allocating 4 bytes at the end of each slice for the forwarding address.  With that change we could go back to byte[]s that are read-only: copy a block, rewrite the forwarding address(es), and then synchronously flush the rewritten byte[]s back to the read-only list.  We could have a switch that turns the automatic address pre-allocation on or off if a user does not plan on using realtime search.
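
Rough sketch of the pre-allocation (hypothetical names, one fixed buffer standing in for the real BBP):

{code}
// Hypothetical sketch: every slice reserves its last 4 bytes for the
// forwarding address up front, so the writer never rewrites bytes that a
// concurrent reader may already be scanning.
class ForwardingSlicePool {
  static final int FORWARD_BYTES = 4;
  final byte[] buffer = new byte[1 << 15];
  private int upto;

  /** Returns the slice start; its last 4 bytes stay zero until the
   *  forwarding address is written exactly once. */
  int newSlice(int size) {
    if (upto + size > buffer.length) {
      throw new IllegalStateException("sketch uses a single fixed buffer");
    }
    int start = upto;
    upto += size;
    return start;
  }

  /** Written once when the slice fills; a reader that sees address 0
   *  knows the next slice does not exist yet. */
  void writeForwardingAddress(int sliceStart, int sliceSize, int nextSlice) {
    int p = sliceStart + sliceSize - FORWARD_BYTES;
    buffer[p]     = (byte) (nextSlice >>> 24);
    buffer[p + 1] = (byte) (nextSlice >>> 16);
    buffer[p + 2] = (byte) (nextSlice >>> 8);
    buffer[p + 3] = (byte) nextSlice;
  }
}
{code}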



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921784#action_12921784 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

The issue with the model given is that the posting-upto is
handed to the byte slice reader as the end index. However, newly
written bytes may not actually be visible to a reader thread, as
per the JMM; a reader thread may reach partially written bytes.
There doesn't seem to be a way to tell the reader it's reached
the end of the written bytes, so we probably need to add 2 paged
ints arrays, for the freq and prox uptos respectively. That
would be unfortunate because the paged ints would need to be
updated either during the get reader call or during indexing.
Both could be detrimental to performance, though the net is
still faster than the current NRT solution. The alternative is
to simply copy-on-write the byte blocks, though that'd need to
include the int blocks as well. If we go with paged ints, we'd
want to update them during indexing; updating them in the get
reader call would require full array iterations to compare and
update. The advantage of copy-on-write of the blocks is that
neither indexing speed nor read speed is affected; the main
potential performance drag is the garbage generated by the byte
and int arrays thrown away on reader close, which depends on how
many blocks were updated in between get reader calls.

We probably need to implement both solutions, try them out and
measure the performance difference.

There's Michael B.'s design of multiple slice levels linked
together by atomic int arrays, illustrated here:
http://www.box.net/shared/hivdg1hge9

After reading it, the main idea I think we can use is: instead
of paged ints, simply maintain 2 upto arrays, one that's being
written to and a 2nd that's guaranteed to be in sync with the
byte blocks. This would save on garbage and on lookups into
paged ints. The cost is the array copy in the get reader lock;
given the array already exists, the copy should be fast. Perhaps
this is the go-ahead solution?
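
The two-array idea sketched (hypothetical names; a real version would hand each reader generation its own snapshot rather than share one instance):

{code}
// Hypothetical sketch: the indexing thread bumps writingUpto freely; the
// get-reader call copies it into readerUpto under a lock, so the values a
// reader sees are ordered with the bytes already written to the pool.
class PostingUptos {
  private final int[] writingUpto;   // advanced during indexing
  private final int[] readerUpto;    // the copy guaranteed in sync
  private final Object readerLock = new Object();

  PostingUptos(int numTerms) {
    writingUpto = new int[numTerms];
    readerUpto = new int[numTerms];
  }

  void advance(int termID, int newUpto) {
    writingUpto[termID] = newUpto;
  }

  /** The whole reopen cost is one arraycopy into an existing array. */
  int[] syncForReader() {
    synchronized (readerLock) {
      System.arraycopy(writingUpto, 0, readerUpto, 0, readerUpto.length);
      return readerUpto;
    }
  }
}
{code}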



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916199#action_12916199 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

bq. We'd need to increase the level 0 slice size...

Yes. 

{quote}but the reader needs to read 'beyond' the end of a given
slice, still? Ie say global maxDoc is 42, and a given posting
just read doc 27 (which in fact is its last doc). It would then
try to read the next doc?{quote}

The posting-upto should stop the reader prior to reaching a byte
element whose value is 0; ie, that should never happen.

The main 'issue', which really isn't one, is that each reader
cannot maintain a copy of the byte[][] spine, as the spine will
be growing: new buffers will be added, and the master
posting-upto will also be changing, allowing 'older' readers to
possibly continue past their original point-in-time byte[][].
This is solved by synchronizing the obtainment of the byte[]
buffer from the BBP, thereby preventing out-of-bounds
exceptions.
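
Eg, sketched with made-up names (the real spine management is more involved):

{code}
// Hypothetical sketch: readers never cache the byte[][] spine; each
// buffer is fetched through a synchronized accessor, which both orders
// the read with addBuffer and prevents out-of-bounds views of a spine
// that is concurrently growing.
class SharedPool {
  private byte[][] buffers = new byte[8][];
  private int bufferCount;

  synchronized void addBuffer(byte[] buffer) {
    if (bufferCount == buffers.length) {
      byte[][] next = new byte[buffers.length * 2][];
      System.arraycopy(buffers, 0, next, 0, bufferCount);
      buffers = next;
    }
    buffers[bufferCount++] = buffer;
  }

  synchronized byte[] getBuffer(int index) {
    return buffers[index];
  }
}
{code}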

{quote}We don't store tf now do we? Adding 4 bytes per unique
term isn't innocuous!{quote}

What I meant is: if we're merely maintaining the term freq array
during normal, non-RT indexing, then we're not constantly
creating new arrays, and we're in innocuous land. There is no
use for the array in that case, though, so it shouldn't be
created unless RT has been flipped on, modally.

{quote}Hmm the full copy of the tf parallel array is going to
put a highish cost on reopen? So some sort of transactional
(incremental copy-on-write) data structure is needed (eg
PagedInts)...{quote}

Right, this to me is the remaining 'problem', or rather the part
that needs a reasonable go-ahead solution. For now we can assume
PagedInts is the answer.

In addition, to summarize the skip list: it needs to store the
doc, the address into the BBP, and the length from that address
to the end of the slice. This allows us to point to a document
anywhere in the postings BBP and still continue with slice
iteration. In the test code I've written, the slice level is
stored as well; I'm not sure why/if that's required. I think
it's a hint to the BBP reader as to the level of the next slice.
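
One skip entry, sketched as a plain struct (hypothetical names):

{code}
// Hypothetical sketch of one in-RAM skip entry as summarized above: the
// doc it points at, the absolute address into the postings BBP, the bytes
// remaining in that slice, and (possibly redundant) the slice level.
class SkipEntry {
  final int doc;          // first doc at or after this skip point
  final int address;      // absolute offset into the postings byte pool
  final int remaining;    // bytes from address to the end of the slice
  final int level;        // hint for the size of the next linked slice

  SkipEntry(int doc, int address, int remaining, int level) {
    this.doc = doc;
    this.address = address;
    this.remaining = remaining;
    this.level = level;
  }
}
{code}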





[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893564#action_12893564 ] 

Michael McCandless commented on LUCENE-2575:
--------------------------------------------

Logically, every term has its own open IndexOutput, where it can write any number of bytes.  During indexing, when we hit a given term, we init its IndexOutput (two of them, actually -- one frq, one prx) and write a few bytes as appropriate.

It's that abstraction that the interleaved byte slices API provides -- the ability to hold open a great many IndexOutputs.

We should then be able to init IndexInputs against these slices as well, but they can only sequentially scan.
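
A sketch of that abstraction with made-up names: an "open output" per term is nothing more than an int write head into the shared pool, so holding millions open is cheap (slices and forwarding addresses elided; one fixed run per term):

{code}
// Hypothetical sketch: each term's "open IndexOutput" is just the int in
// upto[termID]; an "IndexInput" over the same data is a sequential scan
// from start[termID].
class PerTermStreams {
  private final byte[] pool;   // stand-in for the interleaved slice pool
  private final int[] start;   // first byte of each term's stream
  private final int[] upto;    // one write head per term: the open output

  PerTermStreams(int numTerms, int bytesPerTerm) {
    pool = new byte[numTerms * bytesPerTerm];
    start = new int[numTerms];
    upto = new int[numTerms];
    for (int t = 0; t < numTerms; t++) {
      start[t] = upto[t] = t * bytesPerTerm;  // fixed run; real code links slices
    }
  }

  void writeByte(int termID, byte b) {
    pool[upto[termID]++] = b;
  }

  int streamLength(int termID) {
    return upto[termID] - start[termID];
  }
}
{code}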

To handle skipping, I think we can write to another ByteBlockPool?  That skip data would be similar to the multi-level skip data we now record, except instead of indexing into a single frq or prx file, it indexes into positions in the primary ByteBlockPool.

Where is there a concurrency problem?  Is it a JMM visibility issue of writes from one thread vs reads, in a shared byte[]?



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913597#action_12913597 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

The reference counting described above is a common pattern throughout Lucene; one similar use is IndexReader reopen and clone.
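
Eg, the pattern in miniature (hypothetical class, mirroring IndexReader's incRef/decRef):

{code}
// Hypothetical sketch: the shared block pool stays alive until the writer
// and every open reader have released it.
import java.util.concurrent.atomic.AtomicInteger;

class RefCountedPool {
  private final AtomicInteger refCount = new AtomicInteger(1);

  void incRef() {
    refCount.incrementAndGet();
  }

  void decRef() {
    if (refCount.decrementAndGet() == 0) {
      close();   // last one out frees the blocks
    }
  }

  private void close() {
    // return the byte[] and int[] blocks to the allocator here
  }
}
{code}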



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916000#action_12916000 ] 

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

OK, I think there's a solution that avoids copying the actual
byte[]s; we'd need to alter the behavior of the BBPs. It would
require always allocating 3 empty bytes at the end of a slice
for the forwarding address, rather than what we do today, which
is to write postings up to the end of the slice, then, when
allocating a new slice, copy the last 3 bytes forward to the new
slice location. We would also need to pass a unique parallel
posting-upto array to each reader. This is required so that the
reader never ventures beyond the end of a slice as it was
written when the reader was instantiated.

This would yield significant savings because we would not be
generating garbage from the byte[]s, which are 32 KB each; they
add up if the indexing is touching many different byte[]s. With
this solution there would essentially be no garbage generated by
incremental indexing, only after a DWPT's segment is flushed
(and all its readers were also GCed).

The only downside is we'd be leaving those 3 bytes per term
unallocated at all times, which is not a very high price.
Perhaps more impactful is the posting-upto array per reader,
which'd be 4 bytes per term, the same cost as the term freq
array. It's a pick-your-poison problem.
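
Back-of-envelope, with assumed numbers (1M unique terms, 32 KB blocks, 100 blocks touched between reopens), just to show the trade-off:

{code}
// Assumed numbers only: compares the reserved forwarding bytes and the
// per-reader posting-upto array against recopying touched blocks.
class CostSketch {
  public static void main(String[] args) {
    int terms = 1_000_000;
    long reservedBytes = 3L * terms;          // ~3 MB left unallocated
    long uptoPerReader = 4L * terms;          // ~4 MB per open reader
    long copyPerReopen = 32L * 1024 * 100;    // ~3.2 MB per reopen, every reopen
    System.out.printf("reserved=%d uptoPerReader=%d copyPerReopen=%d%n",
        reservedBytes, uptoPerReader, copyPerReopen);
  }
}
{code}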



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914803#action_12914803 ] 

Michael McCandless commented on LUCENE-2575:
--------------------------------------------

Can you explain what the "copy-on-write ByteBlockPool" is?  Exactly when do we make a copy?
