Posted to dev@lucene.apache.org by "Jason Rutherglen (JIRA)" <ji...@apache.org> on 2010/07/23 20:36:50 UTC

[jira] Created: (LUCENE-2558) Use sequence ids for deleted docs

Use sequence ids for deleted docs
---------------------------------

                 Key: LUCENE-2558
                 URL: https://issues.apache.org/jira/browse/LUCENE-2558
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Search
    Affects Versions: Realtime Branch
            Reporter: Jason Rutherglen
            Priority: Minor
             Fix For: Realtime Branch


Using the sequence ids generated by the update document
methods, we will implement IndexReader deleted docs on top of a
sequence id array.

One of the decisions is what primitive type to use. We can start
off with an int[], then possibly move to a short[] (for lower
memory consumption) that wraps around.
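The issue does not say how a wrapping short[] would keep sequence
ids ordered. One common option is serial-number style comparison,
sketched below. It assumes that any two ids being compared are
never more than 32767 increments apart; the class and method names
are hypothetical and not part of Lucene.

```java
/** Hypothetical helper: ordering for 16-bit sequence ids that wrap around. */
final class WrappingSeqId {
    private WrappingSeqId() {}

    /** Returns the next sequence id; Short.MAX_VALUE wraps to Short.MIN_VALUE. */
    static short next(short seqId) {
        return (short) (seqId + 1);
    }

    /**
     * True if a is at or after b in wrap-around order.
     * Assumption: a and b are fewer than 32768 increments apart.
     */
    static boolean atOrAfter(short a, short b) {
        // The 16-bit difference is non-negative when a is at most
        // 32767 increments ahead of b, even across the wrap point.
        return (short) (a - b) >= 0;
    }
}
```

The memory saving comes at a cost: a reader that falls more than
32767 sequence ids behind the writer can no longer be compared
correctly, so such a scheme would need a bound on reader age.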

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] Commented: (LUCENE-2558) Use sequence ids for deleted docs

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893579#action_12893579 ] 

Michael McCandless commented on LUCENE-2558:
--------------------------------------------

Resolving deleted terms -> doc IDs doesn't require a sorted terms dict, right?  I.e., a simple hash lookup suffices?

[jira] Commented: (LUCENE-2558) Use sequence ids for deleted docs

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893911#action_12893911 ] 

Jason Rutherglen commented on LUCENE-2558:
------------------------------------------

I'm implementing a basic doc id iterator per DWPT, which will allow us to implement delete by term and the deleted docs sequence ids.  This is for merging of segments?  However, we're using readers to do the merging, so this really won't be useful?

[jira] Commented: (LUCENE-2558) Use sequence ids for deleted docs

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893619#action_12893619 ] 

Jason Rutherglen commented on LUCENE-2558:
------------------------------------------

{quote}Resolving deleted terms -> doc IDs doesn't require a
sorted terms dict, right? I.e., a simple hash lookup suffices?
{quote}

True, however I figured it'd be best to eat our own dog food,
i.e., use our own APIs. I think the main issue right now is the
concurrency of the *BlockPools from LUCENE-2575. Once that is
solved, we should be able to implement deleting, which doesn't
require skip lists. I guess if we really wanted to, we could
simply buffer the delete terms and only apply them in getReader.
getReader would block any writes that could be altering the
*BlockPools. Maybe this is a good first step? Is there any reason
we need to apply deletes in the actual updateDoc and deleteDoc
methods?
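A minimal sketch of the "buffer terms, apply in getReader" idea,
under stated assumptions: a plain HashMap stands in for the terms
dictionary (the simple hash lookup mentioned in the quote), each
document has a single term, and synchronized methods stand in for
getReader blocking writers. BufferedDeleteWriter and its methods
are hypothetical names, not Lucene APIs.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: deletes are buffered and resolved only in getReader. */
final class BufferedDeleteWriter {
    // Stand-in for the terms dictionary: term -> doc ids containing it.
    private final Map<String, List<Integer>> postings = new HashMap<>();
    // Buffered delete term -> doc id limit; only docs below the limit are deleted.
    private final Map<String, Integer> bufferedDeletes = new LinkedHashMap<>();
    private final BitSet deleted = new BitSet();
    private int nextDocId = 0;

    synchronized int addDocument(String term) {
        int docId = nextDocId++;
        postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
        return docId;
    }

    /** Only records the term; no doc ids are resolved here. */
    synchronized void deleteByTerm(String term) {
        // Remember how many docs existed so later adds are not affected.
        bufferedDeletes.put(term, nextDocId);
    }

    /** Applies buffered deletes while writes are blocked; returns a snapshot. */
    synchronized BitSet getReader() {
        for (Map.Entry<String, Integer> e : bufferedDeletes.entrySet()) {
            List<Integer> docs =
                postings.getOrDefault(e.getKey(), Collections.<Integer>emptyList());
            for (int docId : docs) {
                if (docId < e.getValue()) {
                    deleted.set(docId);
                }
            }
        }
        bufferedDeletes.clear();
        return (BitSet) deleted.clone();
    }
}
```

The doc id limit recorded with each buffered term is what keeps a
document added after the delete, but before getReader, from being
deleted by mistake.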

[jira] Commented: (LUCENE-2558) Use sequence ids for deleted docs

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913400#action_12913400 ] 

Jason Rutherglen commented on LUCENE-2558:
------------------------------------------

For the deleted docs sequence id array, perhaps I'm a little bit
confused, but how will we signify in the sequence id array that a
document is deleted? I believe we need a secondary sequence id
array for deleted docs that is init'd to -1. When a document is
deleted, the sequence id of the delete is set for that doc in the
del-docs-seq-arr. When the deleted docs Bits is accessed for a
given doc, we'll compare the IR's seq-id-up-to with the doc's
del-docs-seq-id: if the del-docs-seq-id has been set (is not -1)
and the IR seq-id is greater than or equal to it, the Bits.get
method will return true, meaning the document is deleted.
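The scheme described above could look roughly like the following
sketch. SeqIdDeletedDocs, delete and isDeleted are hypothetical
names, not the realtime branch's code; isDeleted plays the role of
Bits.get for a reader with a given seq-id-up-to. The explicit
check against -1 is an assumption added here so that docs that
were never deleted are not reported as deleted.

```java
import java.util.Arrays;

/** Hypothetical sketch of a per-doc "deleted at sequence id" array. */
final class SeqIdDeletedDocs {
    private static final int NOT_DELETED = -1;

    // delSeqIds[docId] is the sequence id of the delete, or -1 if live.
    private final int[] delSeqIds;

    SeqIdDeletedDocs(int maxDoc) {
        delSeqIds = new int[maxDoc];
        Arrays.fill(delSeqIds, NOT_DELETED);
    }

    /** Records the delete; the earliest delete of a doc wins. */
    void delete(int docId, int seqId) {
        if (delSeqIds[docId] == NOT_DELETED) {
            delSeqIds[docId] = seqId;
        }
    }

    /** True if the doc is deleted as seen by a reader opened at readerSeqIdUpTo. */
    boolean isDeleted(int docId, int readerSeqIdUpTo) {
        int delSeqId = delSeqIds[docId];
        return delSeqId != NOT_DELETED && readerSeqIdUpTo >= delSeqId;
    }
}
```

For example, after delete(2, 10), a reader with seq-id-up-to 9
still sees doc 2 as live, while a reader with seq-id-up-to 10 or
higher sees it as deleted. The sketch is not thread-safe on its
own; it assumes writes are paused while a reader is obtained.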

I am forgetting how concurrency will work in this case, i.e.,
ensuring multi-threaded visibility under the JMM. Actually,
because we're pausing the writes/deletes when getReader is
called on the DWPT, JMM concurrency should be OK.

[jira] Commented: (LUCENE-2558) Use sequence ids for deleted docs

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893004#action_12893004 ] 

Jason Rutherglen commented on LUCENE-2558:
------------------------------------------

I tried to start on this; however, nothing can be deleted until the terms dictionary and the terms docs are working, since they are needed to obtain the doc ids to delete.
