Posted to dev@lucene.apache.org by "Jason Rutherglen (JIRA)" <ji...@apache.org> on 2010/07/26 20:13:16 UTC

[jira] Issue Comment Edited: (LUCENE-2567) RT Terms Dictionary

    [ https://issues.apache.org/jira/browse/LUCENE-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892396#action_12892396 ] 

Jason Rutherglen edited comment on LUCENE-2567 at 7/26/10 2:12 PM:
-------------------------------------------------------------------

Thinking further about the RT terms dictionary, we probably do
not want to insert terms into the concurrent terms dictionary as
we're tokenizing, because each insertion costs O(log n) and so
gets slower as the dictionary grows. Also, I believe that
implementing a flattened ConcurrentSkipListMap
(http://fuseyism.com/classpath/doc/java/util/concurrent/ConcurrentSkipListMap-source.html)
will be fairly complex.

Instead we may want to queue new terms and merge them into the
terms dictionary on demand (i.e., when a new terms enum is
created). One way to achieve this could be to recreate the terms
index only periodically (let's say after N new terms have been
added). Each term in the terms index array points to a position
in the terms linked list (as today). The binary search of the
terms index, followed by the linear scan of the terms
dictionary, would also behave as usual. We still need to figure
out how we want to concurrently grow the terms index arrays.
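
To make the queue-and-merge idea concrete, here is a minimal
sketch. It is not code from the realtime branch: the class name
QueuedTermsDictionary, its methods, and the indexInterval and
rebuildThreshold parameters are all hypothetical. It also assumes
plain String terms compared with String.compareTo (not Lucene's
term ordering), a single synchronized merge, and no deletes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical sketch: queued terms, merged on demand, index rebuilt every N terms. */
final class QueuedTermsDictionary {
    private static final class Node {
        final String term;
        Node next;
        Node(String term) { this.term = term; }
    }

    // Tokenizing threads only enqueue; no sorted insert happens on the indexing path.
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
    private final Node head = new Node("");   // sentinel in front of the sorted linked list
    private final int indexInterval;          // sample every k-th term into the index
    private final int rebuildThreshold;       // "N new terms" before the index is recreated
    private Node[] index = { head };          // terms index: pointers into the linked list
    private int mergedSinceRebuild = 0;
    private int indexRebuilds = 0;

    QueuedTermsDictionary(int indexInterval, int rebuildThreshold) {
        this.indexInterval = indexInterval;
        this.rebuildThreshold = rebuildThreshold;
    }

    void add(String term) { pending.add(term); }

    int indexRebuildCount() { return indexRebuilds; }

    /** Stand-in for creating a terms enum: merges the queue, then snapshots the terms. */
    synchronized List<String> newTermsEnum() {
        mergePending();
        List<String> out = new ArrayList<>();
        for (Node n = head.next; n != null; n = n.next) out.add(n.term);
        return out;
    }

    /** Binary search of the (possibly stale) index, then a linear scan of the list. */
    synchronized boolean seekExact(String term) {
        mergePending();
        int lo = 0, hi = index.length - 1;
        while (lo < hi) {                     // greatest index entry whose term <= target
            int mid = (lo + hi + 1) >>> 1;
            if (index[mid].term.compareTo(term) <= 0) lo = mid; else hi = mid - 1;
        }
        Node n = index[lo] == head ? head.next : index[lo];
        while (n != null && n.term.compareTo(term) < 0) n = n.next;
        return n != null && n.term.equals(term);
    }

    private void mergePending() {
        TreeSet<String> batch = new TreeSet<>();          // sort and de-duplicate the queue
        for (String t; (t = pending.poll()) != null; ) batch.add(t);
        Node prev = head;
        for (String t : batch) {                          // one forward pass over the list
            while (prev.next != null && prev.next.term.compareTo(t) < 0) prev = prev.next;
            if (prev.next != null && prev.next.term.equals(t)) continue;  // already present
            Node n = new Node(t);
            n.next = prev.next;
            prev.next = n;
            mergedSinceRebuild++;
        }
        if (mergedSinceRebuild >= rebuildThreshold) rebuildIndex();
    }

    private void rebuildIndex() {
        List<Node> sampled = new ArrayList<>();
        sampled.add(head);
        int i = 0;
        for (Node n = head.next; n != null; n = n.next) {
            if (++i % indexInterval == 0) sampled.add(n);
        }
        index = sampled.toArray(new Node[0]);
        mergedSinceRebuild = 0;
        indexRebuilds++;
    }
}
```

Because nodes are never removed and the list stays sorted, a
stale index still points at valid starting positions; the linear
scan just gets somewhat longer until the next rebuild. The sketch
sidesteps the open question of growing the index arrays
concurrently by holding a lock during the merge and the lookup.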

      was (Author: jasonrutherglen):
    Further thinking about the RT terms dictionary, we probably do
not want to insert terms into the concurrent terms dictionary as
we're tokenizing due to the inherent insertion performance (log(n))
degradation. Also, I believe that implementing a flattened
http://fuseyism.com/classpath/doc/java/util/concurrent/ConcurrentSkipListMap-source.html 
will be fairly complex. 

Instead we may want to queue new terms and merge them into the
terms dictionary on demand (ie, when a new terms enum is
created). One way to achieve this could be to only periodically
recreate a new terms index (lets say after N new terms have been
added). Each term in the terms index array points to a position
in the terms linked list (as today). The binary search of the
terms index, then the linear scan of the terms dictionary would
also behave as usual. We need to figure out how we want to
concurrently grow the terms index arrays.  
  
> RT Terms Dictionary
> -------------------
>
>                 Key: LUCENE-2567
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2567
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: Realtime Branch
>            Reporter: Jason Rutherglen
>             Fix For: Realtime Branch
>
>
> Implement an in RAM terms dictionary for realtime search.
