Posted to dev@lucene.apache.org by "Jason Rutherglen (JIRA)" <ji...@apache.org> on 2011/06/14 00:15:47 UTC

[jira] [Created] (LUCENE-3199) Add non-desctructive sort to BytesRefHash

Add non-desctructive sort to BytesRefHash
-----------------------------------------

                 Key: LUCENE-3199
                 URL: https://issues.apache.org/jira/browse/LUCENE-3199
             Project: Lucene - Java
          Issue Type: Improvement
          Components: core/index
    Affects Versions: 4.0
            Reporter: Jason Rutherglen
            Priority: Minor


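Currently, sorting a BytesRefHash is destructive: sort() reorders the hash's internal ids array, so the instance must be cleared before it can be reused.  We can add a method that returns an int[] of sorted ids generated non-destructively.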
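
A minimal sketch of the idea, in plain Java rather than against the real BytesRefHash internals. The terms are represented by a byte[][] indexed by term id, and the class and method names (NonDestructiveSortSketch, sortedIds) are hypothetical. The input is only read; the result is a freshly allocated int[] of term ids ordered by unsigned byte comparison.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.stream.IntStream;

final class NonDestructiveSortSketch {
  /**
   * Returns the term ids ordered by their bytes (unsigned, lexicographic).
   * termsById is only read; the returned int[] is newly allocated.
   */
  static int[] sortedIds(byte[][] termsById) {
    return IntStream.range(0, termsById.length)
        .boxed() // boxing keeps the sketch short; it is not meant to be fast
        .sorted((a, b) -> Arrays.compareUnsigned(termsById[a], termsById[b]))
        .mapToInt(Integer::intValue)
        .toArray();
  }

  public static void main(String[] args) {
    byte[][] terms = {
      "lucene".getBytes(StandardCharsets.UTF_8),
      "apache".getBytes(StandardCharsets.UTF_8),
      "index".getBytes(StandardCharsets.UTF_8)
    };
    System.out.println(Arrays.toString(sortedIds(terms)));
  }
}
```

The boxed stream is there for brevity only; an actual patch would more likely sort a copy of the ids array in place to avoid the per-element objects.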

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3199) Add non-desctructive sort to BytesRefHash

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3199:
------------------------------------

    Attachment: LUCENE-3199.patch

Hey Jason, I actually took this a little further and added a read-only view to BytesRefHash. This view provides next(), seekExact() and seekCeil() methods, just like TermsEnum does.
The view is sorted if needed and can incrementally merge with a previously created view.
Initially I wondered whether this approach would be feasible performance-wise, but in fact it is really fast. I did some poor man's benchmarks where I opened a new view every 500 to 1000 new unique terms, and this takes around 0.001 to 0.01 milliseconds on average. I have never seen it take longer than 0.1 ms. I think it would be worthwhile to explore whether we can keep it that simple and reopen such a view for each document while we are indexing. The view allocates only one additional array and reuses all other references from the BytesRefHash instance. That one additional int[] does not seem too bad.

The patch is still rough. I will work further on it next week.
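
A rough sketch of what such a read-only sorted view could look like, written in plain Java and not taken from the patch. The method names next(), seekExact() and seekCeil() come from the message above; the class name, the signatures, the byte[][] representation of the terms and the null-on-exhaustion convention are all assumptions. The only array owned by the view is sortedIds; the term bytes are shared.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SortedViewSketch {
  private final byte[][] termsById; // shared with the hash, never modified here
  private final int[] sortedIds;    // the one additional array the view holds
  private int pos = -1;             // cursor into sortedIds

  SortedViewSketch(byte[][] termsById, int[] sortedIds) {
    this.termsById = termsById;
    this.sortedIds = sortedIds;
  }

  /** Advances to the next term in sorted order; returns null when exhausted. */
  byte[] next() {
    if (pos < sortedIds.length) pos++;
    return pos < sortedIds.length ? termsById[sortedIds[pos]] : null;
  }

  /** Positions on the smallest term >= target; returns it, or null if none. */
  byte[] seekCeil(byte[] target) {
    int lo = 0, hi = sortedIds.length;
    while (lo < hi) { // binary search over the sorted ids
      int mid = (lo + hi) >>> 1;
      if (Arrays.compareUnsigned(termsById[sortedIds[mid]], target) < 0) lo = mid + 1;
      else hi = mid;
    }
    pos = lo;
    return pos < sortedIds.length ? termsById[sortedIds[pos]] : null;
  }

  /** Positions on target's ceiling and returns true only if target itself is present. */
  boolean seekExact(byte[] target) {
    byte[] ceil = seekCeil(target);
    return ceil != null && Arrays.equals(ceil, target);
  }

  public static void main(String[] args) {
    byte[][] terms = {
      "lucene".getBytes(StandardCharsets.UTF_8),
      "apache".getBytes(StandardCharsets.UTF_8),
      "index".getBytes(StandardCharsets.UTF_8)
    };
    SortedViewSketch view = new SortedViewSketch(terms, new int[] {1, 2, 0});
    for (byte[] t = view.next(); t != null; t = view.next()) {
      System.out.println(new String(t, StandardCharsets.UTF_8));
    }
  }
}
```

In this sketch seekExact() goes through the binary search for simplicity; a view backed by the hash itself could presumably answer exact lookups through the hash instead.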




[jira] [Updated] (LUCENE-3199) Add non-desctructive sort to BytesRefHash

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-3199:
-------------------------------------

    Attachment: LUCENE-3199.patch

Here's a patch for this issue.  It adds a couple of new methods to TestBytesRefHash that test the new frozen compact and sort functionality of BytesRefHash.

I'm posting it now because it's useful in relation to LUCENE-2312 and a terms dictionary composed of int[]s of term ids sorted by term.




[jira] [Commented] (LUCENE-3199) Add non-desctructive sort to BytesRefHash

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096108#comment-13096108 ] 

Jason Rutherglen commented on LUCENE-3199:
------------------------------------------

Simon,

In summary, this uses the BytesRefHash sort, performs array copies, and
then merges [sorting] them into a new copy / view.

Array copies are fast and, counterintuitively, generate far less garbage than
objects (in Java).

Instead of creating term 'segments' that would be merged while iterating the
terms enum, we'll be generating static point-in-time terms dict views. These
will be useful for enabling DocTermsIndex field caches for RT, the only
remaining design 'challenge' for RT / LUCENE-2312. Because there is a terms
hash, we can seek exactly to the term rather than perform an [optimized] seek
to the term.
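
The copy-then-merge step summarized above can be sketched as follows. This is plain Java under stated assumptions, not code from the patch: the terms are a byte[][] indexed by term id, oldSorted is the id array of the previous view, newSorted holds the already-sorted ids of the terms added since, and the names IncrementalMergeSketch and merge are hypothetical.

```java
import java.util.Arrays;

final class IncrementalMergeSketch {
  /**
   * Merges two id arrays, each already sorted by term bytes, into a new
   * sorted id array. Neither input is modified, so the old view stays valid.
   */
  static int[] merge(byte[][] termsById, int[] oldSorted, int[] newSorted) {
    int[] merged = new int[oldSorted.length + newSorted.length];
    int i = 0, j = 0, k = 0;
    while (i < oldSorted.length && j < newSorted.length) {
      int cmp = Arrays.compareUnsigned(termsById[oldSorted[i]], termsById[newSorted[j]]);
      merged[k++] = cmp <= 0 ? oldSorted[i++] : newSorted[j++];
    }
    // At most one side still has elements; bulk-copy whatever is left.
    System.arraycopy(oldSorted, i, merged, k, oldSorted.length - i);
    k += oldSorted.length - i;
    System.arraycopy(newSorted, j, merged, k, newSorted.length - j);
    return merged;
  }
}
```

Because the result is a new int[], a reader holding the previous view keeps a consistent point-in-time snapshot while the merged array backs the next one.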




[jira] [Commented] (LUCENE-3199) Add non-desctructive sort to BytesRefHash

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048987#comment-13048987 ] 

Jason Rutherglen commented on LUCENE-3199:
------------------------------------------

I think the issue with this, as it relates to realtime search, is that in order to sort, we'll need to freeze indexing.




[jira] [Updated] (LUCENE-3199) Add non-desctructive sort to BytesRefHash

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-3199:
-------------------------------------

    Attachment: LUCENE-3199.patch

This is a minor update compared with the last patch.

It adds the option of pruning the [oversized] int[] returned by the compact method.

It also adds more unit tests.
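
To illustrate what pruning the oversized array means, here is a small plain-Java sketch. It is not the patch's code: the method name compactCopy, the boolean prune flag, and the use of -1 to mark an empty hash slot are assumptions made for the example.

```java
import java.util.Arrays;

final class CompactPruneSketch {
  /**
   * Copies the occupied slots (anything other than -1) to the front of a new array.
   * With prune == false the result keeps the table's length, padded with -1;
   * with prune == true it is trimmed to exactly the number of valid ids.
   */
  static int[] compactCopy(int[] slots, boolean prune) {
    int[] out = new int[slots.length];
    Arrays.fill(out, -1);
    int count = 0;
    for (int id : slots) {
      if (id != -1) out[count++] = id;
    }
    return prune ? Arrays.copyOf(out, count) : out;
  }

  public static void main(String[] args) {
    int[] slots = {-1, 2, -1, 0, 1, -1, -1, -1};
    System.out.println(Arrays.toString(compactCopy(slots, true)));
  }
}
```

The unpruned form saves one allocation and copy when the caller already tracks the count; the pruned form is easier to hand to code that relies on array.length.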




[jira] [Commented] (LUCENE-3199) Add non-desctructive sort to BytesRefHash

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097257#comment-13097257 ] 

Jason Rutherglen commented on LUCENE-3199:
------------------------------------------

OK, solved the problem from my previous comment (the missing reverse int[]) by taking the sorted ord array and building a new reverse array from it.
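
Building the reverse array from the sorted ord array amounts to inverting a permutation. A minimal sketch, assuming sortedIds contains every term id from 0 to n-1 exactly once (the names ReverseOrdSketch and invert are hypothetical):

```java
final class ReverseOrdSketch {
  /**
   * Given sortedIds[ord] == termId, returns ordByTermId with
   * ordByTermId[termId] == ord. Assumes sortedIds is a permutation of 0..n-1.
   */
  static int[] invert(int[] sortedIds) {
    int[] ordByTermId = new int[sortedIds.length];
    for (int ord = 0; ord < sortedIds.length; ord++) {
      ordByTermId[sortedIds[ord]] = ord;
    }
    return ordByTermId;
  }
}
```

This is a single linear pass and one extra int[] of the same length as the sorted array.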




[jira] [Commented] (LUCENE-3199) Add non-desctructive sort to BytesRefHash

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096094#comment-13096094 ] 

Uwe Schindler commented on LUCENE-3199:
---------------------------------------

Cool idea with the view! Policeman work: SorterTemplate looks correct :-)




[jira] [Updated] (LUCENE-3199) Add non-desctructive sort to BytesRefHash

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3199:
------------------------------------

    Attachment: LUCENE-3199.patch

New version: fixed one concurrency issue and added some doc strings.




[jira] [Commented] (LUCENE-3199) Add non-desctructive sort to BytesRefHash

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097246#comment-13097246 ] 

Jason Rutherglen commented on LUCENE-3199:
------------------------------------------

I started integrating the patch into LUCENE-2312.  I think the main functionality missing is a reverse int[] that maps a term id to its position in the sorted ords array.  That array would be used to implement the RT version of DocTermsIndex, where the lookup goes doc id -> term id -> sorted term id index.
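
The doc id -> term id -> sorted index chain can be illustrated with two int[] lookups. This is only a sketch under assumptions: a single-valued field where every document has a term, a termIdByDoc array that the indexer is assumed to maintain, and hypothetical class and method names. It is not the DocTermsIndex API.

```java
final class DocTermOrdLookupSketch {
  private final int[] termIdByDoc; // doc id -> term id (assumed available)
  private final int[] ordByTermId; // term id -> position in sorted order

  DocTermOrdLookupSketch(int[] termIdByDoc, int[] ordByTermId) {
    this.termIdByDoc = termIdByDoc;
    this.ordByTermId = ordByTermId;
  }

  /** Sorted position of the term held by the given document. */
  int ord(int docId) {
    return ordByTermId[termIdByDoc[docId]];
  }

  /** Orders two documents by their terms without touching any term bytes. */
  int compareDocs(int docA, int docB) {
    return Integer.compare(ord(docA), ord(docB));
  }
}
```

With the reverse array in place, sorting documents by field value only compares ints, which is the property a field cache relies on.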




[jira] [Commented] (LUCENE-3199) Add non-desctructive sort to BytesRefHash

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096231#comment-13096231 ] 

Jason Rutherglen commented on LUCENE-3199:
------------------------------------------

Simon, I think your patch should go in a different issue, e.g., "sorted BytesRefHash view" or something.

