Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2010/08/06 12:51:15 UTC

[jira] Created: (LUCENE-2589) Add a variable-sized int block codec

Add a variable-sized int block codec
------------------------------------

                 Key: LUCENE-2589
                 URL: https://issues.apache.org/jira/browse/LUCENE-2589
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Index
            Reporter: Michael McCandless
            Assignee: Michael McCandless
             Fix For: 4.0


We already have support for fixed-block-size int block codecs, which makes it very simple to create a codec from an int encoder algorithm such as FOR/PFOR.

But algorithms like Simple9/16 are not fixed -- they encode a variable number of adjacent ints at once, depending on the specific values of those ints.
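
To make the "variable number of adjacent ints" point concrete, here is a minimal sketch of Simple9-style packing in plain Java. It is not taken from the patch and uses no Lucene APIs. Simple9 stores a 4-bit selector plus 28 data bits in each 32-bit word and has nine layouts. The class name, the greedy layout choice, and the handling of the tail (never padding past the end of the input) are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Simple9Sketch {
    // The nine layouts: COUNT[s] values of BITS[s] bits each fit in 28 data bits.
    private static final int[] COUNT = {28, 14, 9, 7, 5, 4, 3, 2, 1};
    private static final int[] BITS = {1, 2, 3, 4, 5, 7, 9, 14, 28};

    /** Packs values (each in [0, 2^28)) into words: 4-bit selector + 28 data bits. */
    static int[] encode(int[] values) {
        List<Integer> words = new ArrayList<>();
        int pos = 0;
        while (pos < values.length) {
            int sel = pickSelector(values, pos);
            int word = sel << 28;
            for (int i = 0; i < COUNT[sel]; i++) {
                word |= values[pos + i] << (i * BITS[sel]);
            }
            words.add(word);
            pos += COUNT[sel]; // how many ints this word holds depends on the data
        }
        return toArray(words);
    }

    /** Greedy choice: the densest layout that fits the values and the remaining input. */
    private static int pickSelector(int[] values, int pos) {
        for (int sel = 0; sel < COUNT.length; sel++) {
            if (pos + COUNT[sel] > values.length) {
                continue; // sketch choice: never pad past the end of the input
            }
            boolean fits = true;
            for (int i = 0; i < COUNT[sel] && fits; i++) {
                fits = (values[pos + i] >>> BITS[sel]) == 0;
            }
            if (fits) {
                return sel;
            }
        }
        throw new IllegalArgumentException("value out of range at index " + pos);
    }

    /** Reverses encode(): the selector tells the decoder how many ints each word holds. */
    static int[] decode(int[] words) {
        List<Integer> out = new ArrayList<>();
        for (int word : words) {
            int sel = word >>> 28;
            int mask = (1 << BITS[sel]) - 1;
            for (int i = 0; i < COUNT[sel]; i++) {
                out.add((word >>> (i * BITS[sel])) & mask);
            }
        }
        return toArray(out);
    }

    private static int[] toArray(List<Integer> list) {
        int[] a = new int[list.size()];
        for (int i = 0; i < a.length; i++) {
            a[i] = list.get(i);
        }
        return a;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 100000, 5};
        int[] words = encode(values);
        System.out.println(values.length + " ints -> " + words.length + " words");
        System.out.println(Arrays.toString(decode(words)));
    }
}
```

Because the number of ints per word is only known once the values have been inspected, a fixed-block-size codec cannot wrap an encoder like this directly.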

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2589) Add a variable-sized int block codec

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896496#action_12896496 ] 

Michael McCandless commented on LUCENE-2589:
--------------------------------------------

I think we should do both!  (A single static random seed, plus some basic docs about all the neat properties our tests now accept.)  But I think these are separate issues.  I'll commit this one (which lets you specify a codec and its param on the ant command line) shortly.




[jira] Resolved: (LUCENE-2589) Add a variable-sized int block codec

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2589.
----------------------------------------

    Resolution: Fixed




[jira] Updated: (LUCENE-2589) Add a variable-sized int block codec

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2589:
---------------------------------------

    Attachment: LUCENE-2589.patch

Attached patch.

Just like for the fixed int block case, I added a MockVariableIntBlockCodec (in src/test), with a stupid variable-sized int block encoding.

MockVariableIntBlockCodec and MockFixedIntBlockCodec serve as good examples of how to take any low-level int encoder and turn it into a Lucene codec.
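
As a rough illustration of what a deliberately simple variable-sized block encoding can look like, here is a self-contained Java sketch. It does not use Lucene's codec APIs and is not the code in the patch. The class name, the rule for choosing the block length (from the first value in the block), and the explicit length header are all assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class VariableBlockSketch {
    /** Writes the ints as blocks; each block's length depends on its first value. */
    static byte[] write(int[] values, int baseBlockSize) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            int pos = 0;
            while (pos < values.length) {
                // Toy rule (hypothetical): small first value -> short block, else long block.
                int blockSize = values[pos] <= 3 ? baseBlockSize : 2 * baseBlockSize;
                int n = Math.min(blockSize, values.length - pos);
                out.writeInt(n); // block header: number of ints that follow
                for (int i = 0; i < n; i++) {
                    out.writeInt(values[pos + i]);
                }
                pos += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the blocks back; the header tells the reader how long each block is. */
    static int[] read(byte[] data) {
        List<Integer> out = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                int n = in.readInt();
                for (int i = 0; i < n; i++) {
                    out.add(in.readInt());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        int[] result = new int[out.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = out.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4, 5, 6, 7};
        byte[] data = write(values, 2);
        System.out.println(data.length + " bytes, round trip ok: "
            + java.util.Arrays.equals(read(data), values));
    }
}
```

A real int encoder would replace the raw writeInt calls with its own packing. The point here is only that the writer, not a fixed constant, decides where each block ends.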

I also increased randomness of the codecs picked for testing, by adding params like block size (for both fixed & variable mock intblock codecs) and the freq cutoff for Pulsing.  So these configurations are now also randomly picked when running tests (= spikes on the monster's back).




[jira] Commented: (LUCENE-2589) Add a variable-sized int block codec

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896406#action_12896406 ] 

Simon Willnauer commented on LUCENE-2589:
-----------------------------------------

{quote}Actually, I would prefer we just fix LuceneTestCase etc. so that each test class has a single static random seed; then there would be fewer parameters. Then we can change the failure message to just say 'reproduce with -D....' and I think it would be best.{quote}

A single random seed per test class would make things much easier IMO, and I agree that we should have that per class. Nevertheless, would that deprecate all the other parameters? When I want to use randomized tests but need to force a certain codec, that wouldn't work. Anyway, documentation of whatever we do here would help people new to Lucene get started with patches and tests.




[jira] Commented: (LUCENE-2589) Add a variable-sized int block codec

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896402#action_12896402 ] 

Robert Muir commented on LUCENE-2589:
-------------------------------------

{quote}
Looks good to me - makes sense to have the size configurable. I wonder if we should start some documentation either in src/test/../package.html or on the wiki which holds information about how we test and which properties are recognized in the unit test.
{quote}

Actually, I would prefer we just fix LuceneTestCase etc. so that each test class has a single static random seed; then there would be fewer parameters. Then we can change the failure message to just say 'reproduce with -D....' and I think it would be best.




[jira] Commented: (LUCENE-2589) Add a variable-sized int block codec

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896417#action_12896417 ] 

Robert Muir commented on LUCENE-2589:
-------------------------------------

bq. When I want to use randomized tests but need to force a certain codec, that wouldn't work.

Yes it would, as random codec selection would be determined by the same random seed (so if you use the same seed, you force the same codec).

bq. Anyway, documentation of whatever we do here would help people new to Lucene get started with patches and tests.

I don't think we should add a lot of documentation (which will only become obsolete, as I know I will be adding even more dimensions to the tests ASAP). I think it's better to simplify and use a single seed for selecting all parameters!
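
The single-seed idea can be sketched in a few lines of Java. This is an illustration only, not LuceneTestCase code: the class name, the "sketch.seed" property, the list of codec names, and the parameter range are all assumptions. What it shows is that when every random choice is drawn from one seeded java.util.Random, re-running with the same seed reproduces the same codec and the same parameter.

```java
import java.util.Random;

class SeededConfigSketch {
    // Hypothetical codec names for the illustration.
    private static final String[] CODECS = {
        "Standard", "Pulsing", "MockFixedIntBlock", "MockVariableIntBlock"
    };

    /** Derives every test dimension from one seed, so the seed alone reproduces a run. */
    static String pick(long seed) {
        Random random = new Random(seed);
        String codec = CODECS[random.nextInt(CODECS.length)];
        int param = 1 + random.nextInt(20); // e.g. a block size or a freq cutoff
        return codec + "(" + param + ")";
    }

    public static void main(String[] args) {
        // "sketch.seed" is a made-up property name, not one the Lucene tests recognize.
        long seed = Long.getLong("sketch.seed", System.nanoTime());
        System.out.println("config: " + pick(seed));
        System.out.println("reproduce with -Dsketch.seed=" + seed);
    }
}
```

Forcing a particular codec then means re-using the seed that selected it, at the cost of also fixing every other dimension that the seed controls.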




[jira] Reopened: (LUCENE-2589) Add a variable-sized int block codec

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reopened LUCENE-2589:
----------------------------------------


I want to make it possible to pass params to the test codecs, e.g. -Dtest.codecs=Pulsing(4)




[jira] Updated: (LUCENE-2589) Add a variable-sized int block codec

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2589:
---------------------------------------

    Attachment: LUCENE-2589.patch

Simple patch -- uses regexp to parse out a single int param.
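
For readers without the patch at hand, parsing a spec such as Pulsing(4) with a regexp might look roughly like the sketch below. It is not the code from the patch: the class name, the pattern, and the convention of using -1 for "no parameter given" are assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CodecSpec {
    // A codec name, optionally followed by a single int parameter in parentheses.
    private static final Pattern SPEC = Pattern.compile("(\\w+)(?:\\((\\d+)\\))?");

    final String name;
    final int param; // -1 when the spec carries no parameter (sketch convention)

    private CodecSpec(String name, int param) {
        this.name = name;
        this.param = param;
    }

    /** Parses specs such as "Pulsing(4)" or "Standard". */
    static CodecSpec parse(String spec) {
        Matcher m = SPEC.matcher(spec.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("cannot parse codec spec: " + spec);
        }
        String p = m.group(2); // null when the optional "(N)" part is absent
        return new CodecSpec(m.group(1), p == null ? -1 : Integer.parseInt(p));
    }

    public static void main(String[] args) {
        CodecSpec spec = parse("Pulsing(4)");
        System.out.println(spec.name + " with param " + spec.param);
    }
}
```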




[jira] Commented: (LUCENE-2589) Add a variable-sized int block codec

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896397#action_12896397 ] 

Simon Willnauer commented on LUCENE-2589:
-----------------------------------------

bq. Simple patch - uses regexp to parse out a single int param.
Looks good to me - makes sense to have the size configurable. I wonder if we should start some documentation either in src/test/../package.html or on the wiki which holds information about how we test and which properties are recognized in the unit test.

Thoughts?




[jira] Resolved: (LUCENE-2589) Add a variable-sized int block codec

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2589.
----------------------------------------

    Resolution: Fixed

