Posted to dev@lucene.apache.org by "Steven Rowe (JIRA)" <ji...@apache.org> on 2008/11/01 05:49:44 UTC

[jira] Created: (LUCENE-1434) IndexableBinaryStringTools: convert arbitrary byte sequences into Strings that can be used as index terms, and vice versa

IndexableBinaryStringTools: convert arbitrary byte sequences into Strings that can be used as index terms, and vice versa
-------------------------------------------------------------------------------------------------------------------------

                 Key: LUCENE-1434
                 URL: https://issues.apache.org/jira/browse/LUCENE-1434
             Project: Lucene - Java
          Issue Type: New Feature
          Components: Other
    Affects Versions: 2.4
            Reporter: Steven Rowe
            Priority: Minor
             Fix For: 2.9


Provides support for converting byte sequences to Strings that can be used as index terms, and back again. The resulting Strings preserve the original byte sequences' sort order (assuming the bytes are interpreted as unsigned).

The Strings are constructed using a Base 8000h (radix 32,768) encoding of the original binary data: each char of an encoded String represents a 15-bit chunk of the byte sequence. Base 8000h was chosen because it allows all of the lower 15 bits of a char to be used without restriction. Using char's high bit as well would require complicated handling to avoid the surrogate range [U+D800-U+DFFF], whose code units are not valid characters on their own.
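
To make the 15-bit chunking concrete, here is a minimal sketch of the idea. It is not the code in the attached patch: the class name Base8000hSketch and its methods are hypothetical, and for simplicity decode() is handed the original byte length instead of recovering it from the encoded String.

```java
import java.util.Arrays;

/** Illustrative only (hypothetical class): packs bytes into 15-bit chunks, one chunk per char. */
final class Base8000hSketch {

    /** Encodes bytes as chars in [0x0000, 0x7FFF]; the final chunk is zero-padded. */
    static String encode(byte[] input) {
        int totalBits = input.length * 8;
        int numChars = (totalBits + 14) / 15; // ceil(totalBits / 15)
        char[] out = new char[numChars];
        for (int i = 0; i < numChars; i++) {
            int chunk = 0;
            for (int b = 0; b < 15; b++) {
                int bitIndex = i * 15 + b;
                int bit = 0;
                if (bitIndex < totalBits) {
                    // Read bits most significant first, so char order follows byte order.
                    bit = (input[bitIndex / 8] >> (7 - bitIndex % 8)) & 1;
                }
                chunk = (chunk << 1) | bit;
            }
            out[i] = (char) chunk; // high bit of the char is never set
        }
        return new String(out);
    }

    /** Decodes, assuming the caller knows the original length (a simplification of this sketch). */
    static byte[] decode(String encoded, int byteLength) {
        byte[] out = new byte[byteLength];
        for (int bitIndex = 0; bitIndex < byteLength * 8; bitIndex++) {
            int chunk = encoded.charAt(bitIndex / 15);
            int bit = (chunk >> (14 - bitIndex % 15)) & 1;
            out[bitIndex / 8] |= (byte) (bit << (7 - bitIndex % 8));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] low = {0x01, (byte) 0xFF, 0x10};
        byte[] high = {(byte) 0x80, 0x00, 0x00}; // larger than 'low' when bytes are unsigned
        String a = encode(low);
        String b = encode(high);
        System.out.println("encoded order matches unsigned byte order: " + (a.compareTo(b) < 0));
        System.out.println("round trip ok: " + Arrays.equals(low, decode(a, low.length)));
    }
}
```

Because this sketch zero-pads the last chunk and stores no length, byte sequences of different lengths can encode to the same String (for example, two zero bytes and three zero bytes), so it only illustrates the ordering property for inputs of equal length. A real implementation needs some way to record how many bits of the final chunk are significant.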

This class is intended as a mechanism for using CollationKeys as index terms.
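
As background for the CollationKey use case, the sketch below shows where the byte sequences would come from. It uses only the standard java.text API; the class name CollationKeyBytesSketch and its helper methods are hypothetical and are not part of the patch. The JDK documents that byte arrays from CollationKey.toByteArray() are organized most significant byte first and can be compared to get the same result as comparing the keys, which is why an encoding that preserves unsigned byte order is needed.

```java
import java.text.Collator;
import java.util.Locale;

/** Illustrative only (hypothetical class): collation keys as unsigned byte sequences. */
final class CollationKeyBytesSketch {

    /** Lexicographic comparison that treats each byte as unsigned. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /** Returns the raw collation key bytes for the given text. */
    static byte[] keyBytes(Collator collator, String text) {
        return collator.getCollationKey(text).toByteArray();
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        String x = "apple";
        String y = "Zebra";
        int byCollator = Integer.signum(collator.compare(x, y));
        int byKeyBytes = Integer.signum(compareUnsigned(keyBytes(collator, x), keyBytes(collator, y)));
        int byCodeUnits = Integer.signum(x.compareTo(y));
        System.out.println("collator: " + byCollator
                + ", key bytes: " + byKeyBytes
                + ", plain String.compareTo: " + byCodeUnits);
    }
}
```

The key bytes are arbitrary binary data, not text, so they cannot be used directly as a String term; converting them with an order-preserving encoding is the gap this issue addresses.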

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Assigned: (LUCENE-1434) IndexableBinaryStringTools: convert arbitrary byte sequences into Strings that can be used as index terms, and vice versa

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reassigned LUCENE-1434:
------------------------------------------

    Assignee: Michael McCandless




[jira] Commented: (LUCENE-1434) IndexableBinaryStringTools: convert arbitrary byte sequences into Strings that can be used as index terms, and vice versa

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683171#action_12683171 ] 

Michael McCandless commented on LUCENE-1434:
--------------------------------------------

This looks good.  I plan to commit shortly!




[jira] Resolved: (LUCENE-1434) IndexableBinaryStringTools: convert arbitrary byte sequences into Strings that can be used as index terms, and vice versa

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1434.
----------------------------------------

    Resolution: Fixed

Thanks Steven!




[jira] Updated: (LUCENE-1434) IndexableBinaryStringTools: convert arbitrary byte sequences into Strings that can be used as index terms, and vice versa

Posted by "Steven Rowe (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Rowe updated LUCENE-1434:
--------------------------------

    Attachment: LUCENE-1434.patch

