Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2011/05/03 19:41:03 UTC

[jira] [Created] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

NumericField should be stored in binary format in index (matching Solr's format)
--------------------------------------------------------------------------------

                 Key: LUCENE-3065
                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
             Project: Lucene - Java
          Issue Type: Bug
          Components: Index
            Reporter: Michael McCandless
            Priority: Minor
             Fix For: 3.2, 4.0


(Spinoff of LUCENE-3001)

Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972

We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.

A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.
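
As a rough illustration of the idea above, the following sketch shows how spare bits in a per-field flags value could tag a stored field as numeric and also record which numeric type it holds. The class name, constant names, and bit positions are assumptions made for this example; they are not taken from the patch or from Lucene's actual file format.

```java
/** Hypothetical flag layout for a stored field; names and values are illustrative only. */
final class StoredFieldFlagsSketch {
    // Flags of the kind the thread mentions (tokenized, binary, compressed).
    static final int TOKENIZED  = 1;
    static final int BINARY     = 1 << 1;
    static final int COMPRESSED = 1 << 2; // described in the thread as unused now

    // Assumed use of the spare bits: a small numeric-type code stored above the old flags.
    static final int NUMERIC_SHIFT  = 3;
    static final int NUMERIC_MASK   = 0x07 << NUMERIC_SHIFT;
    static final int NUMERIC_INT    = 1 << NUMERIC_SHIFT;
    static final int NUMERIC_LONG   = 2 << NUMERIC_SHIFT;
    static final int NUMERIC_FLOAT  = 3 << NUMERIC_SHIFT;
    static final int NUMERIC_DOUBLE = 4 << NUMERIC_SHIFT;

    /** True if any of the numeric-type bits are set. */
    static boolean isNumeric(int bits) {
        return (bits & NUMERIC_MASK) != 0;
    }

    /** Decodes the numeric-type code, or "none" for a field written without it. */
    static String numericType(int bits) {
        switch (bits & NUMERIC_MASK) {
            case NUMERIC_INT:    return "int";
            case NUMERIC_LONG:   return "long";
            case NUMERIC_FLOAT:  return "float";
            case NUMERIC_DOUBLE: return "double";
            default:             return "none";
        }
    }

    public static void main(String[] args) {
        int oldStyle = TOKENIZED;      // a field written before the numeric bits existed
        int newStyle = NUMERIC_LONG;   // a field tagged as a stored long
        System.out.println(numericType(oldStyle) + " / " + numericType(newStyle));
    }
}
```

Under this assumed layout, a reader that finds none of the numeric bits set would treat the field as an ordinary text or binary field, which is how older index data would naturally look.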

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


RE: [jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by Uwe Schindler <uw...@thetaphi.de>.
Sorry, I did not want to delete this one; my super-duper browser got totally confused and disturbed...

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Uwe Schindler (JIRA) [mailto:jira@apache.org]
> Sent: Thursday, May 05, 2011 6:13 PM
> To: dev@lucene.apache.org
> Subject: [jira] [Updated] (LUCENE-3065) NumericField should be stored in
> binary format in index (matching Solr's format)
> 
> 
>      [ https://issues.apache.org/jira/browse/LUCENE-
> 3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> 
> Uwe Schindler updated LUCENE-3065:
> ----------------------------------
> 
>     Comment: was deleted
> 
> (was: Ideally this could be done with the schema-like approach of one of the
> GSoC projects?
> 
> We already discussed that: We can use the FieldsReader/FieldsWriter
> type flag (which currently says binary/text and compressed (unused now))
> in the index file format to mark a field as NumericField. In that case,
> Document.getField() would return the NumericField instance.
> 
> For Lucene backwards compatibility we should still support creating
> "text-only" fields.
> 
> The new binary format would also be compatible with Solr, as on getField,
> Solr would get a NumericField and can decide using instanceof what to do.
> Old Solr indexes without the NumericField marker flag would return as
> byte[], in which case Solr would do the decoding.
> 
> For storing on index side, Solr could move to NumericField completely (I don't
> like the current approach using NumericTokenStream and to/fromInternal
> wrappers around conventional Field).)


[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028693#comment-13028693 ] 

Michael McCandless commented on LUCENE-3065:
--------------------------------------------

Patch looks great Uwe!  Except we need to resolve this Field/Fieldable/AbstractField.  Probably we should go and finish LUCENE-2310...



[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029410#comment-13029410 ] 

Uwe Schindler commented on LUCENE-3065:
---------------------------------------

Sorry, my browser or JIRA deleted the wrong comments, so I removed one of mine and one of Mike's :( - Sorry.



[jira] [Resolved] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved LUCENE-3065.
-----------------------------------

    Resolution: Fixed

Committed trunk revision: 1100526

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065-trunk.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.



[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029873#comment-13029873 ] 

Michael McCandless commented on LUCENE-3065:
--------------------------------------------

Looks great Uwe!  Awesome to finally get NumericField back at search time...



[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029427#comment-13029427 ] 

Uwe Schindler commented on LUCENE-3065:
---------------------------------------

Earwin: The long-term plan for flexible indexing is to make stored fields flexible as well. For now it's not possible, so NumericFields are handled separately. In the future, this might be a stored fields codec.



[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3065:
----------------------------------

    Attachment: LUCENE-3065.patch

New patch with the changes proposed before (no more instanceof chains).

I think this is now ready to commit.



[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029011#comment-13029011 ] 

Uwe Schindler commented on LUCENE-3065:
---------------------------------------

I started a new issue in Solr for the changes there: SOLR-2497



[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028733#comment-13028733 ] 

Michael McCandless commented on LUCENE-3065:
--------------------------------------------

Patch looks great Uwe!

I think we should deprecate Document.getField?  And advertise in CHANGES that this is an [intentional] BW break, ie, you can no longer .getField if it's a NumericField (you'll hit CCE, just like you already do for lazy fields)?  I think that's the lesser evil here?
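
To make the ClassCastException concrete, here is a minimal sketch using small stand-in types. The nested Fieldable, Field, NumericField, and Document types below only borrow names from the thread; they are hypothetical simplifications written for this example, not Lucene's classes. The assumption illustrated is that NumericField implements the common interface without being a subclass of Field, so a getter that casts to Field fails for it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in types (not Lucene's) showing why a Field-typed getter can throw. */
final class GetFieldSketch {
    interface Fieldable { String name(); }

    static class Field implements Fieldable {
        private final String name;
        final String value;
        Field(String name, String value) { this.name = name; this.value = value; }
        public String name() { return name; }
    }

    /** Shares the interface with Field but is deliberately not a subclass of it. */
    static final class NumericField implements Fieldable {
        private final String name;
        final long value;
        NumericField(String name, long value) { this.name = name; this.value = value; }
        public String name() { return name; }
    }

    static final class Document {
        private final Map<String, Fieldable> fields = new LinkedHashMap<>();
        void add(Fieldable f) { fields.put(f.name(), f); }

        /** Safe for every field kind: returns the interface type. */
        Fieldable getFieldable(String name) { return fields.get(name); }

        /** Casts to Field, so it throws ClassCastException for a NumericField. */
        Field getField(String name) { return (Field) fields.get(name); }
    }

    public static void main(String[] args) {
        Document doc = new Document();
        doc.add(new Field("title", "hello"));
        doc.add(new NumericField("price", 42L));
        Fieldable f = doc.getFieldable("price");
        if (f instanceof NumericField) {
            System.out.println(((NumericField) f).value);
        }
    }
}
```

In this sketch, callers that go through the interface-typed getter and check the concrete type keep working, which is the usage the proposed deprecation would steer people toward.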



[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029421#comment-13029421 ] 

Earwin Burrfoot commented on LUCENE-3065:
-----------------------------------------

It's sad that NumericFields are hard-baked into the index format.

E.g., I have some fields that are similar to Numeric in that they are 'stringified' binary structures, and they can't become first-class in the same manner as Numeric.



[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3065:
----------------------------------

    Comment: was deleted

(was: Ideally this could be done with the schema-like approach of one of the GSoC projects?

We already discussed that: We can use the FieldsReader/FieldsWriter type flag (which currently says binary/text and compressed (unused now)) in the index file format to mark a field as NumericField. In that case, Document.getField() would return the NumericField instance.

For Lucene backwards compatibility we should still support creating "text-only" fields.

The new binary format would also be compatible with Solr, as on getField, Solr would get a NumericField and can decide using instanceof what to do. Old Solr indexes without the NumericField marker flag would return as byte[], in which case Solr would do the decoding.

For storing on index side, Solr could move to NumericField completely (I don't like the current approach using NumericTokenStream and to/fromInternal wrappers around conventional Field).)



[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3065:
----------------------------------

    Attachment: LUCENE-3065.patch

Here is the patch with my changes.



[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3065:
----------------------------------

    Comment: was deleted

(was: Patch against 3.x.

I moved the to/from byte[] methods from Solr's TrieField into Lucene's
NumericUtils, and fixed FieldsWriter/Reader to use free bits in the
field's flags to know if the field is Numeric, and which type.

I added a random test case to verify we now get the right NumericField
back, when we stored NumericField during indexing.

Old indices are handled fine (you'll get a String-ified Field back like
you did before).

Spookily, nothing failed in Solr... I assume there's somewhere in Solr
that must now be fixed to handle the fact that a field can come back
as NumericField?  Anyone know where...?)
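
For readers unfamiliar with what "to/from byte[] methods" might look like, here is a minimal round-trip sketch. It assumes a fixed-width, big-endian layout (4 bytes for int, 8 for long, with double mapped through its IEEE 754 bit pattern); the class and method names are hypothetical and this is not the code that was moved into NumericUtils.

```java
import java.nio.ByteBuffer;

/** Hypothetical fixed-width encoders; not the actual NumericUtils or TrieField code. */
final class NumericBytesSketch {
    // ByteBuffer defaults to big-endian byte order, so the most significant byte comes first.
    static byte[] intToBytes(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    static int bytesToInt(byte[] b) {
        return ByteBuffer.wrap(b).getInt();
    }

    static byte[] longToBytes(long v) {
        return ByteBuffer.allocate(8).putLong(v).array();
    }

    static long bytesToLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    // A double is stored via its raw 64-bit pattern so the round trip is exact.
    static byte[] doubleToBytes(double v) {
        return longToBytes(Double.doubleToLongBits(v));
    }

    static double bytesToDouble(byte[] b) {
        return Double.longBitsToDouble(bytesToLong(b));
    }

    public static void main(String[] args) {
        long timestamp = 1304444463000L;
        byte[] binary = longToBytes(timestamp);
        String text = Long.toString(timestamp);
        System.out.println(binary.length + " bytes binary vs " + text.length() + " chars as text");
    }
}
```

The point of the comparison in main is only that a fixed-width binary value can be smaller than its decimal string form for large numbers; actual savings depend on the values stored.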

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch


[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030294#comment-13030294 ] 

Uwe Schindler commented on LUCENE-3065:
---------------------------------------

Committed 3.x revision: 1100480

Now forward-porting to trunk...

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch


[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028677#comment-13028677 ] 

Uwe Schindler commented on LUCENE-3065:
---------------------------------------

Mike:
I reviewed the patch again: You are currently using 3 bits already. 1 bit is solely for detecting numerics, the other two are the type.

In my opinion, to check if it's a numeric field, use a MASK of 3 bits and check for !=0. As soon as any bit in this mask is set, it's numeric. The actual numeric fields have values !=0:

{code}
private static final int _NUMERIC_BIT_SHIFT = 3;
static final byte FIELD_IS_NUMERIC_MASK = 0x07 << _NUMERIC_BIT_SHIFT;

static final byte FIELD_IS_NUMERIC_INT = 1 << _NUMERIC_BIT_SHIFT;
static final byte FIELD_IS_NUMERIC_LONG = 2 << _NUMERIC_BIT_SHIFT;
static final byte FIELD_IS_NUMERIC_FLOAT = 3 << _NUMERIC_BIT_SHIFT;
static final byte FIELD_IS_NUMERIC_DOUBLE = 4 << _NUMERIC_BIT_SHIFT;
// unused: static final byte FIELD_IS_NUMERIC_SHORT = 5 << _NUMERIC_BIT_SHIFT;
// unused: static final byte FIELD_IS_NUMERIC_BYTE = 6 << _NUMERIC_BIT_SHIFT;
// and we still have one value left over :-)  7 << _NUMERIC_BIT_SHIFT

// check if field is numeric:
if ((bits & FIELD_IS_NUMERIC_MASK) != 0) {}

// parse type:
switch (bits & FIELD_IS_NUMERIC_MASK) {
  case FIELD_IS_NUMERIC_INT: ...
}
{code}

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch


[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028680#comment-13028680 ] 

Uwe Schindler commented on LUCENE-3065:
---------------------------------------

This gives us more freedom in the future, as we are limited to 8 bits in total, of which 3 are already used; this approach only adds 3 more, not 4.

By the way, for performance reasons all constants should be declared as int, not byte, as the byte read from the index is already an int.
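
For illustration only (this is not taken from the patch): a minimal sketch of the mask-based check with the constants declared as int. The constant names follow the earlier comment; the class name, the decodeType helper and the assumption that the three lowest bits hold the already-used flags are hypothetical.

```java
// Sketch only: not the actual FieldsWriter/FieldsReader code.
final class NumericFlagBitsSketch {
    // Assumption: the three lowest bits are the flags that are already in use.
    private static final int NUMERIC_BIT_SHIFT = 3;

    // Three bits hold the numeric type; a value of 0 means "not numeric".
    static final int FIELD_IS_NUMERIC_MASK   = 0x07 << NUMERIC_BIT_SHIFT;
    static final int FIELD_IS_NUMERIC_INT    = 1 << NUMERIC_BIT_SHIFT;
    static final int FIELD_IS_NUMERIC_LONG   = 2 << NUMERIC_BIT_SHIFT;
    static final int FIELD_IS_NUMERIC_FLOAT  = 3 << NUMERIC_BIT_SHIFT;
    static final int FIELD_IS_NUMERIC_DOUBLE = 4 << NUMERIC_BIT_SHIFT;

    // Numeric as soon as any bit inside the mask is set.
    static boolean isNumeric(int bits) {
        return (bits & FIELD_IS_NUMERIC_MASK) != 0;
    }

    // Hypothetical helper: maps the masked bits to a readable type name.
    static String decodeType(int bits) {
        switch (bits & FIELD_IS_NUMERIC_MASK) {
            case 0:                       return "NOT_NUMERIC";
            case FIELD_IS_NUMERIC_INT:    return "INT";
            case FIELD_IS_NUMERIC_LONG:   return "LONG";
            case FIELD_IS_NUMERIC_FLOAT:  return "FLOAT";
            case FIELD_IS_NUMERIC_DOUBLE: return "DOUBLE";
            default:
                throw new IllegalArgumentException("unknown numeric type bits: " + bits);
        }
    }
}
```

With int constants the switch works directly on the int flag value, and the unused values 5 to 7 end up in the default branch instead of being silently treated as a known type.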

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch


[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028397#comment-13028397 ] 

Uwe Schindler commented on LUCENE-3065:
---------------------------------------

About the patch: Maybe the byte[]-returning methods in NumericUtils should use BytesRef and reuse that for storing (applies to trunk)?

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch


[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028483#comment-13028483 ] 

Michael McCandless commented on LUCENE-3065:
--------------------------------------------

Ugh!  Field/Fieldable/AbstractField strikes again.... hmm not sure what to do.

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch


[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Chris Male (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028526#comment-13028526 ] 

Chris Male commented on LUCENE-3065:
------------------------------------

The Field/Fieldable/AbstractField problem is what I've been addressing in LUCENE-2310.  There I took the step of making NumericField extend Field, with a series of unsupported fields.  This seemed easiest to do particularly with FieldType in mind.  I then deprecated all the Fieldable methods in Document.

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch


[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3065:
----------------------------------

    Attachment: LUCENE-3065.patch

Moved test to TestFieldsReader

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch


[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3065:
----------------------------------

    Attachment: SOLR-2497.patch

MoreLikeThis problem solved, it was as I said. The test included a TrieInt field in the "similarity fields", so it was used to calculate similarity. Because the TrieField was invisible to MLT in previous Solr versions, this had no effect.
By the way: There is a commented-out part explicitly involving the MLT field, but I don't understand it. It seems that it was never understood/supported.
Now, all numeric fields should work with MLT.

Now only the TestDistributedSearch is still failing with a strange date failure. I'll dig.

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, SOLR-2497.patch


[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028454#comment-13028454 ] 

Uwe Schindler commented on LUCENE-3065:
---------------------------------------

There is still a problem - first the good news:

- If user calls Document.get(field), the returned string is as before, so there is no break at all. The reason is the implementation of NumericField.stringValue(): it returns what the user is used to from 3.0
- If a user calls getFieldable(field) all is fine, too. The only change is that it could now return NumericField. If the user simply calls stringValue(), all is identical to 3.0

Problems start with:

- If user calls Document.getField(name) it returns Field (internally it casts the getFieldable() result to Field). But NumericField does not subclass Field -> ClassCastException.

How to handle this?

- Maybe change those methods to return AbstractField, but that's a binary break and users will complain, because not everything works as expected
- Make NumericField subclass Field (and Field is unfinalized) - that's a bad idea, because Field has too many methods / members that are out of scope
- Deprecate Document.getField() and make it internally do an instanceof check: if it gets a NumericField, transform it to a backwards-compatible Field? - This method is already broken. If you request lazy field loading it also throws ClassCastEx (e.g. LUCENE-609).

Not sure how to proceed. Otherwise the patch looks fine. I think simply ignoring LazyField loading is fine, as numeric fields are a maximum of 8 bytes.... Otherwise we would need LazyNumericField :(
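
To make the third option concrete, here is a rough sketch of the instanceof-based fallback. All types below are simplified stand-ins invented for this example (they are not the real Lucene Document, Field, Fieldable or NumericField classes), and the thread does not say that the final patch took this route.

```java
// Stand-in types for illustration; NOT the real Lucene classes.
interface SketchFieldable {
    String name();
    String stringValue();
}

final class SketchField implements SketchFieldable {
    private final String name;
    private final String value;
    SketchField(String name, String value) { this.name = name; this.value = value; }
    public String name() { return name; }
    public String stringValue() { return value; }
}

final class SketchNumericField implements SketchFieldable {
    private final String name;
    private final long value;
    SketchNumericField(String name, long value) { this.name = name; this.value = value; }
    public String name() { return name; }
    // Same string form a caller would have seen when numbers came back as strings.
    public String stringValue() { return Long.toString(value); }
}

final class SketchDocument {
    private final java.util.List<SketchFieldable> fields = new java.util.ArrayList<>();

    void add(SketchFieldable f) { fields.add(f); }

    // Returns the first field with the given name, or null if there is none.
    SketchFieldable getFieldable(String name) {
        for (SketchFieldable f : fields) {
            if (f.name().equals(name)) return f;
        }
        return null;
    }

    // Hypothetical compatibility path: instead of a blind cast (which would
    // throw ClassCastException), a numeric field is converted to a plain
    // string-valued field before being returned.
    SketchField getField(String name) {
        SketchFieldable f = getFieldable(name);
        if (f instanceof SketchNumericField) {
            return new SketchField(f.name(), f.stringValue());
        }
        return (SketchField) f;
    }
}
```

Callers of getField() keep getting a string-valued field, while callers of getFieldable() can see the numeric type.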

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch


[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028399#comment-13028399 ] 

Ryan McKinley commented on LUCENE-3065:
---------------------------------------

bq. Is there some reason...?

Solr did its own encoding/decoding so that it could store a binary field -- with this patch, that is not necessary anymore.

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch


[jira] [Issue Comment Edited] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029412#comment-13029412 ] 

Uwe Schindler edited comment on LUCENE-3065 at 5/5/11 4:22 PM:
---------------------------------------------------------------

Revert of deletion of Mike's first comment (sorry)

{quote}
Patch against 3.x.

I moved the to/from byte[] methods from Solr's TrieField into Lucene's NumericUtils, and fixed FieldsWriter/Reader to use free bits in the field's flags to know if the field is Numeric, and which type.

I added a random test case to verify we now get the right NumericField back, when we stored NumericField during indexing.

Old indices are handled fine (you'll get a String-ified Field back like you did before).

Spookily, nothing failed in Solr... I assume there's somewhere in Solr that must now be fixed to handle the fact that a field can come back as NumericField?  Anyone know where...?)
{quote}
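
As a rough idea of what such to/from byte[] helpers can look like, here is a sketch of a fixed-width binary encoding. The class and method names are made up for this example, and the big-endian byte order is an assumption; the thread does not spell out the exact layout used by TrieField or NumericUtils.

```java
// Sketch only: hypothetical helpers, not the NumericUtils API.
final class NumericBytesSketch {
    // Encodes an int as 4 bytes, most significant byte first (assumed order).
    static byte[] intToBytes(int v) {
        return new byte[] {
            (byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v
        };
    }

    // Decodes 4 bytes back to an int; & 0xFF undoes sign extension per byte.
    static int bytesToInt(byte[] b) {
        return ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16)
             | ((b[2] & 0xFF) << 8)  |  (b[3] & 0xFF);
    }

    // Encodes a long as 8 bytes, most significant byte first (assumed order).
    static byte[] longToBytes(long v) {
        byte[] b = new byte[8];
        for (int i = 7; i >= 0; i--) {
            b[i] = (byte) v;
            v >>>= 8;
        }
        return b;
    }

    // Decodes 8 bytes back to a long.
    static long bytesToLong(byte[] b) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (b[i] & 0xFFL);
        }
        return v;
    }
}
```

An int always takes 4 bytes and a long 8 bytes in this form, whereas the decimal string for a value like Long.MIN_VALUE needs 20 characters.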

      was (Author: thetaphi):
    Patch against 3.x.

I moved the to/from byte[] methods from Solr's TrieField into Lucene's NumericUtils, and fixed FieldsWriter/Reader to use free bits in the field's flags to know if the field is Numeric, and which type.

I added a random test case to verify we now get the right NumericField back, when we stored NumericField during indexing.

Old indices are handled fine (you'll get a String-ified Field back like you did before).

Spookily, nothing failed in Solr... I assume there's somewhere in Solr that must now be fixed to handle the fact that a field can come back as NumericField?  Anyone know where...?)

  
> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch


[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029906#comment-13029906 ] 

Uwe Schindler commented on LUCENE-3065:
---------------------------------------

bq. Just to note: We also need to change the Forrest index format documentation!

I already commented on that :-) [https://issues.apache.org/jira/browse/LUCENE-3065?focusedCommentId=13028718&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13028718]

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch


[jira] [Assigned] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler reassigned LUCENE-3065:
-------------------------------------

    Assignee: Uwe Schindler

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.



[jira] [Issue Comment Edited] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028707#comment-13028707 ] 

Uwe Schindler edited comment on LUCENE-3065 at 5/4/11 11:06 AM:
----------------------------------------------------------------

This patch adds some refactoring: FieldSelectorResult has been an enum since 3.0, so the (slow) chain of if-statements can be replaced by a fast switch.

It also adds some minor comments and a missing & 0xFF when casting byte to int.
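
As an illustration of why the & 0xFF matters, here is a minimal sketch. It is not taken from the patch; the class and method names are made up for this example. Java bytes are signed, so widening a byte to int without the mask sign-extends any value of 0x80 or above and corrupts the bits above it when bytes are OR-ed together.

```java
final class ByteMaskDemo {
    // Plain widening: the sign bit of the byte is copied into the upper 24 bits.
    static int widenSigned(byte b) {
        return b;
    }

    // Masked widening: only the low 8 bits survive, giving a value in 0..255.
    static int widenUnsigned(byte b) {
        return b & 0xFF;
    }

    // Reassemble a big-endian int from four bytes; every byte is masked
    // before shifting so that a "negative" byte cannot smear into the others.
    static int readInt(byte[] arr, int offset) {
        return ((arr[offset] & 0xFF) << 24)
             | ((arr[offset + 1] & 0xFF) << 16)
             | ((arr[offset + 2] & 0xFF) << 8)
             |  (arr[offset + 3] & 0xFF);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xF0;
        System.out.println(widenSigned(b));   // sign-extended value
        System.out.println(widenUnsigned(b)); // masked value
    }
}
```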

      was (Author: thetaphi):
    This patch adds some refactoring because FieldSelectorResult is an enum since 3.0, so the (slow) queue of id-statements can be replaced by a fast switch.

Also some minor comments and a missing & 0xFF when casting byte to int.
  
> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.



[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-3065:
---------------------------------------

    Attachment: LUCENE-3065.patch

Patch against 3.x.

I moved the to/from byte[] methods from Solr's TrieField into Lucene's
NumericUtils, and fixed FieldsWriter/Reader to use free bits in the
field's flags to know if the field is Numeric, and which type.

I added a random test case to verify we now get the right NumericField
back, when we stored NumericField during indexing.

Old indices are handled fine (you'll get a String-ified Field back like
you did before).

Spookily, nothing failed in Solr... I assume there's somewhere in Solr
that must now be fixed to handle the fact that a field can come back
as NumericField?  Anyone know where...?
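
To make the "free bits in the field's flags" idea concrete, here is a rough sketch. The constant names, bit positions and the enum are assumptions chosen for illustration and are not copied from FieldsWriter/Reader. A few unused bits of the per-field flags byte hold a small type code; a code of zero means "not numeric", which is also what an old index would contain.

```java
final class StoredFieldFlags {
    // Hypothetical existing flags occupying the low bits.
    static final int FIELD_IS_TOKENIZED = 1 << 0;
    static final int FIELD_IS_BINARY = 1 << 1;

    // Hypothetical position of the spare bits used for the numeric type code.
    static final int NUMERIC_SHIFT = 3;
    static final int NUMERIC_MASK = 0x07 << NUMERIC_SHIFT;

    enum NumericType { INT, LONG, FLOAT, DOUBLE }

    // Writer side: store (ordinal + 1) so that 0 keeps meaning "not numeric".
    static int withNumericType(int flags, NumericType type) {
        return (flags & ~NUMERIC_MASK) | ((type.ordinal() + 1) << NUMERIC_SHIFT);
    }

    // Reader side: returns null for non-numeric fields, including every
    // field of an index written before the bits were used.
    static NumericType numericType(int flags) {
        int code = (flags & NUMERIC_MASK) >>> NUMERIC_SHIFT;
        if (code == 0) {
            return null;
        }
        if (code > NumericType.values().length) {
            throw new IllegalStateException("unknown numeric type code: " + code);
        }
        return NumericType.values()[code - 1];
    }
}
```

With this layout the reader can branch on the type code before deciding whether to decode a string or a fixed-width binary value, and old segments fall through to the string path unchanged.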

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.



[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029903#comment-13029903 ] 

Robert Muir commented on LUCENE-3065:
-------------------------------------

{quote}
I just add this TODO here:
Don't forget to add a new 3.1 index format to TestBackwardsCompatibility!
{quote}

Can we also update the description of the bits in fileformats.html?

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.



[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3065:
----------------------------------

    Attachment:     (was: LUCENE-3065-solr-only.patch)

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.



[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029908#comment-13029908 ] 

Robert Muir commented on LUCENE-3065:
-------------------------------------

Ahh, sorry, I missed that. Patch looks good to me though!

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.



[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029412#comment-13029412 ] 

Uwe Schindler commented on LUCENE-3065:
---------------------------------------

Patch against 3.x.

I moved the to/from byte[] methods from Solr's TrieField into Lucene's NumericUtils, and fixed FieldsWriter/Reader to use free bits in the field's flags to know if the field is Numeric, and which type.

I added a random test case to verify we now get the right NumericField back, when we stored NumericField during indexing.

Old indices are handled fine (you'll get a String-ified Field back like you did before).

Spookily, nothing failed in Solr... I assume there's somewhere in Solr that must now be fixed to handle the fact that a field can come back as NumericField?  Anyone know where...?
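
For readers unfamiliar with the to/from byte[] conversions mentioned above, the following is a minimal sketch of the idea. It assumes a fixed-width big-endian layout; the method names are invented here and are not the ones in TrieField or NumericUtils. The point is that a long always takes 8 bytes, whereas its decimal string can take up to 20 characters.

```java
final class NumericBytes {
    // int -> 4 bytes, most significant byte first.
    static byte[] intToBytes(int v) {
        return new byte[] {
            (byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v
        };
    }

    // 4 bytes -> int; each byte is masked to avoid sign extension.
    static int bytesToInt(byte[] b) {
        return ((b[0] & 0xFF) << 24)
             | ((b[1] & 0xFF) << 16)
             | ((b[2] & 0xFF) << 8)
             |  (b[3] & 0xFF);
    }

    // long -> 8 bytes, most significant byte first.
    static byte[] longToBytes(long v) {
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) v;
            v >>>= 8;
        }
        return out;
    }

    // 8 bytes -> long.
    static long bytesToLong(byte[] b) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (b[i] & 0xFF);
        }
        return v;
    }

    // Doubles reuse the long encoding via their IEEE 754 bit pattern.
    static byte[] doubleToBytes(double d) {
        return longToBytes(Double.doubleToLongBits(d));
    }

    static double bytesToDouble(byte[] b) {
        return Double.longBitsToDouble(bytesToLong(b));
    }
}
```

A randomized round-trip check in the spirit of the test case described above would simply encode random values and assert that decoding returns the same value.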


> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.



[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3065:
----------------------------------

    Attachment: LUCENE-3065-solr-only.patch

Here is a first step in the cutover of Solr to NumericField. Most tests work, except:
- TestDistributedSearch fails with a strange date problem; I have no idea what goes wrong.
- TestMoreLikeThis fails because the returned documents are different than expected. The reason is simple: as TrieField's underlying Lucene fields are now NumericField, stringValue() returns something (in contrast, Solr's old fields returned null because they were binary). This may confuse MoreLikeThis (it may need to be fixed in Lucene; I haven't looked into the code). Maybe we should simply exclude those fields or fix the test (I prefer the latter, because the numerics should also be taken into account).

The following changes had to be made:
- Cut over all places in Solr that use Field instead of the abstract Fieldable to Fieldable. This affects some leftover parts in various components (calling Document.getField instead of Document.getFieldable), but mainly SchemaField/FieldType: createField() now returns Fieldable.
- TrieDateField code duplication was removed; all methods delegate to a wrapped TrieField. There was also an inconsistency between TrieField's and TrieDateField's toExternal(). This was fixed to work correctly (the date format was wrong; now it uses dateField.toExternal()).

If somebody could help with the rest of the Solr stuff and maybe test, test, test! Yonik? Ryan? There may be some itches not covered by tests.

Thanks for help from Solr specialists (I am definitely not one, I am more afraid of the code than I can help)!!!
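
To illustrate the two points above (coding against the abstraction instead of the concrete class, and a text-oriented consumer being surprised when stringValue() suddenly returns something for a numeric field), here is a self-contained sketch. StoredValue, TextValue and NumericValue are hypothetical stand-ins invented for this example; they are not Lucene's Fieldable, Field or NumericField. The helper shows the "simply exclude those fields" option, which is only one of the two options mentioned.

```java
import java.util.ArrayList;
import java.util.List;

final class FieldableSketch {
    // Hypothetical abstraction that callers should depend on.
    interface StoredValue {
        String name();
        String stringValue();
        boolean isNumeric();
    }

    // Hypothetical plain text field.
    static final class TextValue implements StoredValue {
        private final String name;
        private final String text;

        TextValue(String name, String text) {
            this.name = name;
            this.text = text;
        }

        public String name() { return name; }
        public String stringValue() { return text; }
        public boolean isNumeric() { return false; }
    }

    // Hypothetical numeric field: stringValue() is non-null, so code that
    // treated "non-null string" as "text" now sees numbers as well.
    static final class NumericValue implements StoredValue {
        private final String name;
        private final Number number;

        NumericValue(String name, Number number) {
            this.name = name;
            this.number = number;
        }

        public String name() { return name; }
        public String stringValue() { return number.toString(); }
        public boolean isNumeric() { return true; }
    }

    // Collects only genuine text, skipping numeric values explicitly.
    static List<String> textForSimilarity(List<? extends StoredValue> values) {
        List<String> out = new ArrayList<String>();
        for (StoredValue v : values) {
            if (!v.isNumeric() && v.stringValue() != null) {
                out.add(v.stringValue());
            }
        }
        return out;
    }
}
```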

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065-solr-only.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.



[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3065:
----------------------------------

    Attachment:     (was: LUCENE-3065.patch)

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3065:
----------------------------------

    Attachment: LUCENE-3065.patch

New patch, previous one had a leftover unused constant from Mike's patch.

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.



[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3065:
----------------------------------

    Attachment: LUCENE-3065-trunk.patch

This is the patch for trunk. Will commit soon!

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065-trunk.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3065:
----------------------------------

    Comment: was deleted

(was: Here is a first step in the cutover of Solr to NumericField. Most tests work, except:
- TestDistributedSearch fails with a strange date problem; I have no idea what goes wrong.
- TestMoreLikeThis fails because the returned documents are different than expected. The reason is simple: as TrieField's underlying Lucene fields are now NumericField, stringValue() returns something (in contrast, Solr's old fields returned null because they were binary). This may confuse MoreLikeThis (it may need to be fixed in Lucene; I haven't looked into the code). Maybe we should simply exclude those fields or fix the test (I prefer the latter, because the numerics should also be taken into account).

The following changes had to be made:
- Cut over all places in Solr that use Field instead of the abstract Fieldable to Fieldable. This affects some leftover parts in various components (calling Document.getField instead of Document.getFieldable), but mainly SchemaField/FieldType: createField() now returns Fieldable.
- TrieDateField code duplication was removed; all methods delegate to a wrapped TrieField. There was also an inconsistency between TrieField's and TrieDateField's toExternal(). This was fixed to work correctly (the date format was wrong; now it uses dateField.toExternal()).

If somebody could help with the rest of the Solr stuff and maybe test, test, test! Yonik? Ryan? There may be some itches not covered by tests.

Thanks for help from Solr specialists (I am definitely not one, I am more afraid of the code than I can help)!!!)

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028718#comment-13028718 ] 

Uwe Schindler commented on LUCENE-3065:
---------------------------------------

Just to note: We also need to change the Forrest index format documentation!

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3065:
----------------------------------

    Attachment: LUCENE-3065.patch

This patch adds some refactoring: FieldSelectorResult has been an enum since 3.0, so the (slow) chain of if-statements can be replaced by a fast switch.

It also adds some minor comments and a missing & 0xFF when casting byte to int.
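
The if-chain versus switch refactoring can be sketched as follows. The Selection enum and its constants are a hypothetical stand-in with made-up handling strings, not the real FieldSelectorResult or the reader code; the sketch only shows that both forms dispatch identically while the switch reads as a single table.

```java
final class SelectorSwitchDemo {
    // Hypothetical stand-in for an enum of selector results.
    enum Selection { LOAD, LAZY_LOAD, NO_LOAD, SIZE }

    // Before: one comparison after another until a match is found.
    static String viaIfChain(Selection s) {
        if (s == Selection.LOAD) {
            return "load";
        } else if (s == Selection.LAZY_LOAD) {
            return "lazy";
        } else if (s == Selection.SIZE) {
            return "size";
        } else {
            return "skip";
        }
    }

    // After: a switch over the enum, with the fallback stated explicitly.
    static String viaSwitch(Selection s) {
        switch (s) {
            case LOAD:
                return "load";
            case LAZY_LOAD:
                return "lazy";
            case SIZE:
                return "size";
            case NO_LOAD:
            default:
                return "skip";
        }
    }
}
```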

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.



[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3065:
----------------------------------

    Attachment:     (was: SOLR-2497.patch)

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.



[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3065:
----------------------------------

    Attachment: LUCENE-3065.patch

I added some javadocs to Document class:
- getField() / getFields() is deprecated [we may change this in ]

Some thoughts:
- Maybe we should make getField()/getFields() simply return null, or not include the field in the returned array, if it is not an instanceof Field? We can add to the documentation that lazy-loaded and numeric fields are not returned.
- I would also like to add a method Document.getNumericValue(s) that returns Number[] or Number, like the NumericField one. Like getField() above, it could return null or an empty array if the field name has no numeric fields.

The CHANGES entry may also be extended; currently it is under "bugs", so we should move it.
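
As a rough sketch of the Document.getNumericValue(s) idea above: the classes below (Doc, TextField, NumField) are toy stand-ins invented for this example and are not Lucene classes; the behaviour (null or an empty array when no numeric field matches) is only the proposal as stated, not a committed API.

```java
import java.util.ArrayList;
import java.util.List;

class NumericValueSketch {
    // Toy counterpart of a field interface; not the Lucene one.
    interface Fieldable {
        String name();
        String stringValue();
    }

    static final class TextField implements Fieldable {
        private final String name;
        private final String value;
        TextField(String name, String value) { this.name = name; this.value = value; }
        public String name() { return name; }
        public String stringValue() { return value; }
    }

    static final class NumField implements Fieldable {
        private final String name;
        private final Number value;
        NumField(String name, Number value) { this.name = name; this.value = value; }
        public String name() { return name; }
        public String stringValue() { return value.toString(); }
        Number getNumericValue() { return value; }
    }

    static final class Doc {
        private final List<Fieldable> fields = new ArrayList<>();

        void add(Fieldable f) { fields.add(f); }

        // First numeric value with this name, or null if there is none.
        Number getNumericValue(String name) {
            for (Fieldable f : fields) {
                if (f.name().equals(name) && f instanceof NumField) {
                    return ((NumField) f).getNumericValue();
                }
            }
            return null;
        }

        // All numeric values with this name; empty array if there are none.
        Number[] getNumericValues(String name) {
            List<Number> out = new ArrayList<>();
            for (Fieldable f : fields) {
                if (f.name().equals(name) && f instanceof NumField) {
                    out.add(((NumField) f).getNumericValue());
                }
            }
            return out.toArray(new Number[0]);
        }
    }

    public static void main(String[] args) {
        Doc doc = new Doc();
        doc.add(new TextField("title", "hello"));
        doc.add(new NumField("price", 42));
        doc.add(new NumField("price", 7L));
        System.out.println(doc.getNumericValue("price"));
        System.out.println(doc.getNumericValues("title").length);
    }
}
```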

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.



[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028354#comment-13028354 ] 

Uwe Schindler commented on LUCENE-3065:
---------------------------------------

Ideally this could be done with the schema-like approach of one of the GSoC projects?

We already discussed that: we can use the FieldsReader/FieldsWriter type flag (which currently records binary/text and compressed, the latter now unused) in the index file format to mark a field as NumericField. In that case, Document.getField() would return the NumericField instance.

For Lucene backwards compatibility we should still support creating "text-only" fields.

The new binary format would also be compatible with Solr: on getField, Solr would get a NumericField and can decide using instanceof what to do. Old Solr indexes without the NumericField marker flag would return a byte[], in which case Solr would do the decoding.

For storing on the index side, Solr could move to NumericField completely (I don't like the current approach using NumericTokenStream and to/fromInternal wrappers around a conventional Field).
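
A minimal sketch of the "spare bits in the type flag" idea, under stated assumptions: the constant names, bit positions and the NumericType enum below are hypothetical and chosen only for illustration; they are not the values used by FieldsWriter. The point is that a flag byte written by an older version has the numeric bits set to zero and therefore still decodes as a non-numeric field.

```java
class StoredFieldFlags {
    // Hypothetical bit layout of a per-field flag byte (illustration only).
    static final int IS_TOKENIZED  = 1 << 0;
    static final int IS_BINARY     = 1 << 1;
    static final int IS_COMPRESSED = 1 << 2;

    // Three previously unused bits record the numeric type.
    static final int NUMERIC_SHIFT = 3;
    static final int NUMERIC_MASK  = 0x07 << NUMERIC_SHIFT;

    // NONE has ordinal 0, so flag bytes from old indexes decode as NONE.
    enum NumericType { NONE, INT, LONG, FLOAT, DOUBLE }

    static int encode(boolean tokenized, boolean binary, NumericType type) {
        int bits = 0;
        if (tokenized) bits |= IS_TOKENIZED;
        if (binary)    bits |= IS_BINARY;
        bits |= type.ordinal() << NUMERIC_SHIFT;
        return bits;
    }

    static NumericType numericType(int bits) {
        return NumericType.values()[(bits & NUMERIC_MASK) >>> NUMERIC_SHIFT];
    }

    public static void main(String[] args) {
        int oldStyle = IS_TOKENIZED;                          // written before the change
        int numeric  = encode(false, false, NumericType.LONG);
        System.out.println(numericType(oldStyle));            // NONE
        System.out.println(numericType(numeric));             // LONG
    }
}
```

A reader can then branch on the decoded type: NONE means "read text or bytes as before", anything else means "read a fixed-width number and return a numeric field".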

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.



[jira] [Issue Comment Edited] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028764#comment-13028764 ] 

Uwe Schindler edited comment on LUCENE-3065 at 5/4/11 2:44 PM:
---------------------------------------------------------------

I added some javadocs to Document class:
- getField() / getFields() is deprecated [we may change this in LUCENE-2310]

Some thoughts:
- Maybe we should make getField()/getFields() simply return null, or not include the field in the returned array, if it is not an instanceof Field? We can add to the documentation that lazy-loaded and numeric fields are not returned.
- I would also like to add a method Document.getNumericValue(s) that returns Number[] or Number, like the NumericField one. Like getField() above, it could return null or an empty array if the field name has no numeric fields.

The CHANGES entry may also be extended; currently it is under "bugs", so we should move it.

      was (Author: thetaphi):
    I added some javadocs to Document class:
- getField() / getFields() is deprecated [we may change this in ]

Some thoughts:
- maybe we should make getField()/getFields() simply return null or does not include the Field into the returned array, if its not instanceof Field? We can add that to documentation, that lazy loaded and numerical fields are not returned.
- I would also like to add a method Document.getNumericValue(s), that returns Number[] or Number like the NumericField one. Like above getField() it can return null/empty array if the field name has no numeric Fields?

The CHANGES entry may also be extended, currently it under "bugs" - we shold move.
  
> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.



[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028540#comment-13028540 ] 

Yonik Seeley commented on LUCENE-3065:
--------------------------------------

bq. I then deprecated all the Fieldable methods in Document.

Hmmm, I thought Fieldable was a step forward.  The Field class is the worst of the bunch!

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.



[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3065:
----------------------------------

    Attachment: LUCENE-3065.patch

More refactoring:
- NumericFields now also reproduce the indexed/omitNorms/omitTF settings; only precStep cannot be reproduced.
- Cut over to int instead of byte; this removes lots of casting in FieldsReader.

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.



[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3065:
----------------------------------

    Issue Type: Improvement  (was: Bug)

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.



[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028408#comment-13028408 ] 

Ryan McKinley commented on LUCENE-3065:
---------------------------------------

bq. If so, can you take a crack at it?  Thanks.  Or, we can postpone... not necessary for this initial cutover.

I'll take a crack at it... but I don't think it's necessary in the first pass.


> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.



[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029657#comment-13029657 ] 

Uwe Schindler commented on LUCENE-3065:
---------------------------------------

I'm just adding this TODO here:
*Don't forget to add a new 3.1 index format to TestBackwardsCompatibility!*

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.



[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028395#comment-13028395 ] 

Uwe Schindler commented on LUCENE-3065:
---------------------------------------

{quote}
Spookily, nothing failed in Solr... I assume there's somewhere in Solr
that must now be fixed to handle the fact that a field can come back
as NumericField? Anyone know where...?
{quote}

That's easy to understand: Solr does not use NumericField at all. It produces a NumericTokenStream and indexes it like the output of any other analyzer. The byte[] value is added as a separate Field that is only stored (store=true, binary).

This is what I wanted to say with my last comment.

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.



[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3065:
----------------------------------

    Attachment: LUCENE-3065.patch

Updated patch with some improvements:
- NumericField now lazily initializes the NumericTokenStream only when tokenStreamValue() is called for the first time. This speeds up reading stored fields, as the TokenStream is generally not needed in that case.
- I currently don't like the instanceof chains in FieldsWriter and in this lazy-init code. Maybe NumericField and NumericTokenStream should define an enum type for the value, so you can call NumericField.getValueType(). Does anybody have a better idea?
- Improved JavaDocs for NumericField to reflect the new stored fields format
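
The lazy-init and enum ideas from the list above can be sketched as follows. This is an assumption-laden toy, not the patch: NumericFieldSketch, TokenStreamStub and ValueType are invented names, and getValueType() is only the method suggested in the comment, not a confirmed API.

```java
class LazyNumericSketch {
    // Hypothetical enum that would replace instanceof chains on the value.
    enum ValueType { INT, LONG, FLOAT, DOUBLE }

    // Stand-in for an expensive-to-create token stream.
    static final class TokenStreamStub {
        final Number value;
        TokenStreamStub(Number value) { this.value = value; }
    }

    static final class NumericFieldSketch {
        private final Number value;
        private final ValueType type;
        private TokenStreamStub stream; // created on first use only

        NumericFieldSketch(int v)    { this.value = v; this.type = ValueType.INT; }
        NumericFieldSketch(long v)   { this.value = v; this.type = ValueType.LONG; }
        NumericFieldSketch(float v)  { this.value = v; this.type = ValueType.FLOAT; }
        NumericFieldSketch(double v) { this.value = v; this.type = ValueType.DOUBLE; }

        ValueType getValueType() { return type; }
        Number getNumericValue() { return value; }
        boolean isStreamInitialized() { return stream != null; }

        // Lazy init: reading a stored field never pays for the stream.
        TokenStreamStub tokenStreamValue() {
            if (stream == null) {
                stream = new TokenStreamStub(value);
            }
            return stream;
        }
    }

    // A writer can switch on the enum instead of testing instanceof Integer, Long, ...
    static int storedSizeInBytes(NumericFieldSketch f) {
        switch (f.getValueType()) {
            case INT:
            case FLOAT:
                return 4;
            default:
                return 8;
        }
    }

    public static void main(String[] args) {
        NumericFieldSketch f = new NumericFieldSketch(42L);
        System.out.println(f.isStreamInitialized()); // false
        f.tokenStreamValue();
        System.out.println(f.isStreamInitialized()); // true
        System.out.println(storedSizeInBytes(f));    // 8
    }
}
```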

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.



[jira] [Issue Comment Edited] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028454#comment-13028454 ] 

Uwe Schindler edited comment on LUCENE-3065 at 5/3/11 10:01 PM:
----------------------------------------------------------------

There is still a problem - first the good news:

- If a user calls Document.get(field), the returned string is as before, so there is no break at all. The reason is the implementation of NumericField.stringValue(): it returns what the user is used to from 3.0.
- If a user calls getFieldable(field), all is fine, too. The only change is that it could now return a NumericField. If the user simply calls stringValue(), everything is identical to 3.0.

Problems start with:

- If a user calls Document.getField(name), it returns Field (internally it casts the getFieldable() result to Field). But NumericField does not subclass Field -> ClassCastException.

How to handle this?

- Maybe change those methods to return AbstractField, but that's a binary break and users will complain, because not everything works as expected.
- Make NumericField subclass Field (and unfinalize Field). That's a bad idea, because Field has too many methods / members that are out of scope.
- Deprecate Document.getField() and make it internally do an instanceof check: if it gets a NumericField, transform it to a backwards-compatible Field? This method is already broken: if you request lazy field loading it also throws a ClassCastException (e.g. LUCENE-609).

Not sure how to proceed. Otherwise the patch looks fine. I think simply ignoring LazyField loading is fine, as numeric fields are a maximum of 8 bytes... Otherwise we would need LazyNumericField :(
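
The getField() problem described above can be reproduced with a toy class hierarchy. The classes below only mirror the relationship stated in the comment (NumericField does not subclass Field); they are simplified stand-ins, and getFieldUnchecked/getFieldChecked are hypothetical names for "blind cast" and "instanceof-guarded" variants.

```java
class GetFieldCastSketch {
    interface Fieldable { String stringValue(); }

    static abstract class AbstractField implements Fieldable { }

    static final class Field extends AbstractField {
        private final String value;
        Field(String value) { this.value = value; }
        public String stringValue() { return value; }
    }

    // Sibling of Field, not a subclass of it.
    static final class NumericField extends AbstractField {
        private final Number value;
        NumericField(Number value) { this.value = value; }
        public String stringValue() { return value.toString(); }
    }

    // Blind cast: throws ClassCastException for a NumericField.
    static Field getFieldUnchecked(Fieldable f) {
        return (Field) f;
    }

    // Guarded variant: returns null when the field is not a Field.
    static Field getFieldChecked(Fieldable f) {
        return (f instanceof Field) ? (Field) f : null;
    }

    public static void main(String[] args) {
        Fieldable loaded = new NumericField(42);
        System.out.println(loaded.stringValue());            // string access still works
        System.out.println(getFieldChecked(loaded) == null); // guarded access gives null
        try {
            getFieldUnchecked(loaded);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException");
        }
    }
}
```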

      was (Author: thetaphi):
    There is still a problem - first the good news:

- If user calls Document.get(field), the returned string is as before, so there is no break at all. The reason is the implementation of NumericField.stringValue(), it returns what the user is used to from 3.0
- If a user calls getFieldable(field) all is fine, too. The only change is that it not could return NumericField. If the user simply calls stringValue() all is identical to 3.0

Problems start with:

- If user calls Document.getField(name) it returns Field (internally it casts the getFieldable()) result to Field. But NumericField does not subclass Field -> ClassCastException. 

How to handle this?

- Maybe change those methods to return AbstractField, but thats a binary break and users will complain, because not everything works as expected
- Make NumericField subclass Field (and Field is unfinalized) - thats a bad idea, because Field has too many methods / members that are out of scope
- Deprecate Document.getField() and make it internally do an instanceof check, if it gets NumericField transform to a backwards-compatible Field? - This method is already broken. If you request Lazy field loading it also throws ClassCastEx (e.g. LUCENE-609).

Not sure how to proceed. Else the patch looks fine. I think simply ignoring LazyField loading is fine, as numeric fields are a maximum of 8 bytes.... Else we would need LazyNumericField :(
  



[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3065:
----------------------------------

    Attachment: LUCENE-3065.patch

Next iteration:

Reverted the changes in Solr (they should come later). Lucene now uses IndexInput and IndexOutput natively to write/read ints and longs.

Solr's changes are completely unrelated.
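
As a rough illustration of what fixed-width binary storage of ints and longs buys over storing the number as a string, here is a minimal sketch. It uses java.io.DataOutputStream / DataInputStream purely as stand-ins for IndexOutput / IndexInput; the class and method names are made up for this example, and the exact byte layout written by the patch is not shown in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class StoredNumericSketch {

    /** Writes an int as 4 fixed-width bytes, high-order byte first. */
    static byte[] writeInt(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    /** Writes a long as 8 fixed-width bytes, high-order byte first. */
    static byte[] writeLong(long value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(value);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads back an int written by writeInt. */
    static int readInt(byte[] stored) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(stored))) {
            return in.readInt();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads back a long written by writeLong. */
    static long readLong(byte[] stored) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(stored))) {
            return in.readLong();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int value = 1234567890;
        byte[] asText = Integer.toString(value).getBytes(StandardCharsets.UTF_8);
        byte[] asBinary = writeInt(value);
        System.out.println("text bytes:   " + asText.length);
        System.out.println("binary bytes: " + asBinary.length);
        System.out.println("round trip:   " + readInt(asBinary));
    }
}
```

The ten-digit value above needs 10 bytes as text (before any length prefix) but only 4 as a binary int, and no parsing is needed when reading it back.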




[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028412#comment-13028412 ] 

Yonik Seeley commented on LUCENE-3065:
--------------------------------------

bq. I'll take a crack at it... but I don't think it's necessary in the first pass

Should we try to accept both (binary or numeric field coming back) so this isn't a needless index format break, or is there another Lucene index format break in the cards soon anyway?




[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028398#comment-13028398 ] 

Michael McCandless commented on LUCENE-3065:
--------------------------------------------

{quote}
That's easy to understand: Solr does not use NumericField at all. It produces a NumericTokenStream and indexes it like any other analyzer. The byte[] field is indexed as a separate Field with only store=true and binary.

This is what I wanted to say with my last comment.
{quote}
Ahhhh, OK.  So, not spooky.

We should eventually fix that; shouldn't Solr just use NumericField instead of doing this encode/decode itself?  Is there some reason...?




[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Chris Male (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028570#comment-13028570 ] 

Chris Male commented on LUCENE-3065:
------------------------------------

Yeah, there is an element of truth to that, except I'm not convinced we need to have such a complicated hierarchy (although I've since been thinking about field definitions coming from different sources, so maybe an interface is best).  But yes, Field is a mess and I've been trying to clean that out too.




[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028416#comment-13028416 ] 

Uwe Schindler commented on LUCENE-3065:
---------------------------------------

Mike: One thing about the bitmask and the 4 values. There is also an issue open to extend NumericField by byte and short. Maybe we should reserve 3 bits instead of 2 for the numeric field type, so 0x70 instead of 0x30 as the mask? I just want to reserve this one extra bit, so we don't need to do any dumb masks and values later if we extend.
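
To make the 0x30 vs. 0x70 suggestion concrete, here is a small sketch of packing a numeric type code into the flag byte of a stored field. The two masks are the ones mentioned above; everything else (the shift of 4 implied by those masks, the type codes, the class and method names) is hypothetical and only meant to show why the third bit leaves room for byte and short.

```java
public class NumericTypeBitsSketch {

    // Both masks start at bit 4, so the type code is shifted left by 4.
    static final int SHIFT = 4;
    static final int MASK_2_BITS = 0x30; // bits 4-5
    static final int MASK_3_BITS = 0x70; // bits 4-6

    // Hypothetical type codes, chosen only for this illustration.
    static final int INT = 0, LONG = 1, FLOAT = 2, DOUBLE = 3, BYTE = 4, SHORT = 5;

    /** Number of distinct type codes a mask can hold. */
    static int capacity(int mask) {
        return (mask >>> SHIFT) + 1;
    }

    /** True if the shifted type code lies entirely inside the mask. */
    static boolean fits(int typeCode, int mask) {
        return ((typeCode << SHIFT) & ~mask) == 0;
    }

    /** Stores the type code in the masked bits, leaving all other flag bits alone. */
    static int encode(int flags, int typeCode, int mask) {
        if (!fits(typeCode, mask)) {
            throw new IllegalArgumentException("type code " + typeCode + " does not fit mask");
        }
        return (flags & ~mask) | (typeCode << SHIFT);
    }

    /** Extracts the type code from the masked bits. */
    static int decode(int flags, int mask) {
        return (flags & mask) >>> SHIFT;
    }

    public static void main(String[] args) {
        System.out.println("codes with 0x30: " + capacity(MASK_2_BITS));
        System.out.println("codes with 0x70: " + capacity(MASK_3_BITS));
        System.out.println("SHORT fits 0x30? " + fits(SHORT, MASK_2_BITS));
        System.out.println("SHORT fits 0x70? " + fits(SHORT, MASK_3_BITS));
    }
}
```

With two bits there is room for exactly four codes, so adding byte and short later would require either a wider mask or a second, non-adjacent bit, which is the "dumb masks" situation the extra reserved bit avoids.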

About the index format change:
As described above, for Solr it's not a problem. New fields are always indexed using NumericField. On the query side, when Document.getField is called, it could simply check the return value with instanceof. If the getter does not return a NumericField, Solr knows that it's binary and can decode it manually. This would preserve backwards compatibility.

Otherwise it's no break at all if we somehow support both stored field formats when reading (in Lucene the old format is a string, so we return either a String Field or the new binary NumericField). The index format itself does not change generally (no need to bump version numbers, as we only use unused bits?)
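
The "accept both formats" idea can be sketched as a single decode step that looks at what came back from the stored field. This is a stand-alone illustration, not Solr or Lucene code: the stored value is modelled as a plain Object, the class and method names are made up, and the assumption that a legacy binary value is 4 or 8 bytes with the high-order byte first is mine, not something stated in this thread.

```java
import java.nio.ByteBuffer;

public class LegacyStoredValueSketch {

    /**
     * Decodes a stored numeric value regardless of how it was written:
     * as a real number (new format), as raw bytes (assumed legacy binary
     * format, big-endian), or as a string (assumed legacy string format).
     */
    static long decode(Object stored) {
        if (stored instanceof Number) {
            return ((Number) stored).longValue();
        }
        if (stored instanceof byte[]) {
            byte[] bytes = (byte[]) stored;
            ByteBuffer buf = ByteBuffer.wrap(bytes); // big-endian by default
            if (bytes.length == 4) return buf.getInt();
            if (bytes.length == 8) return buf.getLong();
            throw new IllegalArgumentException("unexpected length: " + bytes.length);
        }
        if (stored instanceof String) {
            return Long.parseLong((String) stored);
        }
        throw new IllegalArgumentException("unsupported stored value: " + stored);
    }

    public static void main(String[] args) {
        System.out.println(decode(Integer.valueOf(7)));
        System.out.println(decode(new byte[] {0, 0, 0, 7}));
        System.out.println(decode("7"));
    }
}
```

The point of the sketch is only that the reader, not the index format version, decides how to interpret each value, so old segments keep working without a format bump.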




[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3065:
----------------------------------

    Comment: was deleted

(was: MoreLikeThis problem solved, it was as I said. The test included a TrieInt field in the "similarity fields", so it was used to calculate similarity. Since in previous Solr versions the TrieField was invisible to MLT, this had no effect.
By the way: there is a commented-out part that explicitly uses the MLT field, but I don't understand it. It seems that it was never understood/supported.
Now, all numeric fields should work with MLT.

Now only the TestDistributedSearch is still failing with a strange date failure. I'll dig.)

> NumericField should be stored in binary format in index (matching Solr's format)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3065
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch
>
>
> (Spinoff of LUCENE-3001)
> Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an "ordinary" Field and your number has turned into a string.  See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
> We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format.
> A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.



[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028404#comment-13028404 ] 

Michael McCandless commented on LUCENE-3065:
--------------------------------------------

Uwe: I agree, I'll use BytesRef in trunk.

Ryan: OK.  Should we try to fix that w/ this issue?  If so, can you take a crack at it?  Thanks.  Or, we can postpone... not necessary for this initial cutover.

