Posted to solr-dev@lucene.apache.org by "Lance Norskog (JIRA)" <ji...@apache.org> on 2010/02/04 06:16:27 UTC

[jira] Created: (SOLR-1754) Legacy numeric types do not check input for bad syntax

Legacy numeric types do not check input for bad syntax
------------------------------------------------------

                 Key: SOLR-1754
                 URL: https://issues.apache.org/jira/browse/SOLR-1754
             Project: Solr
          Issue Type: Bug
    Affects Versions: 1.4
            Reporter: Lance Norskog
             Fix For: 1.5


The legacy numeric types do not validate their input values. An arbitrary text string is accepted as input for any of these types: IntField, LongField, FloatField, DoubleField. DateField does check its input.

In general this is a no-fix, except that IntField is a necessary type because it cuts memory use in sorting.
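
The difference described above can be illustrated with a minimal sketch. The class and method names here are hypothetical stand-ins, not Solr's actual FieldType API: one method passes the input through untouched, as the legacy types are reported to do, and the other gets validation as a by-product of parsing.

```java
public class NumericInputCheck {

    // Hypothetical stand-in for a legacy type: the external value is
    // accepted as-is, so "abc" is stored just as readily as "42".
    static String legacyToInternal(String value) {
        return value;
    }

    // Hypothetical stand-in for a validating type: converting the value
    // to an int rejects bad syntax as a side effect of the conversion.
    static String checkedToInternal(String value) {
        try {
            return Integer.toString(Integer.parseInt(value.trim()));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Invalid int value: '" + value + "'", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(legacyToInternal("not-a-number"));
        System.out.println(checkedToInternal("42"));
        try {
            checkedToInternal("not-a-number");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The unchecked path costs nothing per value, which is the trade-off at stake in this issue.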

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1754) Legacy numeric types do not check input for bad syntax

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829769#action_12829769 ] 

Hoss Man commented on SOLR-1754:
--------------------------------

The reason we never explicitly tested the input value was speed: if the user says it's an int, we trust them. The only places where FieldTypes explicitly validate the input strings (e.g. SortableIntField, DateField) are those where they get it for free as a side effect of conversion (in DateField's case, even though we index the raw string, we have to parse it anyway to look for DateMath).

Is there really any memory efficiency from IntField that can't be achieved with an appropriate precisionStep on TrieIntField?



[jira] Resolved: (SOLR-1754) Legacy numeric types do not check input for bad syntax

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lance Norskog resolved SOLR-1754.
---------------------------------

    Resolution: Not A Problem

After further discussion, this is fine as it is, given that IntField is obsolete and the new types check their inputs.



[jira] Commented: (SOLR-1754) Legacy numeric types do not check input for bad syntax

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829902#action_12829902 ] 

Hoss Man commented on SOLR-1754:
--------------------------------

The second array you are talking about only exists if you use the StringIndex based FieldCache.

The TrieField subclasses all use the raw primitive FieldCache types, they just use a special parser to decode the Trie value into the raw primitive value ... take a look at o.a.s.schema.TrieField.getSortField.

If you look at stats.jsp you can see which FieldCaches are loaded for each field, and verify that all the TrieIntField fields you sort on are using a primitive int[], and not a StringIndex.
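
As a rough illustration of the two cache shapes contrasted above, here is a simplified model in plain Java. It is not Lucene's FieldCache code, and the class and method names are hypothetical. The primitive-style cache holds one decoded int per document; the string-index-style cache holds a per-document ordinal array plus a second array of the sorted unique terms.

```java
import java.util.Arrays;
import java.util.TreeSet;

public class SortCacheShapes {

    // Primitive-style cache: a single int[] indexed by docid. Each raw
    // term is decoded by a parser (here simply Integer.parseInt).
    static int[] buildIntCache(String[] rawValuesByDoc) {
        int[] values = new int[rawValuesByDoc.length];
        for (int doc = 0; doc < rawValuesByDoc.length; doc++) {
            values[doc] = Integer.parseInt(rawValuesByDoc[doc]);
        }
        return values;
    }

    // String-index-style cache: per-document ordinals plus the sorted
    // unique terms those ordinals point into (the "second array").
    static final class StringIndexModel {
        final int[] order;
        final String[] lookup;

        StringIndexModel(int[] order, String[] lookup) {
            this.order = order;
            this.lookup = lookup;
        }
    }

    static StringIndexModel buildStringIndex(String[] rawValuesByDoc) {
        // Unique terms in natural (lexicographic) String order.
        String[] lookup = new TreeSet<>(Arrays.asList(rawValuesByDoc)).toArray(new String[0]);
        int[] order = new int[rawValuesByDoc.length];
        for (int doc = 0; doc < rawValuesByDoc.length; doc++) {
            order[doc] = Arrays.binarySearch(lookup, rawValuesByDoc[doc]);
        }
        return new StringIndexModel(order, lookup);
    }

    public static void main(String[] args) {
        String[] docs = {"30", "10", "30", "20"};
        System.out.println(Arrays.toString(buildIntCache(docs)));
        StringIndexModel idx = buildStringIndex(docs);
        System.out.println(Arrays.toString(idx.order) + " " + Arrays.toString(idx.lookup));
    }
}
```

In this model only the string-index shape needs the extra term array, which is why the check in stats.jsp for a primitive int[] matters.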



[jira] Commented: (SOLR-1754) Legacy numeric types do not check input for bad syntax

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829906#action_12829906 ] 

Lance Norskog commented on SOLR-1754:
-------------------------------------

Aha! Thank you very much. 



[jira] Commented: (SOLR-1754) Legacy numeric types do not check input for bad syntax

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829812#action_12829812 ] 

Yonik Seeley commented on SOLR-1754:
------------------------------------

bq. Is there really any memory efficiency from IntField that can't be achieved with an appropriate precisionStep on TrieIntField?

Nope... sorting on both will be equivalent, regardless of what the precisionStep is.




[jira] Commented: (SOLR-1754) Legacy numeric types do not check input for bad syntax

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829868#action_12829868 ] 

Lance Norskog commented on SOLR-1754:
-------------------------------------

I thought that this is how sorting works:

An array of java ints, 4 bytes apiece, is allocated with one for every document in the index. The ints are set to each successive docid.

A separate array of objects is allocated, one entry for every term in the field. This array is sorted by the term value. There is other data hanging off this that we will not discuss.

My understanding was that if the field type is a Java int, the second array is not created, and only the first is needed.  And that the Solr IntField creates this type, and so if the field is a Solr IntField sorting requires less memory because it does not make the second array. 

If the field is some other type, like a TrieField, sorting on that field cannot possibly use the same amount of memory as sorting on a Java int field. Clearly something about this is wrong. Please set me straight.
