Posted to dev@lucene.apache.org by "Hoss Man (Created) (JIRA)" <ji...@apache.org> on 2011/12/16 23:14:31 UTC

[jira] [Created] (SOLR-2976) TrieField.isTokenized returns true regardless of precisionStep

TrieField.isTokenized returns true regardless of precisionStep
--------------------------------------------------------------

                 Key: SOLR-2976
                 URL: https://issues.apache.org/jira/browse/SOLR-2976
             Project: Solr
          Issue Type: Bug
    Affects Versions: 3.5
            Reporter: Hoss Man


regardless of the precisionStep used, TrieField.isTokenized is hardcoded to return true -- so even if a user has something like this in their schema...

{code}
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" />
<field name="ts" type="long" indexed="true" stored="true" required="true" multiValued="false" />
{code}

...any code paths that are driven by isTokenized will think there may be multiple terms per document when in reality there is at most one.

we should consider redefining TrieField.isTokenized to be something like...

{code}
@Override
public boolean isTokenized() {
  return Integer.MAX_VALUE != precisionStep;
}
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2976) TrieField.isTokenized returns true regardless of precisionStep

Posted by "Yonik Seeley (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171286#comment-13171286 ] 

Yonik Seeley commented on SOLR-2976:
------------------------------------

bq. precisionStep="infinite"

Heh.  Try explaining that one to users ;-)

"0 disables the more-tokens-for-faster-range-queries feature" seems pretty understandable to most people.

                



[jira] [Commented] (SOLR-2976) TrieField.isTokenized returns true regardless of precisionStep

Posted by "Hoss Man (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171259#comment-13171259 ] 

Hoss Man commented on SOLR-2976:
--------------------------------

FYI: from skimming the code, the current implications of this are:

* QEC (QueryElevationComponent) will unnecessarily fail to work if your uniqueKey is a precisionStep=0 TrieField
* stats.facet will mistakenly refuse to facet on a multiValued=false precisionStep=0 TrieField

Related thread: http://www.mail-archive.com/solr-user@lucene.apache.org/msg60073.html
                



[jira] [Commented] (SOLR-2976) TrieField.isTokenized returns true regardless of precisionStep

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171290#comment-13171290 ] 

Uwe Schindler commented on SOLR-2976:
-------------------------------------

I just prefer a non-numeric value. And even *LucidImagination-people* don't understand this (I had a discussion with one of your employees who did not know). When I explained to him what precision step means, he said:

- Document it in the schema / Lucene NRQ javadocs
- Document it in the Wiki
- *rename the 0* as it makes no sense
                



[jira] [Assigned] (SOLR-2976) stats.facet no longer works on single valued trie fields that don't use precision step

Posted by "Hoss Man (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man reassigned SOLR-2976:
------------------------------

    Assignee: Ryan McKinley

i haven't had any more time to try and make sense of this, and don't anticipate doing so in the near future.

giving to ryan since he worked on SOLR-1023 in the hopes that it's something he understands and can help bang out a fix for easily.
                
> stats.facet no longer works on single valued trie fields that don't use precision step
> --------------------------------------------------------------------------------------
>
>                 Key: SOLR-2976
>                 URL: https://issues.apache.org/jira/browse/SOLR-2976
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 3.5
>            Reporter: Hoss Man
>            Assignee: Ryan McKinley
>         Attachments: SOLR-2976.patch, SOLR-2976_3.4_test.patch
>
>
> As reported on the mailing list, 3.5 introduced a regression that prevents single valued Trie fields that don't use precision steps (to add coarse grained terms) from being used in stats.facet.
> two immediately obvious problems...
> 1) in 3.5 the stats component is checking if isTokenized() is true for the field type (which is probably wise) but regardless of the precisionStep used, TrieField.isTokenized is hardcoded to return true
> 2) the 3.5 stats faceting will fail if the FieldType is multivalued - it doesn't check if the SchemaField is configured to be single valued (overriding the FieldType)
> so even if a user has something like this in their schema...
> {code}
> <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" />
> <field name="ts" type="long" indexed="true" stored="true" required="true" multiValued="false" />
> {code}
> ...stats.facet will not work.



[jira] [Commented] (SOLR-2976) TrieField.isTokenized returns true regardless of precisionStep

Posted by "Yonik Seeley (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171303#comment-13171303 ] 

Yonik Seeley commented on SOLR-2976:
------------------------------------

No need to apologize for disagreeing, but I still think "0" is fine.

And if we're pushing for consistency, perhaps Lucene should change to something more easily understood as "disable this feature".
                



[jira] [Commented] (SOLR-2976) TrieField.isTokenized returns true regardless of precisionStep

Posted by "Yonik Seeley (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171269#comment-13171269 ] 

Yonik Seeley commented on SOLR-2976:
------------------------------------

IIRC the meaning of isTokenized was taken from lucene long ago:  "True if this field's value should be analyzed".
Looking at the current uses of isTokenized in Solr, it's been a bit abused and actually may no longer be needed.

                



[jira] [Commented] (SOLR-2976) TrieField.isTokenized returns true regardless of precisionStep

Posted by "Hoss Man (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171292#comment-13171292 ] 

Hoss Man commented on SOLR-2976:
--------------------------------

bq. in general the precisionStep is somehow inconsistent between Solr and Lucene

it's not inconsistent: Solr's TrieField uses Integer.MAX_VALUE correctly, it just happily accepts config values <=0 as being equivalent to specifying Integer.MAX_VALUE. (the javadocs for TrieField don't even say you can specify "0" ... they say "Note that if you use a precisionStep of 32 for int/float and 64 for long/double/date, then multiple terms will not be generated".) if you want to add yet another symbolic constant for Integer.MAX_VALUE i'm fine with that, but please open a new issue -- it's totally orthogonal to what we're talking about here.
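
For anyone skimming the thread, the behaviour described above can be sketched in isolation. This is not the actual TrieField source: the class name PrecisionStepSketch and the helper normalize are made up for illustration. Only the "config values <=0 are treated as Integer.MAX_VALUE" rule and the proposed isTokenized check come from this issue.

```java
/** Illustrative only; NOT the real org.apache.solr.schema.TrieField. */
public class PrecisionStepSketch {
  private final int precisionStep;

  public PrecisionStepSketch(int configuredStep) {
    this.precisionStep = normalize(configuredStep);
  }

  /** Hypothetical helper: config values <= 0 mean "no extra terms". */
  static int normalize(int configuredStep) {
    return configuredStep <= 0 ? Integer.MAX_VALUE : configuredStep;
  }

  /** The check proposed in this issue's description. */
  public boolean isTokenized() {
    return Integer.MAX_VALUE != precisionStep;
  }

  public static void main(String[] args) {
    System.out.println(new PrecisionStepSketch(0).isTokenized()); // prints false
    System.out.println(new PrecisionStepSketch(8).isTokenized()); // prints true
  }
}
```

Note that under this check a configured step of 32 or 64 would still report true, even though the javadoc text quoted above says no extra terms are generated in that case; whether that matters is not settled in this thread.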
                



[jira] [Commented] (SOLR-2976) TrieField.isTokenized returns true regardless of precisionStep

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171277#comment-13171277 ] 

Uwe Schindler commented on SOLR-2976:
-------------------------------------

bq. IIRC the meaning of isTokenized was taken from lucene long ago: "True if this field's value should be analyzed". Looking at the current uses of isTokenized in Solr, it's been a bit abused and actually may no longer be needed.

It is often used in Solr as "multiValued", which is a separate property of a field. +1 to remove isTokenized (especially as Lucene no longer differentiates between tokenized and not tokenized: every field in Lucene trunk has a TokenStream/AttributeSource, even if it returns only one token).
                



[jira] [Commented] (SOLR-2976) TrieField.isTokenized returns true regardless of precisionStep

Posted by "Hoss Man (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171310#comment-13171310 ] 

Hoss Man commented on SOLR-2976:
--------------------------------

Seriously guys: start a new fucking issue if you care so much, and debate the optimal API/docs/sample configs for precisionStep there.

whether a new symbolic constant is added really has *ZERO* bearing on _this_ issue, which is about whether or not TrieField.isTokenized() is broken.

(this is what the "related issues" Jira link type is for)
                



[jira] [Updated] (SOLR-2976) stats.facet no longer works on single valued trie fields that don't use precision step

Posted by "Hoss Man (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-2976:
---------------------------

    Attachment: SOLR-2976.patch
                SOLR-2976_3.4_test.patch

SOLR-2976_3.4_test.patch is a simple test patch against 3.4 showing the basics of what used to work when trying to do stats faceting on trie fields.  If you apply this patch to 3.5 or the 3x branch (requires massaging as the line numbers have changed heavily) you'll see the test fail.

SOLR-2976.patch shows my attempt at fixing some of these problems on trunk...

1) fix TrieField.isTokenized to be based on precision step
2) test TrieField.isTokenized
3) fix StatsComponent to look at the SchemaField not just the FieldType
4) make StatsComponentTest give better errors
5) make StatsComponentTest try to use stats.facet on a trie field with one term per value

But doing this has exposed a new bug i don't fully understand yet: the test now throws an NFE (NumberFormatException) that seems to be coming from the code for generating the stats facets on a trie field -- but it is dependent on which field type we are generating stats over.  If the stats are against a trie field, then the faceting on a trie field fails -- but if the stats are on a simple numeric, then the faceting on a trie field passes.

Need to wade into this more later...

{code}
    [junit] Testcase: testStats(org.apache.solr.handler.component.StatsComponentTest):	Caused an ERROR
    [junit] exception with field: stats_ti
    [junit] java.lang.RuntimeException: exception with field: stats_ti
    [junit] 	at org.apache.solr.handler.component.StatsComponentTest.testStats(StatsComponentTest.java:68)
    [junit] 	at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:528)
    [junit] 	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165)
    [junit] 	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
    [junit] Caused by: java.lang.RuntimeException: Exception during query
    [junit] 	at org.apache.solr.util.AbstractSolrTestCase.assertQ(AbstractSolrTestCase.java:267)
    [junit] 	at org.apache.solr.handler.component.StatsComponentTest.doTestFacetStatisticsResult(StatsComponentTest.java:275)
    [junit] 	at org.apache.solr.handler.component.StatsComponentTest.testStats(StatsComponentTest.java:65)
    [junit] Caused by: java.lang.NumberFormatException: For input string: "N"
    [junit] 	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    [junit] 	at java.lang.Integer.parseInt(Integer.java:449)
    [junit] 	at java.lang.Integer.parseInt(Integer.java:499)
    [junit] 	at org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:303)
    [junit] 	at org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:294)
    [junit] 	at org.apache.solr.schema.TrieField.toInternal(TrieField.java:324)
    [junit] 	at org.apache.solr.request.UnInvertedField.getStats(UnInvertedField.java:609)
    [junit] 	at org.apache.solr.handler.component.SimpleStats.getStatsFields(StatsComponent.java:235)
    [junit] 	at org.apache.solr.handler.component.SimpleStats.getStatsCounts(StatsComponent.java:211)
    [junit] 	at org.apache.solr.handler.component.StatsComponent.process(StatsComponent.java:70)

{code}
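
Item 3 in the list above (have StatsComponent look at the SchemaField, not just the FieldType) can be illustrated with stand-in classes. The types FieldTypeStub and SchemaFieldStub and the method canStatsFacet below are hypothetical and deliberately tiny; they are not Solr's classes and this is not the code in SOLR-2976.patch. It is only a sketch of the idea that a field-level multiValued="false" should win over the type-level default.

```java
/** Illustrative stand-ins only; NOT Solr's FieldType / SchemaField. */
public class StatsFacetCheckSketch {

  /** Stand-in for a field type: carries type-level defaults. */
  static final class FieldTypeStub {
    final boolean multiValued;
    final boolean tokenized;

    FieldTypeStub(boolean multiValued, boolean tokenized) {
      this.multiValued = multiValued;
      this.tokenized = tokenized;
    }
  }

  /** Stand-in for a schema field: may override the type's multiValued. */
  static final class SchemaFieldStub {
    final FieldTypeStub type;
    final Boolean multiValuedOverride; // null means "inherit from the type"

    SchemaFieldStub(FieldTypeStub type, Boolean multiValuedOverride) {
      this.type = type;
      this.multiValuedOverride = multiValuedOverride;
    }

    boolean multiValued() {
      return multiValuedOverride != null ? multiValuedOverride : type.multiValued;
    }
  }

  /** Hypothetical check: allow stats.facet only when there is at most one term per document. */
  static boolean canStatsFacet(SchemaFieldStub field) {
    return !field.multiValued() && !field.type.tokenized;
  }

  public static void main(String[] args) {
    // A type that defaults to multiValued but produces one term per value.
    FieldTypeStub type = new FieldTypeStub(true, false);
    // The field overrides the type with multiValued="false".
    System.out.println(canStatsFacet(new SchemaFieldStub(type, Boolean.FALSE))); // prints true
    // No override: the type-level multiValued default applies.
    System.out.println(canStatsFacet(new SchemaFieldStub(type, null))); // prints false
  }
}
```

A check that consulted only the type would reject the first field as well, which is the second problem listed in this issue's updated description.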

                



[jira] [Updated] (SOLR-2976) stats.facet no longer works on single valued trie fields that don't use precision step

Posted by "Hoss Man (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-2976:
---------------------------

    Description: 
As reported on the mailing list, 3.5 introduced a regression that prevents single valued Trie fields that don't use precision steps (to add coarse grained terms) from being used in stats.facet.

two immediately obvious problems...

1) in 3.5 the stats component is checking if isTokenized() is true for the field type (which is probably wise) but regardless of the precisionStep used, TrieField.isTokenized is hardcoded to return true

2) the 3.5 stats faceting will fail if the FieldType is multivalued - it doesn't check if the SchemaField is configured to be single valued (overriding the FieldType)


so even if a user has something like this in their schema...

{code}
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" />
<field name="ts" type="long" indexed="true" stored="true" required="true" multiValued="false" />
{code}

...stats.facet will not work.



  was:
regardless of the precisionStep used, TrieField.isTokenized is hardcoded to return true -- so even if a user has something like this in their schema...

{code}
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" />
<field name="ts" type="long" indexed="true" stored="true" required="true" multiValued="false" />
{code}

...any code paths that are driven by isTokenized will think there may be multiple terms per document when in reality there is at most one.

we should consider redefining TrieField.isTokenized to be something like...

{code}
@Override
public boolean isTokenized() {
  return Integer.MAX_VALUE != precisionStep;
}
{code}

        Summary: stats.facet no longer works on single valued trie fields that don't use precision step  (was: TrieField.isTokenized returns true regardless of precisionStep)

I started looking into this today and realized there are additional problems with the stats faceting code changed in 3.5 as it relates to trie fields and the original problem report.  Updating the summary/description to expand the scope.
                



[jira] [Commented] (SOLR-2976) TrieField.isTokenized returns true regardless of precisionStep

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171295#comment-13171295 ] 

Uwe Schindler commented on SOLR-2976:
-------------------------------------

Sorry Hoss, this has annoyed me for a long time, and this issue seemed to be the right place to complain about precStep==0, which makes no sense (sorry, Yonik).
                



[jira] [Commented] (SOLR-2976) TrieField.isTokenized returns true regardless of precisionStep

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171274#comment-13171274 ] 

Uwe Schindler commented on SOLR-2976:
-------------------------------------

Hi Hoss,

In general, the precisionStep is somewhat inconsistent between Solr and Lucene. The problem is that precisionStep==0 is not defined at all. The minimum precision step in Lucene is 1, which means lots of terms per distinct value. What Solr defines as precisionStep 0 is in Lucene everything >= 64 (for longs) or >= 32 (for ints).
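
To make the term counts concrete, here is a small sketch. It assumes that one term is emitted for each shift = 0, precisionStep, 2*precisionStep, ... as long as the shift stays below the value size in bits (64 for longs, 32 for ints). TrieTermCount and termsPerValue are hypothetical names used only for illustration, not part of Lucene or Solr.

```java
// Illustrative only: TrieTermCount is a hypothetical helper, not a Lucene class.
final class TrieTermCount {

  /**
   * Number of terms indexed for one numeric value, assuming one term per
   * shift in 0, precisionStep, 2*precisionStep, ... below valueSizeBits.
   */
  static int termsPerValue(int valueSizeBits, int precisionStep) {
    if (precisionStep < 1) {
      // Mirrors the point above: a step of 0 is not defined.
      throw new IllegalArgumentException("precisionStep must be >= 1");
    }
    if (precisionStep >= valueSizeBits) {
      return 1; // only the full-precision term; also avoids int overflow below
    }
    // Ceiling division: one term per started block of precisionStep bits.
    return (valueSizeBits + precisionStep - 1) / precisionStep;
  }
}
```

Under that assumption a long with precisionStep 1 yields 64 terms per value, precisionStep 8 yields 8, and any step of 64 or more (including Integer.MAX_VALUE) yields exactly one.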

In general it is confusing that we have two precSteps. I would prefer to use this issue to clean this up and make the Solr schema simply allow a symbolic constant for the precision step (as 0 makes no sense and infinite is not a valid number in Integer.valueOf). How about precisionStep="infinite", because that would be consistent with Lucene? For backwards compatibility, 0 could still be supported in Solr, even though Lucene itself throws an IllegalArgumentException for it.
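
A rough sketch of what such schema parsing could look like, assuming the symbolic value is spelled "infinite" and that both it and the legacy 0 map to Integer.MAX_VALUE. PrecisionStepParser and its parse method are hypothetical names for illustration only, not existing Solr code.

```java
// Illustrative only: PrecisionStepParser is a hypothetical helper, not Solr code.
final class PrecisionStepParser {

  /** Symbolic schema value proposed above; the exact spelling is an assumption. */
  static final String INFINITE = "infinite";

  /** Maps a schema attribute value to the step handed on to Lucene. */
  static int parse(String attributeValue) {
    String s = attributeValue.trim();
    if (INFINITE.equalsIgnoreCase(s)) {
      return Integer.MAX_VALUE;
    }
    int step = Integer.parseInt(s); // NumberFormatException for other non-numeric input
    if (step < 0) {
      throw new IllegalArgumentException(
          "precisionStep must be >= 0 or \"" + INFINITE + "\": " + s);
    }
    // Backwards compatibility: the legacy 0 is treated like "infinite", so a
    // step below 1 never reaches Lucene.
    return step == 0 ? Integer.MAX_VALUE : step;
  }
}
```

Keeping the translation in one place would mean the rest of the code only ever sees a step of 1 or more, with Integer.MAX_VALUE standing for "no additional terms".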
                



[jira] [Commented] (SOLR-2976) TrieField.isTokenized returns true regardless of precisionStep

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171307#comment-13171307 ] 

Uwe Schindler commented on SOLR-2976:
-------------------------------------

I would agree to also use a constant in Lucene (that maps internally to Integer.MAX_VALUE).
                



[jira] [Commented] (SOLR-2976) TrieField.isTokenized returns true regardless of precisionStep

Posted by "Hoss Man (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171288#comment-13171288 ] 

Hoss Man commented on SOLR-2976:
--------------------------------

bq. it's been a bit abused and actually may no longer be needed.

Good point ... other than the two uses i mentioned, i think LukeRequestHandler is the only other place (outside of FieldType) in Solr that even cares about FieldType.isTokenized()

(other things internal to FieldType care about the TOKENIZED property, but even that isn't used by much other than TextField)
                
