
Posted to dev@lucene.apache.org by "Nattapong Sirilappanich (JIRA)" <ji...@apache.org> on 2012/06/28 13:51:46 UTC

[jira] [Created] (LUCENE-4176) Can not produce proper collation key for ICUCollatedTermAttributeImp

Nattapong Sirilappanich created LUCENE-4176:
-----------------------------------------------

             Summary: Can not produce proper collation key for ICUCollatedTermAttributeImp
                 Key: LUCENE-4176
                 URL: https://issues.apache.org/jira/browse/LUCENE-4176
             Project: Lucene - Java
          Issue Type: Bug
          Components: modules/analysis
    Affects Versions: 5.0
            Reporter: Nattapong Sirilappanich


org.apache.lucene.collation.tokenattributes.ICUCollatedTermAttributeImpl returns a hash of the collation key's bytes.
That hash value produces incorrect comparison results.
On Lucene 3.6 the code below prints 1.
On the affected version (5.0) it prints 0.
Code to reproduce:

// ramDir, conf and analyzer are defined elsewhere (not shown in the report).
IndexWriter writer = new IndexWriter(ramDir, conf);
Document doc = new Document();
FieldType fieldType = new FieldType();
fieldType.setIndexed(true);
fieldType.setStored(true);
// Without fieldType.setTokenized(true) the value is not run through the analyzer (see the comments below).
Field field = new Field("content","เข", fieldType);
doc.add(field);
writer.addDocument(doc);
writer.close();
IndexSearcher is = new IndexSearcher(DirectoryReader.open(ramDir));
QueryParser qp = new AnalyzingQueryParser(Version.LUCENE_50,"content", analyzer);

// The range runs from U+0E01 (ก) to U+0E03 (ฃ).
ScoreDoc[] result = is.search(qp.parse("[\u0e01 TO \u0e03]"), null,1000).scoreDocs;
System.out.println(result.length);

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] [Updated] (LUCENE-4176) Can not produce proper collation key for ICUCollatedTermAttributeImp

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-4176:
--------------------------------

    Attachment: LUCENE-4176.patch

OK, attached is a patch fixing the query parser bug, with your test.

There was a bug in your test as well: it doesn't actually analyze the terms, because it doesn't call fieldType.setTokenized(true).

This is separately a huge trap. I'll open another issue for that.
                
> Can not produce proper collation key for ICUCollatedTermAttributeImp
> --------------------------------------------------------------------
>
>                 Key: LUCENE-4176
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4176
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/queryparser
>    Affects Versions: 5.0
>            Reporter: Nattapong Sirilappanich
>         Attachments: LUCENE-4176.patch, LUCENE-4176.patch
>
>

[jira] [Updated] (LUCENE-4176) Can not produce proper collation key for ICUCollatedTermAttributeImp

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-4176:
--------------------------------

    Attachment: LUCENE-4176.patch

untested patch.
                
> Can not produce proper collation key for ICUCollatedTermAttributeImp
> --------------------------------------------------------------------
>
>                 Key: LUCENE-4176
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4176
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/queryparser
>    Affects Versions: 5.0
>            Reporter: Nattapong Sirilappanich
>         Attachments: LUCENE-4176.patch
>
>

[jira] [Updated] (LUCENE-4176) Can not produce proper collation key for ICUCollatedTermAttributeImp

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-4176:
--------------------------------

    Component/s:     (was: modules/analysis)
                 modules/queryparser
    
> Can not produce proper collation key for ICUCollatedTermAttributeImp
> --------------------------------------------------------------------
>
>                 Key: LUCENE-4176
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4176
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/queryparser
>    Affects Versions: 5.0
>            Reporter: Nattapong Sirilappanich
>

[jira] [Commented] (LUCENE-4176) Can not produce proper collation key for ICUCollatedTermAttributeImp

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403038#comment-13403038 ] 

Robert Muir commented on LUCENE-4176:
-------------------------------------

Thanks for reporting this: the bug is actually in AnalyzingQueryParser. It should not consume the terms with CharTermAttribute.toString(); instead, it should just consume the bytes.
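A minimal sketch of why consuming the bytes matters. It uses a simplified model rather than Lucene's actual code. In that model, a collated field stores each term as collation-key bytes, and a range check compares terms as unsigned bytes. The class and method names (CollatedRangeSketch, keyBytes, matchesUsingKeys, matchesUsingText) are hypothetical. java.text.Collator with an English locale stands in for the ICU collator, because the report's analyzer is not shown. Arrays.compareUnsigned requires Java 9 or later.

```java
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical illustration, not Lucene code: byte[] stands in for an indexed term,
// and java.text.Collator stands in for the ICU collator used by the analyzer.
final class CollatedRangeSketch {

    // Collation-key bytes, analogous to what a collating analyzer writes into the index.
    static byte[] keyBytes(Collator collator, String text) {
        return collator.getCollationKey(text).toByteArray();
    }

    // Inclusive range check on raw term bytes, compared as unsigned bytes (Java 9+).
    static boolean inRange(byte[] term, byte[] lower, byte[] upper) {
        return Arrays.compareUnsigned(lower, term) <= 0
            && Arrays.compareUnsigned(term, upper) <= 0;
    }

    // Consistent: the bounds are turned into collation keys, just like the indexed term.
    static boolean matchesUsingKeys(Collator c, String indexed, String lo, String hi) {
        return inRange(keyBytes(c, indexed), keyBytes(c, lo), keyBytes(c, hi));
    }

    // Inconsistent pattern: the bounds come from the term text (as toString() would give it),
    // encoded as UTF-8, but are compared against collation-key bytes from the index.
    static boolean matchesUsingText(Collator c, String indexed, String lo, String hi) {
        byte[] loText = lo.getBytes(StandardCharsets.UTF_8);
        byte[] hiText = hi.getBytes(StandardCharsets.UTF_8);
        return inRange(keyBytes(c, indexed), loText, hiText);
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        // Collation order is "a" < "Banana" < "c", so the key-based check matches.
        System.out.println(matchesUsingKeys(collator, "Banana", "a", "c")); // prints true
        // Mixing encodings: the result depends on raw key byte values and carries no meaning.
        System.out.println(matchesUsingText(collator, "Banana", "a", "c"));
    }
}
```

Comparing key bytes against key bytes agrees with Collator.compare. That is why "Banana" falls inside [a TO c] even though plain String order puts it before "a". The text-based variant compares two unrelated encodings, so its answer is not meaningful.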
                
> Can not produce proper collation key for ICUCollatedTermAttributeImp
> --------------------------------------------------------------------
>
>                 Key: LUCENE-4176
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4176
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/queryparser
>    Affects Versions: 5.0
>            Reporter: Nattapong Sirilappanich
>

[jira] [Resolved] (LUCENE-4176) Can not produce proper collation key for ICUCollatedTermAttributeImp

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-4176.
---------------------------------

       Resolution: Fixed
    Fix Version/s: 5.0
                   4.0

Thanks for reporting this: I committed the fix to AnalyzingQueryParser.

But until LUCENE-4178 is resolved, be sure to call setTokenized(true) on your FieldType!
                
> Can not produce proper collation key for ICUCollatedTermAttributeImp
> --------------------------------------------------------------------
>
>                 Key: LUCENE-4176
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4176
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/queryparser
>    Affects Versions: 5.0
>            Reporter: Nattapong Sirilappanich
>             Fix For: 4.0, 5.0
>
>         Attachments: LUCENE-4176.patch, LUCENE-4176.patch
>
>

[jira] [Commented] (LUCENE-4176) Can not produce proper collation key for ICUCollatedTermAttributeImp

Posted by "Nattapong Sirilappanich (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403675#comment-13403675 ] 

Nattapong Sirilappanich commented on LUCENE-4176:
-------------------------------------------------

Thanks for the fix, and sorry for any confusion.
                
> Can not produce proper collation key for ICUCollatedTermAttributeImp
> --------------------------------------------------------------------
>
>                 Key: LUCENE-4176
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4176
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/queryparser
>    Affects Versions: 5.0
>            Reporter: Nattapong Sirilappanich
>             Fix For: 4.0, 5.0
>
>         Attachments: LUCENE-4176.patch, LUCENE-4176.patch
>
>