Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2009/06/15 19:46:07 UTC

[jira] Created: (LUCENE-1692) Contrib analyzers need tests

Contrib analyzers need tests
----------------------------

                 Key: LUCENE-1692
                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
             Project: Lucene - Java
          Issue Type: Test
          Components: contrib/analyzers
            Reporter: Robert Muir


The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.

This way, they can be converted to the new api without breakage.
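
As a rough illustration of what "testing all the attributes" could look like, here is a minimal, self-contained sketch. SketchToken, SketchTokenizer, and AttributeAssert are hypothetical stand-ins invented for this example; they are not Lucene classes, and the toy tokenizer only splits on whitespace. The point is that the assertion compares term text, start/end offsets, and type together, so a change that keeps the text right but shifts an offset would still fail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a token; this is NOT Lucene's Token class.
final class SketchToken {
    final String term;
    final int startOffset;
    final int endOffset;
    final String type;

    SketchToken(String term, int startOffset, int endOffset, String type) {
        this.term = term;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
        this.type = type;
    }
}

// Toy analyzer: splits on whitespace, lowercases, and records offsets and a type.
final class SketchTokenizer {
    static List<SketchToken> tokenize(String text) {
        List<SketchToken> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Skip whitespace before the next token.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            // Consume the token itself.
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) {
                String raw = text.substring(start, i);
                String type = raw.chars().allMatch(Character::isDigit) ? "<NUM>" : "<ALPHANUM>";
                out.add(new SketchToken(raw.toLowerCase(Locale.ROOT), start, i, type));
            }
        }
        return out;
    }
}

// Checks every attribute of every token, not just the term text.
final class AttributeAssert {
    static void assertTokens(List<SketchToken> actual, String[] terms,
                             int[] starts, int[] ends, String[] types) {
        if (actual.size() != terms.length) {
            throw new AssertionError("token count: expected " + terms.length + ", got " + actual.size());
        }
        for (int i = 0; i < terms.length; i++) {
            SketchToken t = actual.get(i);
            if (!t.term.equals(terms[i])) throw new AssertionError("term " + i + ": " + t.term);
            if (t.startOffset != starts[i]) throw new AssertionError("start " + i + ": " + t.startOffset);
            if (t.endOffset != ends[i]) throw new AssertionError("end " + i + ": " + t.endOffset);
            if (!t.type.equals(types[i])) throw new AssertionError("type " + i + ": " + t.type);
        }
    }
}
```

A real test for a contrib analyzer would run the analyzer under test instead of the toy tokenizer, but the shape of the check (parallel arrays of expected terms, offsets, and types) would be similar.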

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721888#action_12721888 ] 

Robert Muir commented on LUCENE-1692:
-------------------------------------

Michael, I updated my svn and I think you might have missed some of the tests.

There are tests in the patch for BrazilianAnalyzer, CzechAnalyzer, and DutchAnalyzer... (these are new directories, maybe that is why?)

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: example.jpg, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720154#action_12720154 ] 

Robert Muir commented on LUCENE-1692:
-------------------------------------

Michael: LUCENE-973 would save me from having to create tests for the CJKAnalyzer.

It would also fix a bug.

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721462#action_12721462 ] 

Michael McCandless commented on LUCENE-1692:
--------------------------------------------

Me too :)  Robert can you cons up a patch?  Which files can be safely removed from the DutchAnalyzer?  (stems/words.txt?)

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721457#action_12721457 ] 

Robert Muir commented on LUCENE-1692:
-------------------------------------

Michael, I think it would be nice to fix the Thai offset bug so the highlighter will work. This is a safe one-line fix and it's an obvious error.

The SmartChineseAnalyzer empty-token bug is pretty serious. I think indexing empty tokens for every piece of punctuation could really hurt similarity computation (am I wrong? I never tried it).

The Thai .type() bug is something that could be fixed later; I don't think the token type being ALPHANUM versus NUM is really hurting anyone.

The issue where DutchAnalyzer doesn't do what it claims is, I think, not really hurting anyone, and they can use the snowball version if they want accurate snowball behavior.
I do think the huge files in DutchAnalyzer that aren't being used can be removed if you want to save 1MB, but I'm not sure how important that is.

Let me know your thoughts. 
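
To make the offset point concrete, here is a small sketch of why a highlighter depends on correct offsets. OffsetHighlightSketch and its methods are hypothetical names made up for this illustration, not the contrib highlighter API, and the "offsets relative to a sub-chunk" failure mode is only an assumed example of how offsets can go wrong, not a description of the actual Thai bug.

```java
// Hypothetical sketch; not the Lucene highlighter.
final class OffsetHighlightSketch {
    // Wraps text[start, end) in <b> tags, the way a highlighter would use
    // a token's start and end offsets into the original text.
    static String highlight(String text, int start, int end) {
        return text.substring(0, start)
            + "<b>" + text.substring(start, end) + "</b>"
            + text.substring(end);
    }

    // If a filter re-splits a token, offsets computed relative to that token
    // have to be shifted by the token's own start offset to be usable.
    static int toAbsolute(int chunkStart, int relativeOffset) {
        return chunkStart + relativeOffset;
    }

    public static void main(String[] args) {
        String text = "index the text";
        // Correct absolute offsets for the last word.
        System.out.println(highlight(text, 10, 14));
        // Same word, but with offsets left relative to the sub-chunk:
        // the wrong characters get highlighted.
        System.out.println(highlight(text, 0, 4));
    }
}
```

With wrong offsets the term text can still be indexed and matched correctly, which is why a test that only looks at token text would not catch this kind of bug.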

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721418#action_12721418 ] 

Robert Muir commented on LUCENE-1692:
-------------------------------------

Michael: I think I'm done here.

If you consider any of the bugs important just let me know, and I can try to help get them fixed.


> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721469#action_12721469 ] 

Michael McCandless commented on LUCENE-1692:
--------------------------------------------

Probably eclipse isn't running with asserts?
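
For anyone hitting the same thing: Java assertions are off unless the JVM is started with -ea (or -enableassertions), so code that relies on "assert" can behave differently in an IDE run configuration than under a build that enables them. A minimal sketch for checking this at runtime follows; AssertionCheck is a made-up class name for the example.

```java
// Hypothetical helper; reports whether the JVM is running with assertions on.
final class AssertionCheck {
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment inside the assert only executes when assertions
        // are enabled (java -ea); otherwise the statement is skipped.
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
    }
}
```

In Eclipse the flag would typically go in the VM arguments of the run or test launch configuration.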

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721910#action_12721910 ] 

Michael McCandless commented on LUCENE-1692:
--------------------------------------------

OK I committed them.  Thanks for catching this Robert!

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: example.jpg, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721451#action_12721451 ] 

Michael McCandless commented on LUCENE-1692:
--------------------------------------------

bq. Michael: I think I'm done here.

OK I'll review.  Thanks!!

bq. if you consider any of the bugs important just let me know, can try to help get them fixed.

Likely I won't be able to judge the severity of these bugs... so please chime in if you think they should be fixed...

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Updated: (LUCENE-1692) Contrib analyzers need tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-1692:
--------------------------------

    Attachment: LUCENE-1692.txt

Answered my own question; here are tests for Brazilian as a start.


> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>         Attachments: LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721461#action_12721461 ] 

Mark Miller commented on LUCENE-1692:
-------------------------------------

heh -

+1 on fixing them all. Including reclaiming that 1 mb of space if we can ...

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721475#action_12721475 ] 

Robert Muir commented on LUCENE-1692:
-------------------------------------

Probably. I fixed it and am testing with ant now. I'll upload it at least so you can verify the behavior I've discovered.

Do you want me to include a patch with the two bugfixes (Chinese empty token and Thai offsets), or give you something separate for those?

For the other 2 bugs:
Fixing the Thai token type bug: it's really a bug in the StandardTokenizer grammar. I wasn't sure you wanted to change that at this moment, but if you want it fixed let me know!
In my opinion, the fix for DutchAnalyzer is to deprecate/remove the contrib completely. Since it claims to do snowball stemming, why shouldn't someone just use the Dutch snowball stemmer from the contrib/snowball package?

  


> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Updated: (LUCENE-1692) Contrib analyzers need tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-1692:
--------------------------------

    Attachment: LUCENE-1692_patch2.txt

Patch with a couple of additional tests for contrib/analysis, with some javadocs cleanup and wording.
There is also a fix to the synonyms test so it actually tests its reset() ...
No code changes though.


> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: example.jpg, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692_patch2.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721711#action_12721711 ] 

Michael McCandless commented on LUCENE-1692:
--------------------------------------------

Latest patch looks good Robert, thanks!

Deprecating DutchAnalyzer (in favor of Snowball) makes sense to me -- any objections out there?

(And I'll "svn rm" the two large & unused files).

Robert, could you open a new issue for the Thai token type bug (that requires a change to StandardTokenizer's grammar)?  We seem to be accumulating a number of these "fix StandardTokenizer's grammar" issues but we don't have a good way to do this back-compatibly... matchVersion is a good way for the user to express a compatibility requirement, but we don't know how to [cleanly] switch on that to different grammar variants.

Is that the only issue not addressed by the latest patch?

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: example.jpg, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Updated: (LUCENE-1692) Contrib analyzers need tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-1692:
--------------------------------

    Attachment: LUCENE-1692.txt

Added tests for Czech.
Added additional tests for SmartChineseAnalyzer; there is a bug very similar to the recent CJK one here... generating empty tokens.

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Updated: (LUCENE-1692) Contrib analyzers need tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-1692:
--------------------------------

    Attachment: LUCENE-1692.txt

Patch with the two one-line fixes:
1. Fix offsets for the Thai analyzer so highlighting, etc. will work.
2. Use the stopwords list by default for SmartChineseAnalyzer so punctuation isn't indexed in a strange way.

I updated the testcases to reflect these.
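
As a rough sketch of the idea behind the second fix (not the actual patch): a stop list applied after tokenization can keep punctuation, and any empty terms, out of the index. StopFilterSketch and removeStops are hypothetical names for this illustration, and dropping empty strings here is an assumption about what "indexed in a strange way" refers to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not SmartChineseAnalyzer's real stop filtering.
final class StopFilterSketch {
    // Keeps only tokens that are non-empty and not in the stop set.
    static List<String> removeStops(List<String> tokens, Set<String> stops) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!t.isEmpty() && !stops.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }
}
```

A test in the spirit of this issue would then assert that text containing punctuation produces only the expected word tokens, with their offsets unchanged by the removal.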




> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: example.jpg, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Updated: (LUCENE-1692) Contrib analyzers need tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-1692:
--------------------------------

    Attachment: LUCENE-1692_patch2.txt

Corrected the missing CJK test.

If no one objects I would like to commit these javadocs and tests tomorrow.

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 2.9
>
>         Attachments: example.jpg, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692_patch2.txt, LUCENE-1692_patch2.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Assigned: (LUCENE-1692) Contrib analyzers need tests

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reassigned LUCENE-1692:
------------------------------------------

    Assignee: Michael McCandless

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>         Attachments: LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719799#action_12719799 ] 

Robert Muir commented on LUCENE-1692:
-------------------------------------

First I looked at BrazilianAnalyzer... out of curiosity, can someone explain to me how the behavior of BrazilianStemmer differs from the Portuguese snowball analyzer? It looks to be the same algorithm to me!


> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720696#action_12720696 ] 

Robert Muir commented on LUCENE-1692:
-------------------------------------

Michael, OK. I know additional tests here (against the old API) might be more code to convert, but I think they will actually make the process easier, whenever that is or whatever is involved.

I have some time this evening to try to improve the coverage here (against the old API).


> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720144#action_12720144 ] 

Michael McCandless commented on LUCENE-1692:
--------------------------------------------

These are much needed... thanks Robert.  Let me know when you're done iterating (and/or when we need to wrap up 2.9) and we'll get these in.

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721463#action_12721463 ] 

Robert Muir commented on LUCENE-1692:
-------------------------------------

michael, i guess junit from my eclipse != junit from ant, because it passes in eclipse...annoying

I will fix the test so it runs correctly from ant.

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Resolved: (LUCENE-1692) Contrib analyzers need tests

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1692.
----------------------------------------

    Resolution: Fixed

Thanks Robert!

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: example.jpg, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Updated: (LUCENE-1692) Contrib analyzers need tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-1692:
--------------------------------

    Attachment: LUCENE-1692.txt

patch with new testcase demonstrating the chinese behavior.

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: example.jpg, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721504#action_12721504 ] 

Robert Muir commented on LUCENE-1692:
-------------------------------------

ok got it,

the IDEOGRAPHIC FULL STOP is being converted into a comma token by the tokenizer.
if you use the default constructor, SmartChineseAnalyzer(), it won't load the default stopwords list, as in my Luke screenshot.
if you instead instantiate it like this: SmartChineseAnalyzer(true), then it loads the default stopwords list.
the default stopwords list includes things like the comma, so it ends up getting removed.

maybe it's not a bug, but this is really non-obvious behavior...!
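
A toy model of the behavior described above, for illustration only. It assumes a one-token-per-character tokenizer and a stopword list containing just the comma; `StopwordToggleSketch`, `tokenize` and `analyze` are hypothetical names, not the SmartChineseAnalyzer implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only; none of this is Lucene code.
class StopwordToggleSketch {
    // Assumed stand-in for the default stopword list, which includes the comma.
    static final Set<String> DEFAULT_STOPWORDS = new HashSet<>(Arrays.asList(","));

    // Emits one token per character and turns the IDEOGRAPHIC FULL STOP into ",".
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            tokens.add(c == '\u3002' ? "," : String.valueOf(c));
        }
        return tokens;
    }

    // useDefaultStopwords plays the role of the boolean constructor argument.
    static List<String> analyze(String text, boolean useDefaultStopwords) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(text)) {
            if (useDefaultStopwords && DEFAULT_STOPWORDS.contains(token)) {
                continue; // dropped by the stopword list
            }
            out.add(token);
        }
        return out;
    }
}
```

With the flag off, the comma token produced from the full stop survives; with the flag on, it is removed along with any other stopword, so the same input yields a different token count depending on the constructor used.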


> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: example.jpg, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721824#action_12721824 ] 

Michael McCandless commented on LUCENE-1692:
--------------------------------------------

OK I will commit this soon.  Thanks Robert!

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: example.jpg, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Updated: (LUCENE-1692) Contrib analyzers need tests

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1692:
---------------------------------------

    Fix Version/s: 2.9

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721906#action_12721906 ] 

Michael McCandless commented on LUCENE-1692:
--------------------------------------------

Duh, I forgot to svn add them!  Sorry.  I'm glad you caught that.  I'm really wanting "svn patch"....

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: example.jpg, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Resolved: (LUCENE-1692) Contrib analyzers need tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-1692.
---------------------------------

    Resolution: Fixed

Committed revision 805400.

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 2.9
>
>         Attachments: example.jpg, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692_patch2.txt, LUCENE-1692_patch2.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Reopened: (LUCENE-1692) Contrib analyzers need tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir reopened LUCENE-1692:
---------------------------------

         Assignee: Robert Muir  (was: Michael McCandless)
    Lucene Fields: [New, Patch Available]  (was: [New])

if possible, i think these might be good to add for the release.

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 2.9
>
>         Attachments: example.jpg, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692_patch2.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720372#action_12720372 ] 

Robert Muir commented on LUCENE-1692:
-------------------------------------

thanks, i'll upload some more tests hopefully soon. I think most have rudimentary tests.

but some are not sufficient to ensure any api conversion is really working.

for example ThaiAnalyzer does not have any offset tests, but if that broke then highlighting would break.
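
A minimal sketch of the kind of offset check meant here, assuming a plain whitespace tokenizer as a stand-in for ThaiAnalyzer. `OffsetCheckSketch`, `SimpleToken`, `tokenize` and `offsetsMatch` are hypothetical names, not Lucene API, and the substring comparison only applies to analyzers that leave the term text unmodified.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; not Lucene classes.
class OffsetCheckSketch {
    static final class SimpleToken {
        final String term;
        final int start; // inclusive offset into the original text
        final int end;   // exclusive offset into the original text

        SimpleToken(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    // Whitespace tokenizer that records start/end offsets into the input.
    static List<SimpleToken> tokenize(String text) {
        List<SimpleToken> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                tokens.add(new SimpleToken(text.substring(start, i), start, i));
            }
        }
        return tokens;
    }

    // True if every token's offsets point back at its own text,
    // which is what a highlighter relies on.
    static boolean offsetsMatch(String text, List<SimpleToken> tokens) {
        for (SimpleToken t : tokens) {
            if (t.start < 0 || t.end > text.length() || t.start > t.end) {
                return false;
            }
            if (!text.substring(t.start, t.end).equals(t.term)) {
                return false;
            }
        }
        return true;
    }
}
```

A test that compares only term text would pass even if every offset were shifted by one; a check like `offsetsMatch` is what would catch that.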

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721512#action_12721512 ] 

Robert Muir commented on LUCENE-1692:
-------------------------------------

later tonight i can work up a patch to address the thai offset issue and at least javadoc the chinese behavior.

if you think the additional 2 issues [thai tokentype, dutchanalyzer behavior/huge files] should be fixed or documented in some way, please let me know.


> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: example.jpg, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Updated: (LUCENE-1692) Contrib analyzers need tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-1692:
--------------------------------

    Attachment: LUCENE-1692.txt

michael: here is an updated patch.

i removed that chinese test, there's something strange going on here [see my screenshot] but i can't seem to create a test case to show it!


> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: example.jpg, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Updated: (LUCENE-1692) Contrib analyzers need tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-1692:
--------------------------------

    Attachment: LUCENE-1692.txt

add tests for dutchanalyzer.

this analyzer claims to implement snowball, although tests reveal some differences. it also has about 1MB of text files that don't appear to be in use at all...

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>         Attachments: LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Updated: (LUCENE-1692) Contrib analyzers need tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-1692:
--------------------------------

    Attachment: LUCENE-1692.txt

adds tests for thaianalyzer token offsets and types, both of which have bugs!
tests for correct behavior are included but commented out.


> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721466#action_12721466 ] 

Robert Muir commented on LUCENE-1692:
-------------------------------------

michael: yes the stems/words.txt

for stems.txt/words.txt: I am scratching my head trying to figure out what they were originally intended to do. If it's to support dictionary stemming with WordlistLoader, then it really needs to be one tab-separated file, not two files.
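
To make the "one tab-separated file" point concrete, here is a rough sketch that parses word<TAB>stem lines into a lookup map. `StemDictSketch.parse` is a hypothetical helper written for this note, not WordlistLoader itself, and the Dutch entries in the usage note are made-up sample data.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; a sketch of the single-file format, not Lucene code.
class StemDictSketch {
    // Parses "word<TAB>stem" lines into a map.
    // Lines without a tab, or with an empty word or stem, are skipped.
    static Map<String, String> parse(String contents) {
        Map<String, String> dict = new HashMap<>();
        for (String line : contents.split("\\r?\\n")) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2 && !parts[0].isEmpty() && !parts[1].isEmpty()) {
                dict.put(parts[0], parts[1]);
            }
        }
        return dict;
    }
}
```

For example, `StemDictSketch.parse("fietsen\tfiets\nlopen\tloop\n")` maps each word to its stem in a single pass. With two separate files, the pairing would have to be inferred from line position, which is fragile.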

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721460#action_12721460 ] 

Michael McCandless commented on LUCENE-1692:
--------------------------------------------

I'm seeing this test failure:
{code}
    [junit] Testcase: testBuggyPunctuation(org.apache.lucene.analysis.cn.TestSmartChineseAnalyzer):	Caused an ERROR
    [junit] null
    [junit] java.lang.AssertionError
    [junit] 	at org.apache.lucene.analysis.StopFilter.next(StopFilter.java:240)
    [junit] 	at org.apache.lucene.analysis.cn.TestSmartChineseAnalyzer.testBuggyPunctuation(TestSmartChineseAnalyzer.java:51)
{code}

It's because null is being passed to ts.next in the final assertTrue line:

{code}
    nt = ts.next(nt);
    while (nt != null) {
      assertEquals(result[i], nt.term());
      i++;
      nt = ts.next(nt);
    }
    assertTrue(ts.next(nt) == null);
{code}
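
One possible way to restructure the loop, sketched against a stand-in stream rather than the real TokenStream: keep the reusable token in its own variable so the end-of-stream check never passes null back in. `ReusableTokenSketch`, `ListStream` and `drain` are hypothetical names for illustration; only the `next(Token)` shape is taken from the snippet above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Stand-in types for illustration; not the Lucene classes.
class ReusableTokenSketch {
    static final class Token {
        String term;
    }

    // Models a stream with the reusable-token contract:
    // the caller must always pass a non-null Token.
    static final class ListStream {
        private final Iterator<String> it;

        ListStream(String... terms) {
            this.it = Arrays.asList(terms).iterator();
        }

        Token next(Token reusableToken) {
            if (reusableToken == null) {
                throw new IllegalArgumentException("reusableToken must not be null");
            }
            if (!it.hasNext()) {
                return null; // end of stream
            }
            reusableToken.term = it.next();
            return reusableToken;
        }
    }

    // Drains the stream. The reusable Token is kept separate from the
    // loop variable, so it is still non-null when the loop ends.
    static List<String> drain(ListStream ts) {
        List<String> terms = new ArrayList<>();
        Token reusable = new Token();
        for (Token nt = ts.next(reusable); nt != null; nt = ts.next(reusable)) {
            terms.add(nt.term);
        }
        // Final end-of-stream check, passing a valid token instead of null.
        if (ts.next(reusable) != null) {
            throw new IllegalStateException("stream should stay exhausted");
        }
        return terms;
    }
}
```

In the failing test, `nt` doubles as both the reusable token and the loop result, so it is guaranteed to be null by the time the final `assertTrue` runs.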

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721790#action_12721790 ] 

Robert Muir commented on LUCENE-1692:
-------------------------------------

michael, yes the only issue... i'll open another issue for the thai token type.


> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: example.jpg, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Commented: (LUCENE-1692) Contrib analyzers need tests

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720570#action_12720570 ] 

Michael McCandless commented on LUCENE-1692:
--------------------------------------------

Robert, you should probably also hold up on API conversion, since the API itself is now changing (LUCENE-1693).

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.



[jira] Updated: (LUCENE-1692) Contrib analyzers need tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-1692:
--------------------------------

    Attachment: example.jpg

Having trouble figuring this one out

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: example.jpg, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.
