Posted to dev@lucene.apache.org by "Robert Muir (Created) (JIRA)" <ji...@apache.org> on 2012/03/15 18:21:37 UTC

[jira] [Created] (LUCENE-3873) tie MockGraphTokenFilter into all analyzers tests

tie MockGraphTokenFilter into all analyzers tests
-------------------------------------------------

                 Key: LUCENE-3873
                 URL: https://issues.apache.org/jira/browse/LUCENE-3873
             Project: Lucene - Java
          Issue Type: Task
          Components: modules/analysis
            Reporter: Robert Muir


Mike made a MockGraphTokenFilter on LUCENE-3848.

Many filters currently aren't tested with anything but a simple TokenStream.
We should test them with this too; it might find bugs (zero-length terms,
stacked terms/synonyms, etc.).


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3873) tie MockGraphTokenFilter into all analyzers tests

Posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238422#comment-13238422 ] 

Michael McCandless commented on LUCENE-3873:
--------------------------------------------

I agree we can use it in specific places for starters...

The patch on LUCENE-3848 mixes in "TokenStream to Automaton" and MockGraphTokenFilter; I'll split that apart and only commit MockGraphTokenFilter here.

One problem: MockGraphTokenFilter isn't setting offsets currently. I think to do this "correctly" it needs to buffer pending input tokens until it has reached the posLength it wants to output for a random token, and then set the offsets accordingly.
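
To make the offset point concrete, here is a minimal standalone sketch in plain Java. It does not use Lucene's classes: Tok, tokenize and spanning are hypothetical names invented for illustration. It also assumes that "set the offset accordingly" means a token spanning several positions takes its start offset from the first buffered token it covers and its end offset from the last one.

```java
import java.util.ArrayList;
import java.util.List;

final class GraphOffsetSketch {
    /** Simplified token (hypothetical, not a Lucene class): term plus character offsets. */
    static final class Tok {
        final String term;
        final int startOffset;
        final int endOffset;

        Tok(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public String toString() {
            return term + "[" + startOffset + "," + endOffset + "]";
        }
    }

    /** Splits on single spaces and records offsets, standing in for a whitespace tokenizer. */
    static List<Tok> tokenize(String text) {
        List<Tok> out = new ArrayList<>();
        int pos = 0;
        for (String part : text.split(" ")) {
            if (!part.isEmpty()) {
                out.add(new Tok(part, pos, pos + part.length()));
            }
            pos += part.length() + 1; // skip the part and the space after it
        }
        return out;
    }

    /**
     * Builds a synthetic token that starts at buffered token index {@code start}
     * and spans {@code posLength} positions. Its offsets can only be computed once
     * all the covered tokens have been buffered, which is why buffering is needed.
     */
    static Tok spanning(List<Tok> buffered, int start, int posLength, String term) {
        if (posLength < 1 || start < 0 || start + posLength > buffered.size()) {
            throw new IllegalArgumentException("not enough buffered tokens");
        }
        Tok first = buffered.get(start);
        Tok last = buffered.get(start + posLength - 1);
        return new Tok(term, first.startOffset, last.endOffset);
    }

    public static void main(String[] args) {
        List<Tok> toks = tokenize("wi fi network");
        // A synthetic token covering "wi" and "fi" (posLength = 2).
        System.out.println(spanning(toks, 0, 2, "wifi")); // prints wifi[0,5]
    }
}
```

The end offset of the synthetic token is not known until the last covered input token has been seen, so a filter that emits such tokens has to hold earlier tokens back rather than pass them straight through.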
                


[jira] [Updated] (LUCENE-3873) tie MockGraphTokenFilter into all analyzers tests

Posted by "Michael McCandless (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-3873:
---------------------------------------

    Attachment: LUCENE-3873.patch

Patch... I think it's close, but there are still some nocommits...

I had to rework the original MockGraphTokenFilter to sometimes buffer tokens so
it can set the correct offsets.

I added a few test cases to existing analyzers (SynFilter, Japanese,
Standard), and new direct test cases.

I also created a new MockHoleInjectingTokenFilter...

Tests seem to pass... but it wouldn't surprise me if beasting/jenkins
uncovers something...

                


[jira] [Resolved] (LUCENE-3873) tie MockGraphTokenFilter into all analyzers tests

Posted by "Michael McCandless (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-3873.
----------------------------------------

       Resolution: Fixed
    Fix Version/s: 4.0
    


[jira] [Commented] (LUCENE-3873) tie MockGraphTokenFilter into all analyzers tests

Posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238398#comment-13238398 ] 

Michael McCandless commented on LUCENE-3873:
--------------------------------------------

LUCENE-3848 has the MockGraphTokenFilter patch...
                


[jira] [Updated] (LUCENE-3873) tie MockGraphTokenFilter into all analyzers tests

Posted by "Robert Muir (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3873:
--------------------------------

    Fix Version/s: 3.6.1
    


[jira] [Commented] (LUCENE-3873) tie MockGraphTokenFilter into all analyzers tests

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238405#comment-13238405 ] 

Robert Muir commented on LUCENE-3873:
-------------------------------------

One way we can tie this in is via LUCENE-3919.

But: I think we can use this filter in some individual tests immediately?

E.g. we can just add a testRandomGraphs method to the tests for filters that
do lots of crazy state-capturing, putting this filter in front of or behind
them in the analyzer and calling checkRandomData?
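
As a rough illustration of that idea, here is a standalone sketch in plain Java. It deliberately does not call Lucene's checkRandomData or MockGraphTokenFilter. Tok, injectStacked, violation and checkRandomGraphs are hypothetical stand-ins, and the three invariants checked are chosen only for illustration. The shape is the point: generate random tokens, push them through a graph-injecting stage, then through the filter under test, and check that the output is still sane.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Random;
import java.util.function.UnaryOperator;

final class RandomGraphCheckSketch {
    /** Simplified token (hypothetical): term, position increment, character offsets. */
    static final class Tok {
        final String term; final int posInc; final int start; final int end;
        Tok(String term, int posInc, int start, int end) {
            this.term = term; this.posInc = posInc; this.start = start; this.end = end;
        }
    }

    /** Generates 0..9 random upper-case tokens, each separated by one character. */
    static List<Tok> randomTokens(Random r) {
        List<Tok> out = new ArrayList<>();
        int offset = 0;
        int count = r.nextInt(10);
        for (int i = 0; i < count; i++) {
            int len = 1 + r.nextInt(5);
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < len; j++) sb.append((char) ('A' + r.nextInt(26)));
            out.add(new Tok(sb.toString(), 1, offset, offset + len));
            offset += len + 1;
        }
        return out;
    }

    /** Stand-in for a graph-injecting mock filter: randomly stacks a token (posInc = 0). */
    static List<Tok> injectStacked(List<Tok> in, Random r) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            out.add(t);
            if (r.nextInt(3) == 0) out.add(new Tok("X" + t.term, 0, t.start, t.end));
        }
        return out;
    }

    /** Returns a description of the first broken invariant, or null if all hold. */
    static String violation(List<Tok> toks) {
        int lastStart = 0;
        for (Tok t : toks) {
            if (t.posInc < 0) return "negative position increment";
            if (t.end < t.start) return "end offset before start offset";
            if (t.start < lastStart) return "start offset went backwards";
            lastStart = t.start;
        }
        return null;
    }

    /** Runs the filter behind the injector on random inputs; returns the first violation or null. */
    static String checkRandomGraphs(UnaryOperator<List<Tok>> filter, long seed, int iterations) {
        Random r = new Random(seed);
        for (int i = 0; i < iterations; i++) {
            String v = violation(filter.apply(injectStacked(randomTokens(r), r)));
            if (v != null) return v;
        }
        return null;
    }

    /** Example filter under test: lowercases terms, leaving positions and offsets alone. */
    static List<Tok> lowerCase(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) out.add(new Tok(t.term.toLowerCase(Locale.ROOT), t.posInc, t.start, t.end));
        return out;
    }

    public static void main(String[] args) {
        String result = checkRandomGraphs(RandomGraphCheckSketch::lowerCase, 42L, 1000);
        System.out.println(result == null ? "no violations" : result);
    }
}
```

To model putting the mock filter behind the filter under test instead of in front of it, swap the order of the two stages inside checkRandomGraphs.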
                


[jira] [Assigned] (LUCENE-3873) tie MockGraphTokenFilter into all analyzers tests

Posted by "Michael McCandless (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reassigned LUCENE-3873:
------------------------------------------

    Assignee: Michael McCandless
    


[jira] [Updated] (LUCENE-3873) tie MockGraphTokenFilter into all analyzers tests

Posted by "Michael McCandless (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-3873:
---------------------------------------

    Attachment: LUCENE-3873.patch

New patch, fixing all nocommits.  I think it's ready...
                
