Posted to dev@lucene.apache.org by "Michael McCandless (Created) (JIRA)" <ji...@apache.org> on 2012/04/02 20:53:26 UTC

[jira] [Created] (LUCENE-3942) SynonymFilter should set pos length att

SynonymFilter should set pos length att
---------------------------------------

                 Key: LUCENE-3942
                 URL: https://issues.apache.org/jira/browse/LUCENE-3942
             Project: Lucene - Java
          Issue Type: Improvement
            Reporter: Michael McCandless
            Assignee: Michael McCandless
             Fix For: 4.0


Tokenizers/Filters can now produce graphs instead of a single linear
chain of tokens, by setting the PositionLengthAttribute, expressing
where (how many positions ahead) this token "ends".

The default is 1, meaning it ends at the next position, to be
backwards compatible.

SynonymFilter produces graph output tokens, as long as the output is a
single token, but it currently never sets the position length to
express this. For example, for the rule "wifi network -> hotspot", the
hotspot token should have pos length = 2.  With LUCENE-3940 this will
allow us to verify that the offsets for such tokens are correct.
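
For illustration only (this is not the attached patch): a minimal,
Lucene-free Java sketch of how position increment and position length
together describe a token graph. The names TokenGraphSketch, Token and
Arc are hypothetical, and the sketch assumes the original tokens are
kept, with "hotspot" stacked on "wifi" via a position increment of 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model; these are not Lucene classes.
class TokenGraphSketch {

    /** One token as an analysis chain might emit it. */
    static final class Token {
        final String term;
        final int posInc; // positions advanced since the previous token
        final int posLen; // positions spanned by this token (default 1)

        Token(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** The same token seen as an arc between two graph nodes. */
    static final class Arc {
        final String term;
        final int from;
        final int to;

        Arc(String term, int from, int to) {
            this.term = term;
            this.from = from;
            this.to = to;
        }
    }

    /** Turns (posInc, posLen) pairs into arcs: from = position, to = position + posLen. */
    static List<Arc> toArcs(List<Token> tokens) {
        List<Arc> arcs = new ArrayList<>();
        int pos = -1; // a first token with posInc = 1 lands on position 0
        for (Token t : tokens) {
            pos += t.posInc;
            arcs.add(new Arc(t.term, pos, pos + t.posLen));
        }
        return arcs;
    }

    /** Assumed output for "wifi network" under the rule "wifi network -> hotspot". */
    static List<Token> example() {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token("wifi", 1, 1));
        tokens.add(new Token("hotspot", 0, 2)); // same start as "wifi", spans 2 positions
        tokens.add(new Token("network", 1, 1));
        return tokens;
    }

    public static void main(String[] args) {
        for (Arc a : toArcs(example())) {
            System.out.println(a.term + ": " + a.from + " -> " + a.to);
        }
    }
}
```

With pos length = 2, the "hotspot" arc ends on the same node as
"network"; with the default of 1 it would end where "wifi" ends, which
misrepresents the graph.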


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3942) SynonymFilter should set pos length att

Posted by "Michael McCandless (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-3942:
---------------------------------------

    Attachment: LUCENE-3942.patch

Patch to set pos length > 1 when appropriate. I think it's ready.

Note that SynonymFilter still cannot *consume* a graph, so e.g. you cannot apply it after WordDelimiterFilter (WDF) or after Kuromoji; we need to fix that separately.
                


[jira] [Resolved] (LUCENE-3942) SynonymFilter should set pos length att

Posted by "Michael McCandless (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-3942.
----------------------------------------

    Resolution: Fixed
    


[jira] [Updated] (LUCENE-3942) SynonymFilter should set pos length att

Posted by "Robert Muir (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3942:
--------------------------------

    Fix Version/s: 3.6.1
    