Posted to dev@lucene.apache.org by "David Croley (Created) (JIRA)" <ji...@apache.org> on 2012/02/02 21:20:53 UTC

[jira] [Created] (LUCENE-3748) EnglishPossessiveFilter should work with Unicode right single quotation mark

EnglishPossessiveFilter should work with Unicode right single quotation mark
----------------------------------------------------------------------------

                 Key: LUCENE-3748
                 URL: https://issues.apache.org/jira/browse/LUCENE-3748
             Project: Lucene - Java
          Issue Type: Improvement
          Components: modules/analysis
    Affects Versions: 3.5, 3.4, 3.2, 3.1
            Reporter: David Croley
            Priority: Minor
         Attachments: LucenePatch

The current EnglishPossessiveFilter (used in EnglishAnalyzer) removes possessives using only the '\'' character (followed by 's' or 'S'), but some common systems (German?) insert the Unicode '\u2019' (RIGHT SINGLE QUOTATION MARK) instead, and this is not removed when processing UTF-8 text. I propose changing EnglishPossessiveFilter to support '\u2019' as an alternative to '\''.
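
To make the proposal concrete, here is a minimal sketch of the matching rule on a plain String. It is not the actual filter code from the patch: the class PossessiveStripper and its methods are hypothetical names, and the real filter works on a token stream rather than on strings.

```java
// Hypothetical illustration only; the real EnglishPossessiveFilter operates on a TokenStream.
final class PossessiveStripper {

    /** True for the two apostrophe forms named in the proposal. */
    static boolean isApostrophe(char c) {
        return c == '\'' || c == '\u2019';
    }

    /** Drops a trailing apostrophe followed by 's' or 'S'; otherwise returns the term unchanged. */
    static String stripPossessive(String term) {
        int len = term.length();
        if (len >= 2
                && isApostrophe(term.charAt(len - 2))
                && (term.charAt(len - 1) == 's' || term.charAt(len - 1) == 'S')) {
            return term.substring(0, len - 2);
        }
        return term;
    }

    public static void main(String[] args) {
        System.out.println(stripPossessive("dog's"));       // ASCII apostrophe
        System.out.println(stripPossessive("dog\u2019s"));  // RIGHT SINGLE QUOTATION MARK
        System.out.println(stripPossessive("dogs"));        // no possessive, left alone
    }
}
```

With only the '\'' check, the second call would return its input unchanged, which is the behavior reported above.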

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3748) EnglishPossessiveFilter should work with Unicode right single quotation mark

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199189#comment-13199189 ] 

Robert Muir commented on LUCENE-3748:
-------------------------------------

I agree with the patch. We can easily add backwards compat here, no problem.

As for any other candidates, the only possibility from my perspective is U+FF07 FULLWIDTH APOSTROPHE,
though I could go either way on that (since it's a compatibility character).

Any other opinions?
                


[jira] [Commented] (LUCENE-3748) EnglishPossessiveFilter should work with Unicode right single quotation mark

Posted by "Walter Underwood (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199390#comment-13199390 ] 

Walter Underwood commented on LUCENE-3748:
------------------------------------------

Why make separate patches for characters instead of using Unicode normalization? Converting to NFKC would also solve this for the prime character (U+2032) and any other codepoint that is equivalent.

Compatibility normalization is designed for precisely this purpose: equivalence that ignores appearance.
                


[jira] [Commented] (LUCENE-3748) EnglishPossessiveFilter should work with Unicode right single quotation mark

Posted by "Steven Rowe (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199204#comment-13199204 ] 

Steven Rowe commented on LUCENE-3748:
-------------------------------------

+1, and +1 to include U+FF07.

There are several other characters listed with U+0027 APOSTROPHE in http://www.unicode.org/charts/PDF/U0000.pdf that could be interpreted visually as an English apostrophe, e.g. U+02BC MODIFIER LETTER APOSTROPHE, but it would be unusual for people to use those characters as apostrophes in English text, so I think it would be fine to exclude them.  (By contrast, the Unicode standard says that U+2019 is the *preferred* apostrophe form.)
                


[jira] [Assigned] (LUCENE-3748) EnglishPossessiveFilter should work with Unicode right single quotation mark

Posted by "Robert Muir (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir reassigned LUCENE-3748:
-----------------------------------

    Assignee: Robert Muir
    


[jira] [Commented] (LUCENE-3748) EnglishPossessiveFilter should work with Unicode right single quotation mark

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199217#comment-13199217 ] 

Robert Muir commented on LUCENE-3748:
-------------------------------------

Those are my thoughts exactly, Steven.

I think by default we should go with U+0027 and U+2019 (and, as I mentioned, U+FF07 or not; it's less important).

As far as other look-alikes, sure, it could happen, BUT the user could just place ASCIIFoldingFilter before
EnglishPossessiveFilter if they want that more brutal behavior... that's a more lossy normalization that I
don't think we should do by default...
                


[jira] [Commented] (LUCENE-3748) EnglishPossessiveFilter should work with Unicode right single quotation mark

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199267#comment-13199267 ] 

Robert Muir commented on LUCENE-3748:
-------------------------------------

I think we should do it (despite the cruft).

One of these days we will realize our goal of a stable interface between IndexWriter etc. and analyzers, such
that if you are really worried about this with old indexes, you just use lucene-analyzers-ancient-version.jar
and it works with the newer lucene-core.jar.

But until then, I think we need it (e.g. we add a deprecated ctor for API compatibility that forwards to Version.LUCENE_35)
and conditionalize the handling based on Version.

If you don't want to cruft it up, let me know; otherwise feel free to add a patch :)
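
A rough sketch of what "a deprecated ctor that forwards to the old version, plus handling conditioned on Version" could look like. This is not the committed patch: CompatVersion is a hypothetical stand-in for Lucene's Version constants, the class name is made up, and the cutoff at 3.6 is an assumption taken from the issue's Fix Version.

```java
// Hypothetical sketch; names and the 3.6 cutoff are assumptions, not the actual patch.
final class VersionedPossessiveStripper {

    /** Stand-in for Lucene's Version constants, ordered oldest to newest. */
    enum CompatVersion { V_35, V_36 }

    private final CompatVersion matchVersion;

    /** Kept only for API compatibility; forwards to the old (ASCII-only) behavior. */
    @Deprecated
    VersionedPossessiveStripper() {
        this(CompatVersion.V_35);
    }

    VersionedPossessiveStripper(CompatVersion matchVersion) {
        this.matchVersion = matchVersion;
    }

    String strip(String term) {
        int len = term.length();
        if (len < 2) {
            return term;
        }
        char quote = term.charAt(len - 2);
        char last = term.charAt(len - 1);
        // Old indexes keep the old behavior; newer versions also accept U+2019 and U+FF07.
        boolean newBehavior = matchVersion.compareTo(CompatVersion.V_36) >= 0;
        boolean isQuote = quote == '\''
                || (newBehavior && (quote == '\u2019' || quote == '\uFF07'));
        if (isQuote && (last == 's' || last == 'S')) {
            return term.substring(0, len - 2);
        }
        return term;
    }
}
```

The point of the version switch is that terms analyzed at query time keep matching terms already written to an older index.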

                


[jira] [Updated] (LUCENE-3748) EnglishPossessiveFilter should work with Unicode right single quotation mark

Posted by "Robert Muir (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3748:
--------------------------------

    Attachment: LUCENE-3748.patch

Updated patch: thanks again, David.

I added some javadocs, a CHANGES.txt entry, an assertion to the Solr factory, and (somewhat reluctantly) U+FF07.

                


[jira] [Resolved] (LUCENE-3748) EnglishPossessiveFilter should work with Unicode right single quotation mark

Posted by "Robert Muir (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-3748.
---------------------------------

       Resolution: Fixed
    Fix Version/s: 4.0
                   3.6

Thanks David!
                


[jira] [Commented] (LUCENE-3748) EnglishPossessiveFilter should work with Unicode right single quotation mark

Posted by "David Croley (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199262#comment-13199262 ] 

David Croley commented on LUCENE-3748:
--------------------------------------

If you want to preserve backwards compatibility, I guess I could pass matchVersion in from the calling Analyzer, but that crufts it up a bit. Is that necessary?
                


[jira] [Updated] (LUCENE-3748) EnglishPossessiveFilter should work with Unicode right single quotation mark

Posted by "David Croley (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Croley updated LUCENE-3748:
---------------------------------

    Attachment: Patch-Lucene-3748

Newer patch that preserves backwards compatibility. Not sure if I've done that the best way, so feel free to change as needed.
                


[jira] [Updated] (LUCENE-3748) EnglishPossessiveFilter should work with Unicode right single quotation mark

Posted by "David Croley (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Croley updated LUCENE-3748:
---------------------------------

    Attachment: LucenePatch

Patch to address the bug and add a unit test for it.
                


[jira] [Commented] (LUCENE-3748) EnglishPossessiveFilter should work with Unicode right single quotation mark

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199398#comment-13199398 ] 

Robert Muir commented on LUCENE-3748:
-------------------------------------

Walter: U+2019 does not decompose at all (see http://unicode.org/cldr/utility/character.jsp?a=2019&B1=Show)

This is because it's not a compatibility character of any kind. In fact, it's the single quote (U+0027)
that's ambiguous; U+2019 is the correct one here.

From a pedantic point of view, we should be forcing you to disambiguate the very ambiguous single quote (U+0027)
on your keyboard and *ONLY* handling U+2019 in this filter, but I realize some people might find this opinion a
tad extreme :)
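
The decomposition point can be checked with the JDK's own java.text.Normalizer. The small class below is only an illustration (NfkcApostropheCheck is a made-up name); it applies NFKC to the two characters discussed in this thread.

```java
import java.text.Normalizer;

// Illustration only: shows what NFKC does to the apostrophe candidates from this issue.
final class NfkcApostropheCheck {

    static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String[] samples = {"\u2019", "\uFF07"};
        for (String s : samples) {
            String n = nfkc(s);
            // Print the code point before and after normalization.
            System.out.printf("U+%04X -> U+%04X%n", (int) s.charAt(0), (int) n.charAt(0));
        }
    }
}
```

U+2019 comes back unchanged, while U+FF07 is mapped to U+0027. So normalization alone would cover the fullwidth form but not the right single quotation mark this issue is about.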



                