Posted to dev@lucene.apache.org by "Mike Spencer (Created) (JIRA)" <ji...@apache.org> on 2012/03/01 00:09:58 UTC

[jira] [Created] (SOLR-3185) PatternReplaceCharFilterFactory can't replace with ampersands in index

PatternReplaceCharFilterFactory can't replace with ampersands in index
----------------------------------------------------------------------

                 Key: SOLR-3185
                 URL: https://issues.apache.org/jira/browse/SOLR-3185
             Project: Solr
          Issue Type: Bug
          Components: Schema and Analysis
    Affects Versions: 3.5
            Reporter: Mike Spencer
            Priority: Minor


Using solr.PatternReplaceCharFilterFactory to replace 'A & B' (no quotes) with 'A&B' (no spaces) will result in 'A&amp;B' being indexed. Query analysis will give the expected result of 'A&B'. I examined the index with both standalone Luke and the schema browser field and the index value is incorrect in both tools.

This is the affected charFilter:
<charFilter class="solr.PatternReplaceCharFilterFactory"
                    pattern="(^\w)\s[&amp;]\s(\w)"
                    replacement="$1&amp;$2" />

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-3185) PatternReplaceCharFilterFactory can't replace with ampersands in index

Posted by "Mike Spencer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Spencer updated SOLR-3185:
-------------------------------

    Description: 
Using solr.PatternReplaceCharFilterFactory to replace 'A & B' (no quotes) with 'A&B' (no spaces) will result in 'A&amp;amp;B' being indexed. Query analysis will give the expected result of 'A&B'. I examined the index with both standalone Luke and the schema browser field and the index value is incorrect in both tools.

This is the affected charFilter:
<charFilter class="solr.PatternReplaceCharFilterFactory"
                    pattern="(^\w)\s[&amp;]\s(\w)"
                    replacement="$1&amp;amp;$2" />

  was:
Using solr.PatternReplaceCharFilterFactory to replace 'A & B' (no quotes) with 'A&B' (no spaces) will result in 'A&amp;amp;B' being indexed. Query analysis will give the expected result of 'A&B'. I examined the index with both standalone Luke and the schema browser field and the index value is incorrect in both tools.

This is the affected charFilter:
<charFilter class="solr.PatternReplaceCharFilterFactory"
                    pattern="(^\w)\s[&amp;]\s(\w)"
                    replacement="$1&amp;$2" />

    
> PatternReplaceCharFilterFactory can't replace with ampersands in index
> ----------------------------------------------------------------------
>
>                 Key: SOLR-3185
>                 URL: https://issues.apache.org/jira/browse/SOLR-3185
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.5
>            Reporter: Mike Spencer
>            Priority: Minor
>              Labels: PatternReplaceCharFilter, regex
>
> Using solr.PatternReplaceCharFilterFactory to replace 'A & B' (no quotes) with 'A&B' (no spaces) will result in 'A&amp;amp;B' being indexed. Query analysis will give the expected result of 'A&B'. I examined the index with both standalone Luke and the schema browser field and the index value is incorrect in both tools.
> This is the affected charFilter:
> <charFilter class="solr.PatternReplaceCharFilterFactory"
>                     pattern="(^\w)\s[&amp;]\s(\w)"
>                     replacement="$1&amp;amp;$2" />



[jira] [Commented] (SOLR-3185) PatternReplaceCharFilterFactory can't replace with ampersands in index

Posted by "Mike Spencer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220605#comment-13220605 ] 

Mike Spencer commented on SOLR-3185:
------------------------------------

Sorry, the formatting was wrong before. Because ampersands must be escaped in the XML configuration, I have to use the &amp;amp; code instead of the character. It is read correctly, but it is written literally instead of being output as the ampersand character.


                
> PatternReplaceCharFilterFactory can't replace with ampersands in index
> ----------------------------------------------------------------------
>
>                 Key: SOLR-3185
>                 URL: https://issues.apache.org/jira/browse/SOLR-3185
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.5
>            Reporter: Mike Spencer
>            Priority: Minor
>              Labels: PatternReplaceCharFilter, regex
>
> Using solr.PatternReplaceCharFilterFactory to replace {noformat}A & B{noformat} with {noformat}A&B{noformat} will result in {noformat}A&amp;B{noformat} being indexed. Query analysis will give the expected result of {noformat}A&B{noformat}. I examined the index with both standalone Luke and the schema browser field and the index value is incorrect in both tools.
> This is the affected charFilter:
> {noformat}
> <charFilter class="solr.PatternReplaceCharFilterFactory"
>                     pattern="(^\w)\s[&amp;]\s(\w)"
>                     replacement="$1&amp;$2" />
> {noformat}



[jira] [Closed] (SOLR-3185) PatternReplaceCharFilterFactory can't replace with ampersands in index

Posted by "Mike Spencer (Closed) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Spencer closed SOLR-3185.
------------------------------

       Resolution: Not A Problem
    Fix Version/s: 3.5

I explored the analysis chain in more depth and discovered that the issue is not related to PatternReplaceCharFilterFactory at all.
                
> PatternReplaceCharFilterFactory can't replace with ampersands in index
> ----------------------------------------------------------------------
>
>                 Key: SOLR-3185
>                 URL: https://issues.apache.org/jira/browse/SOLR-3185
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.5
>            Reporter: Mike Spencer
>            Priority: Minor
>              Labels: PatternReplaceCharFilter, regex
>             Fix For: 3.5
>
>
> Using solr.PatternReplaceCharFilterFactory to replace {noformat}A & B{noformat} with {noformat}A&B{noformat} will result in {noformat}A&amp;B{noformat} being indexed. Query analysis will give the expected result of {noformat}A&B{noformat}. I examined the index with both standalone Luke and the schema browser field and the index value is incorrect in both tools.
> This is the affected charFilter:
> {noformat}
> <charFilter class="solr.PatternReplaceCharFilterFactory"
>                     pattern="(^\w)\s[&amp;]\s(\w)"
>                     replacement="$1&amp;$2" />
> {noformat}



[jira] [Commented] (SOLR-3185) PatternReplaceCharFilterFactory can't replace with ampersands in index

Posted by "Dawid Weiss (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221193#comment-13221193 ] 

Dawid Weiss commented on SOLR-3185:
-----------------------------------

Excellent, thanks for confirming.
                



[jira] [Updated] (SOLR-3185) PatternReplaceCharFilterFactory can't replace with ampersands in index

Posted by "Mike Spencer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Spencer updated SOLR-3185:
-------------------------------

    Description: 
Using solr.PatternReplaceCharFilterFactory to replace 'A & B' (no quotes) with 'A&B' (no spaces) will result in 'A&amp;amp;B' being indexed. Query analysis will give the expected result of 'A&B'. I examined the index with both standalone Luke and the schema browser field and the index value is incorrect in both tools.

This is the affected charFilter:
<charFilter class="solr.PatternReplaceCharFilterFactory"
                    pattern="(^\w)\s[&amp;]\s(\w)"
                    replacement="$1&amp;$2" />

  was:
Using solr.PatternReplaceCharFilterFactory to replace 'A & B' (no quotes) with 'A&B' (no spaces) will result in 'A&amp;B' being indexed. Query analysis will give the expected result of 'A&B'. I examined the index with both standalone Luke and the schema browser field and the index value is incorrect in both tools.

This is the affected charFilter:
<charFilter class="solr.PatternReplaceCharFilterFactory"
                    pattern="(^\w)\s[&amp;]\s(\w)"
                    replacement="$1&amp;$2" />

    
> PatternReplaceCharFilterFactory can't replace with ampersands in index
> ----------------------------------------------------------------------
>
>                 Key: SOLR-3185
>                 URL: https://issues.apache.org/jira/browse/SOLR-3185
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.5
>            Reporter: Mike Spencer
>            Priority: Minor
>              Labels: PatternReplaceCharFilter, regex
>
> Using solr.PatternReplaceCharFilterFactory to replace 'A & B' (no quotes) with 'A&B' (no spaces) will result in 'A&amp;amp;B' being indexed. Query analysis will give the expected result of 'A&B'. I examined the index with both standalone Luke and the schema browser field and the index value is incorrect in both tools.
> This is the affected charFilter:
> <charFilter class="solr.PatternReplaceCharFilterFactory"
>                     pattern="(^\w)\s[&amp;]\s(\w)"
>                     replacement="$1&amp;$2" />



[jira] [Updated] (SOLR-3185) PatternReplaceCharFilterFactory can't replace with ampersands in index

Posted by "Mike Spencer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Spencer updated SOLR-3185:
-------------------------------

    Description: 
Using solr.PatternReplaceCharFilterFactory to replace {noformat}A & B{noformat} with {noformat}A&B{noformat} will result in {noformat}A&amp;B{noformat} being indexed. Query analysis will give the expected result of {noformat}A&B{noformat}. I examined the index with both standalone Luke and the schema browser field and the index value is incorrect in both tools.

This is the affected charFilter:
{noformat}
<charFilter class="solr.PatternReplaceCharFilterFactory"
                    pattern="(^\w)\s[&amp;]\s(\w)"
                    replacement="$1&amp;$2" />
{noformat}

  was:
Using solr.PatternReplaceCharFilterFactory to replace 'A & B' (no quotes) with 'A&B' (no spaces) will result in 'A&amp;amp;B' being indexed. Query analysis will give the expected result of 'A&B'. I examined the index with both standalone Luke and the schema browser field and the index value is incorrect in both tools.

This is the affected charFilter:
<charFilter class="solr.PatternReplaceCharFilterFactory"
                    pattern="(^\w)\s[&amp;]\s(\w)"
                    replacement="$1&amp;amp;$2" />

    



[jira] [Commented] (SOLR-3185) PatternReplaceCharFilterFactory can't replace with ampersands in index

Posted by "Dawid Weiss (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220421#comment-13220421 ] 

Dawid Weiss commented on SOLR-3185:
-----------------------------------

Are there any other filters in the chain? PatternReplaceCharFilterFactory itself doesn't replace any HTML entities, so this result would be surprising. Also, can you quote the XML verbatim? If you have this:

{noformat}
<charFilter class="solr.PatternReplaceCharFilterFactory" 
                    pattern="(^\w)\s[&amp;]\s(\w)" 
                    replacement="$1&amp;amp;$2" />
{noformat}
then the replacement string, once the XML parser has decoded it, will indeed be:
{noformat}
$1&amp;$2
{noformat}
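
The decoding step described above can be reproduced outside Solr. The following is a minimal sketch, not Solr's implementation: it assumes the JDK's built-in XML parser decodes the attributes the same way the schema loader does, and it uses plain java.util.regex in place of the char filter. The class name AmpersandEscapingDemo and its helper methods are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; it only illustrates XML attribute decoding
// followed by a regex replacement, and does not use any Solr code.
class AmpersandEscapingDemo {

    /** Parses a single XML element and returns the decoded value of one attribute. */
    static String attribute(String xml, String name) {
        try {
            Element element = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return element.getAttribute(name);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse: " + xml, e);
        }
    }

    /** Applies the decoded pattern and replacement to the input with java.util.regex. */
    static String apply(String xml, String input) {
        String pattern = attribute(xml, "pattern");
        String replacement = attribute(xml, "replacement");
        return Pattern.compile(pattern).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // Ampersand escaped once in the XML source: decodes to a bare '&'.
        String single = "<charFilter pattern=\"(^\\w)\\s[&amp;]\\s(\\w)\""
                + " replacement=\"$1&amp;$2\"/>";
        // Ampersand escaped twice: decodes to the five characters '&amp;'.
        String doubled = "<charFilter pattern=\"(^\\w)\\s[&amp;]\\s(\\w)\""
                + " replacement=\"$1&amp;amp;$2\"/>";

        System.out.println(attribute(single, "replacement"));  // $1&$2
        System.out.println(attribute(doubled, "replacement")); // $1&amp;$2
        System.out.println(apply(single, "A & B"));            // A&B
        System.out.println(apply(doubled, "A & B"));           // A&amp;B
    }
}
```

Under these assumptions, a single level of escaping in the configuration yields a bare ampersand in the output, while a doubly escaped ampersand survives as a literal entity, which matches the value quoted above.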
                



[jira] [Commented] (SOLR-3185) PatternReplaceCharFilterFactory can't replace with ampersands in index

Posted by "Dawid Weiss (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220760#comment-13220760 ] 

Dawid Weiss commented on SOLR-3185:
-----------------------------------

I just checked and the indexed value is correct. Can you attach your solr configuration file (or an example that doesn't work for you)?
                
