Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2010/08/16 20:00:27 UTC

[jira] Created: (SOLR-2051) analysis.jsp is incorrect for protWords etc

analysis.jsp is incorrect for protWords etc
-------------------------------------------

                 Key: SOLR-2051
                 URL: https://issues.apache.org/jira/browse/SOLR-2051
             Project: Solr
          Issue Type: Bug
          Components: web gui
    Affects Versions: 3.1, 4.0
            Reporter: Robert Muir


analysis.jsp gives incorrect results if you use "protwords.txt", "stemdict.txt", or the like.

This is because these are now implemented with KeywordAttribute (so you can easily override any stemmer etc).

For example, if your schema had "foobars" in protwords.txt, analysis.jsp would show it being stemmed to "foobar", even though this doesn't actually happen.

The problem is that this jsp is downconverting the entire tokenstream to Token in between processing, so it silently discards KeywordAttribute and you get the wrong result.

Note: this issue isn't about *displaying* other attributes such as KeywordAttribute (which would be a new feature). It's about not throwing them away so that the analysis actually represents what happens.
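
To make the failure mode concrete, here is a minimal self-contained sketch. It is a simplified model, not Lucene or analysis.jsp code: the class KeywordLossSketch, its methods, and the map-based token state are hypothetical stand-ins for a keyword-marking filter, a stemmer, and the lossy conversion to Token.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Simplified model only: none of these are Lucene classes. */
final class KeywordLossSketch {

    /** Stage 1: flags protected words, like a protwords.txt-driven marker. */
    static Map<String, Object> markKeywords(Map<String, Object> state, Set<String> protectedWords) {
        Map<String, Object> out = new HashMap<>(state);
        out.put("keyword", protectedWords.contains(state.get("term")));
        return out;
    }

    /** Stage 2: toy stemmer that strips a trailing "s" unless the keyword flag is set. */
    static Map<String, Object> stem(Map<String, Object> state) {
        Map<String, Object> out = new HashMap<>(state);
        String term = (String) state.get("term");
        boolean keyword = Boolean.TRUE.equals(state.get("keyword"));
        if (!keyword && term.endsWith("s")) {
            out.put("term", term.substring(0, term.length() - 1));
        }
        return out;
    }

    /** Lossy down-conversion: only the term survives, the keyword flag is dropped. */
    static Map<String, Object> downconvert(Map<String, Object> state) {
        Map<String, Object> out = new HashMap<>();
        out.put("term", state.get("term"));
        return out;
    }

    /** Runs both stages, optionally down-converting in between as the jsp does. */
    static String analyze(String term, Set<String> protectedWords, boolean downconvertBetweenStages) {
        Map<String, Object> state = new HashMap<>();
        state.put("term", term);
        state = markKeywords(state, protectedWords);
        if (downconvertBetweenStages) {
            state = downconvert(state);
        }
        state = stem(state);
        return (String) state.get("term");
    }
}
```

In this model, analyzing "foobars" with "foobars" protected leaves the term untouched when the state is passed through intact, but yields "foobar" when the state is down-converted between the two stages.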


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] Commented: (SOLR-2051) analysis.jsp is incorrect for protWords etc

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899363#action_12899363 ] 

Uwe Schindler commented on SOLR-2051:
-------------------------------------

Ah, I understand the problem. So ignore my last message: the printTokens method adds attributes that may not exist to the cloned AttributeSources. After that, copyTo() to the original stream does not work.

Just an idea: would it make sense to let copyTo() automatically add missing target attributes? copyTo() is new in 3.x and trunk, so we can still change how it works.

[jira] Commented: (SOLR-2051) analysis.jsp is incorrect for protWords etc

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899115#action_12899115 ] 

Robert Muir commented on SOLR-2051:
-----------------------------------

{quote}
Insert a tap after each filter? Yeah, might be safer by more closely emulating how the analysis actually works.
For example, if someone develops some whacky filters that rely on thread locals to pass info or something.

Since it looks like you've fixed it already, I'd just commit that though.
{quote}

Well, Uwe suggested CachingTokenFilter as one idea, we could keep the same overall flow. 

A 'printing tap' after each filter seems even better though... lemme try it, and worst case we have this as the fix for now.


[jira] Commented: (SOLR-2051) analysis.jsp is incorrect for protWords etc

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899109#action_12899109 ] 

Yonik Seeley commented on SOLR-2051:
------------------------------------

Ah, yeah, good catch!

bq. i wonder if we can use tee/sinks to do this cleaner?

Insert a tap after each filter?  Yeah, might be safer by more closely emulating how the analysis actually works.
For example, if someone develops some whacky filters that rely on thread locals to pass info or something.

Since it looks like you've fixed it already, I'd just commit that though.

[jira] Updated: (SOLR-2051) analysis.jsp is incorrect for protWords etc

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-2051:
------------------------------

    Attachment: SOLR-2051.patch

here is a slightly improved patch, but still this jsp file is scary.

i wonder if we can use tee/sinks to do this cleaner?

[jira] Updated: (SOLR-2051) analysis.jsp is incorrect for protWords etc

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated SOLR-2051:
--------------------------------

    Attachment: dynamic-AttributeSource.patch

Here is the more dynamic AttributeSource (AS), which adds missing attribute impls on restoreState() and copyTo(). This is just an idea; the AS test does not pass, as it checks for the exception previously thrown.

I changed analysis.jsp to use this. Sorry for the formatting changes, but my editor fixed the tabs.

I am not sure if this is good, as it may add attributes to a tokenstream after the ctor, which is discouraged and can lead to unexpected behaviour on the consumer, especially if the attribute factories (AF) don't match between source and target (in both cases, copyTo and restoreState). Ideally, on copyTo(), the AS should check that the AF is identical.
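
The difference between the two behaviours can be sketched in a small self-contained model. This is not the attached patch and not the Lucene AttributeSource API: SketchAttributeSource, copyToStrict and copyToLenient are hypothetical names, attributes are modelled as plain map entries, and the use of IllegalArgumentException for the strict case is an assumption of the model.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified model of an attribute container; NOT the Lucene AttributeSource. */
final class SketchAttributeSource {
    private final Map<String, Object> attributes = new LinkedHashMap<>();

    /** Registers an attribute with an initial value if it is not present yet. */
    void addAttribute(String name, Object initial) {
        attributes.putIfAbsent(name, initial);
    }

    boolean hasAttribute(String name) {
        return attributes.containsKey(name);
    }

    Object get(String name) {
        return attributes.get(name);
    }

    /** Strict copy: fails if the target lacks any attribute of this source. */
    void copyToStrict(SketchAttributeSource target) {
        // Check everything first so a failed copy leaves the target unchanged.
        for (String name : attributes.keySet()) {
            if (!target.hasAttribute(name)) {
                throw new IllegalArgumentException("target lacks attribute: " + name);
            }
        }
        target.attributes.putAll(attributes);
    }

    /** Lenient copy: silently adds attributes the target does not have yet. */
    void copyToLenient(SketchAttributeSource target) {
        target.attributes.putAll(attributes);
    }
}
```

With a source holding "term" and "keyword" and a target holding only "term", the strict variant throws, while the lenient variant leaves the target with both attributes. The lenient variant is also the one that can surprise a consumer that looked up its attributes before the copy happened.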

[jira] Commented: (SOLR-2051) analysis.jsp is incorrect for protWords etc

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899028#action_12899028 ] 

Robert Muir commented on SOLR-2051:
-----------------------------------

you can reproduce this easily / verify it is now correct by going to analysis.jsp, turning on verbose output, and entering "dontstems" for field type "text", as it's already in protwords.txt

without the patch you will see it being stemmed.

[jira] Updated: (SOLR-2051) analysis.jsp is incorrect for protWords etc

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-2051:
------------------------------

    Attachment: SOLR-2051.patch

attached is a patch that uses AttributeSource.copyTo to preserve any custom attributes that might be in the stream.

additionally i added some logic for non-stringable terms (it will just print the bytes in hex)
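
For illustration, a hex fallback of this kind could look like the sketch below. The class name TermBytesHex and the bracketed, space-separated output format are assumptions; the format the patch actually prints is not shown in this thread.

```java
/** Hypothetical helper: renders raw term bytes as hex for display. */
final class TermBytesHex {

    /** Returns the bytes as two-digit lowercase hex values, e.g. "[de ad]". */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes do not sign-extend.
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.append(']').toString();
    }
}
```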


[jira] Commented: (SOLR-2051) analysis.jsp is incorrect for protWords etc

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899133#action_12899133 ] 

Uwe Schindler commented on SOLR-2051:
-------------------------------------

After a discussion with Robert, I also think that a Tap would be an elegant and less intrusive approach (from the TokenStream's point of view). The whole thing would simply create the Tokenizer, wrap the tap filter around it, then add the next filter in the chain, add the tap again, and so on.

The filter simply calls input.incrementToken() and then prints the current attributes. It can also hold a local "pos" field that is updated with positionIncrement to get the formatting right. The code to re-sort tokens when negative position increments occur is useless, as Lucene no longer allows negative position increments (from what I know). The whole JSP would use no caching lists of tokens, no iterators, no array copy, no copyTo(). It just builds a tokenstream and consumes it. The Tap filter can also be added around a generic Lucene Analyzer (one that is not a TokenizerChain). The main code would simply do "while (ts.incrementToken())" - nothing more. All printout is done in the filters added between each chain step (or after the generic Lucene analyzer).
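
A rough sketch of the tap idea, in a simplified self-contained model rather than the real Lucene classes: SketchStream, SketchTokenizer, SketchLowerCaseFilter, SketchTap and TapDemo are all hypothetical names, and the tap records lines in a list instead of printing to the JSP output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Simplified stand-in for a token stream; NOT the Lucene TokenStream API. */
interface SketchStream {
    boolean incrementToken();
    String term();
    int positionIncrement();
}

/** Emits whitespace-separated words, each with position increment 1. */
final class SketchTokenizer implements SketchStream {
    private final String[] words;
    private int index = -1;

    SketchTokenizer(String text) {
        String trimmed = text.trim();
        words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public boolean incrementToken() { return ++index < words.length; }
    public String term() { return words[index]; }
    public int positionIncrement() { return 1; }
}

/** Toy filter: lowercases each term. */
final class SketchLowerCaseFilter implements SketchStream {
    private final SketchStream input;

    SketchLowerCaseFilter(SketchStream input) { this.input = input; }

    public boolean incrementToken() { return input.incrementToken(); }
    public String term() { return input.term().toLowerCase(Locale.ROOT); }
    public int positionIncrement() { return input.positionIncrement(); }
}

/** Tap: passes tokens through unchanged and records what it saw. */
final class SketchTap implements SketchStream {
    private final SketchStream input;
    private final String label;
    private final List<String> log;
    private int pos = 0; // absolute position, accumulated from increments

    SketchTap(SketchStream input, String label, List<String> log) {
        this.input = input;
        this.label = label;
        this.log = log;
    }

    public boolean incrementToken() {
        if (!input.incrementToken()) {
            return false;
        }
        pos += input.positionIncrement();
        log.add(label + ": " + input.term() + "@" + pos);
        return true;
    }

    public String term() { return input.term(); }
    public int positionIncrement() { return input.positionIncrement(); }
}

final class TapDemo {
    /** Builds tokenizer -> tap -> filter -> tap and simply consumes the chain. */
    static List<String> run(String text) {
        List<String> log = new ArrayList<>();
        SketchStream ts = new SketchTap(new SketchTokenizer(text), "tokenizer", log);
        ts = new SketchTap(new SketchLowerCaseFilter(ts), "lowercase", log);
        while (ts.incrementToken()) {
            // nothing to do here: the taps record every stage
        }
        return log;
    }
}
```

One consequence visible in this model is that the records of the stages are interleaved token by token (for "Foo Bars": tokenizer Foo, lowercase foo, tokenizer Bars, lowercase bars), so a page that displays one table per stage would still have to group the output by label.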

[jira] Commented: (SOLR-2051) analysis.jsp is incorrect for protWords etc

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899415#action_12899415 ] 

Robert Muir commented on SOLR-2051:
-----------------------------------

bq. Just an idea: Would it make sense, to let copyTo() automatically add missing target attributes

Not sure, this wouldn't help the typical buffering case like SynonymFilter? I think this analysis.jsp
that crosses the tokenstreams is an extremely special case.

[jira] Commented: (SOLR-2051) analysis.jsp is incorrect for protWords etc

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899358#action_12899358 ] 

Uwe Schindler commented on SOLR-2051:
-------------------------------------

For performance reasons, I would move

{code}
final AttributeSource token = iter.next();
Iterator<Class<? extends Attribute>> atts = token.getAttributeClassesIterator();
while (atts.hasNext()) this.addAttribute(atts.next());
{code}

to the ctor of the helper tokenstream. This is the same way TeeSink and all other tokenstreams work. Adding attributes later in the tokenstream's incrementToken() is not allowed, so you can be sure that after the original tokenstream's ctor was executed all attributes are available. Doing this on each incrementToken is the same as if the indexer did this on each incrementToken call.

[jira] Updated: (SOLR-2051) analysis.jsp is incorrect for protWords etc

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-2051:
------------------------------

    Attachment: SOLR-2051.patch

ok, i had a look at reworking this as we discussed, but it's more complicated
due to "highlight matches" etc.

So for now, here's the bugfix (same as before, except it creates the fake tokenstream with the same factory as what the filter uses)

i'll commit this and we should open another issue for reworking this, and probably printing all attributes when we do that.

[jira] Resolved: (SOLR-2051) analysis.jsp is incorrect for protWords etc

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved SOLR-2051.
-------------------------------

    Resolution: Fixed

Committed revision 986158, 986160
