Posted to dev@lucene.apache.org by "phatak.prachi (JIRA)" <ji...@apache.org> on 2012/05/15 22:13:07 UTC

[jira] [Created] (SOLR-3455) WordDelimiterFilterFactory split word on hyphen though generateWordParts="0"

phatak.prachi created SOLR-3455:
-----------------------------------

             Summary: WordDelimiterFilterFactory split word on hyphen though generateWordParts="0"
                 Key: SOLR-3455
                 URL: https://issues.apache.org/jira/browse/SOLR-3455
             Project: Solr
          Issue Type: Bug
            Reporter: phatak.prachi


•	RET-34333
•	WAT-34333
•	RET 35555
•	34333

When I search for RET => RET-34333, RET 35555
When I search for RET- => RET-34333
When I search for 34333 => RET-34333, WAT-34333, 34333
When I search for RET-3 => RET-34333
When I search for RET-34333 => RET-34333
When I search for T-3 => no results
When I search for T 3 => no results


Configuration:
<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front"/>
        <filter class="solr.StopFilterFactory"   ignoreCase="true" words="stopwords.txt"  enablePositionIncrements="true"  />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
       </analyzer>
</fieldType>



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-3455) WordDelimiterFilterFactory split word on hyphen though generateWordParts="0"

Posted by "phatak.prachi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

phatak.prachi updated SOLR-3455:
--------------------------------

    Priority: Blocker  (was: Major)
    
> WordDelimiterFilterFactory split word on hyphen though generateWordParts="0"
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-3455
>                 URL: https://issues.apache.org/jira/browse/SOLR-3455
>             Project: Solr
>          Issue Type: Bug
>            Reporter: phatak.prachi
>            Priority: Blocker


[jira] [Resolved] (SOLR-3455) WordDelimiterFilterFactory split word on hyphen though generateWordParts="0"

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley resolved SOLR-3455.
---------------------------------

    Resolution: Invalid

If you change the config file, you will need to reindex before the search results change.

Can you continue this discussion on the user list, where you will likely get better results?

                
> WordDelimiterFilterFactory split word on hyphen though generateWordParts="0"
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-3455
>                 URL: https://issues.apache.org/jira/browse/SOLR-3455
>             Project: Solr
>          Issue Type: Wish
>            Reporter: phatak.prachi
>            Priority: Blocker


[jira] [Commented] (SOLR-3455) WordDelimiterFilterFactory split word on hyphen though generateWordParts="0"

Posted by "phatak.prachi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277923#comment-13277923 ] 

phatak.prachi commented on SOLR-3455:
-------------------------------------

Jack,
Sorry for the confusion.
This is my new configuration:
<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
</fieldType>


Previously it was not matching 34333 because I was not using WDFF (WordDelimiterFilterFactory). Now I am using it, and on the analysis page (http://localhost:8983/solr/admin/analysis.jsp) it tokenizes 34333 as a word, but in my actual application the search returns no results.


                
> WordDelimiterFilterFactory split word on hyphen though generateWordParts="0"
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-3455
>                 URL: https://issues.apache.org/jira/browse/SOLR-3455
>             Project: Solr
>          Issue Type: Bug
>            Reporter: phatak.prachi
>            Priority: Blocker


[jira] [Commented] (SOLR-3455) WordDelimiterFilterFactory split word on hyphen though generateWordParts="0"

Posted by "Jack Krupansky (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277930#comment-13277930 ] 

Jack Krupansky commented on SOLR-3455:
--------------------------------------

It still doesn't sound as if there is an actual bug here (as opposed to questions related to how to set up analyzers and what effects they have), so I think this should be resolved as "invalid". So, yes, any further discussion of this should be over on solr-user.
                
> WordDelimiterFilterFactory split word on hyphen though generateWordParts="0"
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-3455
>                 URL: https://issues.apache.org/jira/browse/SOLR-3455
>             Project: Solr
>          Issue Type: Wish
>            Reporter: phatak.prachi
>            Priority: Blocker


[jira] [Commented] (SOLR-3455) WordDelimiterFilterFactory split word on hyphen though generateWordParts="0"

Posted by "Jack Krupansky (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277878#comment-13277878 ] 

Jack Krupansky commented on SOLR-3455:
--------------------------------------

Given this configuration, 34333 should only match documents containing terms that are exactly 34333 or start with 34333; it will not match terms that have 34333 embedded within them, including after a hyphen. So 34333 will not match RET-34333 or WAT-34333. Yet your original description indicates that it matches RET-34333 as well as 34333, while later you say it is not matching 34333. Something in your description and comments is inconsistent, and until you resolve these inconsistencies, the problem (if any) will not be clear.
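A minimal sketch of one possible setup, assuming the goal is for a query of 34333 to match RET-34333 and WAT-34333 as well as 34333. It is not a configuration proposed in this thread: the field type name textgen_split is hypothetical, and the stop word and synonym filters from the original are left out for brevity. The idea is to split on the hyphen at index time, before the front edge n-grams are built, so that 34333 becomes an indexed prefix in its own right, and to leave the n-gram filter out of the query analyzer so the query term is compared against those stored prefixes.

```xml
<!-- Hypothetical field type: split on hyphens at index time, then build front edge n-grams -->
<fieldType name="textgen_split" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- RET-34333 is expected to yield RET-34333 (original), RET and 34333 -->
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Front edge n-grams of each token, e.g. 3, 34, 343, 3433, 34333 -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- No edge n-grams here: the whole query term is matched against the indexed prefixes -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the index analyzer changes, existing documents would have to be reindexed before the results change. Keeping preserveOriginal="1" at index time should also keep prefixes such as ret- and ret-3 that the original setup relied on; how multi-token queries like RET-3 behave will additionally depend on the query parser settings.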

                
> WordDelimiterFilterFactory split word on hyphen though generateWordParts="0"
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-3455
>                 URL: https://issues.apache.org/jira/browse/SOLR-3455
>             Project: Solr
>          Issue Type: Bug
>            Reporter: phatak.prachi
>            Priority: Blocker


[jira] [Commented] (SOLR-3455) WordDelimiterFilterFactory split word on hyphen though generateWordParts="0"

Posted by "Tomás Fernández Löbbe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277085#comment-13277085 ] 

Tomás Fernández Löbbe commented on SOLR-3455:
---------------------------------------------

This doesn't look like a bug from the description, and I don't understand the summary: you are not using WordDelimiterFilterFactory in that field type. Your test searches seem to be giving the correct results.
                
> WordDelimiterFilterFactory split word on hyphen though generateWordParts="0"
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-3455
>                 URL: https://issues.apache.org/jira/browse/SOLR-3455
>             Project: Solr
>          Issue Type: Bug
>            Reporter: phatak.prachi
>            Priority: Blocker


[jira] [Updated] (SOLR-3455) WordDelimiterFilterFactory split word on hyphen though generateWordParts="0"

Posted by "phatak.prachi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

phatak.prachi updated SOLR-3455:
--------------------------------

    Issue Type: Wish  (was: Bug)
    
> WordDelimiterFilterFactory split word on hyphen though generateWordParts="0"
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-3455
>                 URL: https://issues.apache.org/jira/browse/SOLR-3455
>             Project: Solr
>          Issue Type: Wish
>            Reporter: phatak.prachi
>            Priority: Blocker


[jira] [Commented] (SOLR-3455) WordDelimiterFilterFactory split word on hyphen though generateWordParts="0"

Posted by "phatak.prachi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277092#comment-13277092 ] 

phatak.prachi commented on SOLR-3455:
-------------------------------------

The following is not working:
When I search for 34333 => RET-34333, WAT-34333, 34333
                
> WordDelimiterFilterFactory split word on hyphen though generateWordParts="0"
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-3455
>                 URL: https://issues.apache.org/jira/browse/SOLR-3455
>             Project: Solr
>          Issue Type: Bug
>            Reporter: phatak.prachi
>            Priority: Blocker


[jira] [Commented] (SOLR-3455) WordDelimiterFilterFactory split word on hyphen though generateWordParts="0"

Posted by "Tomás Fernández Löbbe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277847#comment-13277847 ] 

Tomás Fernández Löbbe commented on SOLR-3455:
---------------------------------------------

You are right: I don't see why that search would match RET-34333 and WAT-34333 with your field type. The field type that you provided doesn't use the WordDelimiterFilterFactory, though. Have you pasted the correct one? Also, have you looked at the other configuration attributes, like "generateNumberParts" and "splitOnNumerics"? This may be a configuration problem rather than a bug; you would probably get more help on the users list.
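To illustrate the attributes mentioned above, here is a hedged, annotated WordDelimiterFilterFactory fragment. The comments describe what each flag is expected to do to the token RET-34333, and the particular combination is an example rather than a recommendation. It may explain the behaviour in the issue summary: generateWordParts only controls the alphabetic parts, while numeric parts such as 34333 are governed separately by generateNumberParts.

```xml
<!-- Illustrative fragment; expected effect of each flag on the token RET-34333 -->
<!-- generateWordParts="0": the alphabetic part RET is not emitted on its own -->
<!-- generateNumberParts="1": the numeric part 34333 is still emitted, so the hyphen still splits the token -->
<!-- splitOnNumerics="1": letter/digit transitions also count as boundaries, so RET34333 is split the same way -->
<!-- preserveOriginal="1": RET-34333 itself is kept as well -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="0"
        generateNumberParts="1"
        splitOnNumerics="1"
        preserveOriginal="1"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="0"
        splitOnCaseChange="0"/>
```

If the aim is to keep RET-34333 as a single token, setting both generateWordParts and generateNumberParts to "0" (with the catenate options off and preserveOriginal="1") should leave only the original token.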
                
> WordDelimiterFilterFactory split word on hyphen though generateWordParts="0"
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-3455
>                 URL: https://issues.apache.org/jira/browse/SOLR-3455
>             Project: Solr
>          Issue Type: Bug
>            Reporter: phatak.prachi
>            Priority: Blocker


[jira] [Commented] (SOLR-3455) WordDelimiterFilterFactory split word on hyphen though generateWordParts="0"

Posted by "phatak.prachi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277856#comment-13277856 ] 

phatak.prachi commented on SOLR-3455:
-------------------------------------

I really appreciate your response.

This is my new configuration:
<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
</fieldType>

On the analysis page it shows a match for 34333, but in the actual application the search does not return it.
                
> WordDelimiterFilterFactory split word on hyphen though generateWordParts="0"
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-3455
>                 URL: https://issues.apache.org/jira/browse/SOLR-3455
>             Project: Solr
>          Issue Type: Bug
>            Reporter: phatak.prachi
>            Priority: Blocker