Posted to solr-user@lucene.apache.org by Shridhar Venkatraman <Sh...@NeemTree.com> on 2007/03/27 14:08:20 UTC
Reposting unABLE to match
Solr <http://localhost:8084/Genie/>
Solr Admin (GENIE)
ShridharVAIO:8084
cwd=C:\Program Files\netbeans-5.5\enterprise3\apache-tomcat-5.5.17\bin
SolrHome=c:\Documents and Settings\Shridhar\Desktop\Public\Sana\KN\Genie\GenieConf/
Field Analysis
*Field name*
*Field value (Index)*
verbose output
highlight matches "unABLE TO CONNECT"
*Field value (Query)*
verbose output "unABLE TO CONNECT"
Index Analyzer
org.apache.solr.analysis.HTMLStripWhitespaceTokenizerFactory {}
term position 1 2 3
term text "unABLE TO CONNECT"
term type word word word
source start,end 0,7 8,10 11,19
org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt, expand=true, ignoreCase=true}
term position 1 2 3
term text "unABLE TO CONNECT"
term type word word word
source start,end 0,7 8,10 11,19
org.apache.solr.analysis.StandardFilterFactory {}
term position 1 2 3
term text "unABLE TO CONNECT"
term type word word word
source start,end 0,7 8,10 11,19
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}
term position 1 2
term text "unABLE CONNECT"
term type word word
source start,end 0,7 11,19
org.apache.solr.analysis.WordDelimiterFilterFactory {generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=1, catenateNumbers=1}
term position 1 2 3
term text un ABLE CONNECT
unABLE
term type word word word
word
source start,end 1,3 3,7 11,18
1,7
org.apache.solr.analysis.LowerCaseFilterFactory {}
term position 1 2 3
term text un able connect
unable
term type word word word
word
source start,end 1,3 3,7 11,18
1,7
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
term position 1 2 3
term text un able connect
unable
term type word word word
word
source start,end 1,3 3,7 11,18
1,7
Query Analyzer
org.apache.solr.analysis.HTMLStripStandardTokenizerFactory {}
term position 1 2 3
term text unABLE TO CONNECT
term type <ALPHANUM> <ALPHANUM> <ALPHANUM>
source start,end 1,7 8,10 11,18
org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt, expand=true, ignoreCase=true}
term position 1 2 3
term text unABLE TO CONNECT
term type <ALPHANUM> <ALPHANUM> <ALPHANUM>
source start,end 1,7 8,10 11,18
org.apache.solr.analysis.StandardFilterFactory {}
term position 1 2 3
term text unABLE TO CONNECT
term type <ALPHANUM> <ALPHANUM> <ALPHANUM>
source start,end 1,7 8,10 11,18
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}
term position 1 2
term text unABLE CONNECT
term type <ALPHANUM> <ALPHANUM>
source start,end 1,7 11,18
org.apache.solr.analysis.WordDelimiterFilterFactory {generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=1, catenateNumbers=1}
term position 1 2 3
term text un ABLE CONNECT
unABLE
term type <ALPHANUM> <ALPHANUM> <ALPHANUM>
<ALPHANUM>
source start,end 1,3 3,7 11,18
1,7
org.apache.solr.analysis.LowerCaseFilterFactory {}
term position 1 2 3
term text un able connect
unable
term type <ALPHANUM> <ALPHANUM> <ALPHANUM>
<ALPHANUM>
source start,end 1,3 3,7 11,18
1,7
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
term position 1 2 3
term text un able connect
unable
term type <ALPHANUM> <ALPHANUM> <ALPHANUM>
<ALPHANUM>
source start,end 1,3 3,7 11,18
1,7
Re: Reposting unABLE to match
Posted by Chris Hostetter <ho...@fucit.org>.
: Sorry for this multiple postings...
: My email text did not get posted along with the attachment, don't know why ?
: Here it is again.
In general: don't use attachments, paste text directly into the body of
your email. That may have had something to do with your problem.
-Hoss
Re: Reposting unABLE to match
Posted by Bertrand Delacretaz <bd...@apache.org>.
On 3/27/07, Shridhar Venkatraman <Sh...@neemtree.com> wrote:
...Reposting unABLE to match
No need to repost if your message made it to the list.
If it hasn't been answered yet, it either means that no one knows the
answer or that no one has had the time to answer yet. We're all
volunteers here.
-Bertrand
Re: Reposting unABLE to match
Posted by Ma...@ibsbe.be.
What exactly is the problem?
It seems like you end up with the same term text in both the query and index
analyzers, so you should have found a match.
Re: Reposting unABLE to match
Posted by Yonik Seeley <yo...@apache.org>.
On 3/27/07, Shridhar Venkatraman <Sh...@neemtree.com> wrote:
> The phrase "unABLE TO CONNECT" does not match in my system. However, any
> combination of case is ok as long as the first letter 'U' is in
> uppercase.
>
> Bad-> uNABLE, unABLE, unaBLE....
> Good-> Unable, UNable, UNAble...
>
> Any ideas ?
This is WordDelimiterFilter at work:
lowercase to uppercase transition => split
uppercase to lowercase transition => no split (so capitalized words, and
words like IBMs, won't cause a split).
Either configure WordDelimiterFilter differently (use catenation but
not generation), or remove it altogether.
Don't forget to re-index after you have made changes.
-Yonik
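To make the case-transition rule above concrete, here is a minimal Python sketch. It is not Solr's implementation: it models only the lowercase-to-uppercase split and ignores everything else the real WordDelimiterFilter does (punctuation, digits, positions). The function names split_on_case_change and analyzed_terms are hypothetical, and analyzed_terms merely imitates generateWordParts plus catenateWords followed by lowercasing.

```python
def split_on_case_change(token):
    """Split a token wherever a lowercase letter is followed by an uppercase one."""
    parts, start = [], 0
    for i in range(1, len(token)):
        if token[i - 1].islower() and token[i].isupper():
            parts.append(token[start:i])
            start = i
    parts.append(token[start:])
    return parts


def analyzed_terms(token):
    """Hypothetical helper: lowercased parts, plus the catenated word if a split happened."""
    parts = [p.lower() for p in split_on_case_change(token)]
    if len(parts) > 1:
        # Imitates catenateWords=1: the rejoined word is emitted as well.
        parts.append(token.lower())
    return parts


# The "bad" and "good" spellings from the original question, plus IBMs.
for word in ["unABLE", "uNABLE", "unaBLE", "Unable", "UNable", "UNAble", "IBMs"]:
    print(word, "->", analyzed_terms(word))
```

In this simplified model the spellings reported as bad (uNABLE, unABLE, unaBLE) all contain a lowercase-to-uppercase transition and so yield several terms, while the ones reported as good (Unable, UNable, UNAble) stay a single term.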
Re: Reposting unABLE to match
Posted by Ma...@ibsbe.be.
The only thing I can think of is the fact that in the index analysis the
term type is "word"
and in the query analysis the term type is "<ALPHANUM>".
You should be getting a match if that doesn't matter, since you get exactly
the same term texts.