Posted to solr-user@lucene.apache.org by "balaji.gandhi" <ji...@gmail.com> on 2012/11/09 01:13:28 UTC

Using AnalyzingQueryParser - Solr 4.0

Hi Team,

Just trying to find out how to configure AnalyzingQueryParser in Solr 4.0.
Please let me know.

Thanks,
Balaji



--
View this message in context: http://lucene.472066.n3.nabble.com/Using-AnalyzingQueryParser-Solr-4-0-tp4019193.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Using AnalyzingQueryParser - Solr 4.0

Posted by Jack Krupansky <ja...@basetechnology.com>.

Maybe you just want to use the whitespace tokenizer - the standard 
tokenizer treats the at-sign as if it were a space.

See:
http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/core/WhitespaceTokenizerFactory.html
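
As a rough sketch of that suggestion, a field type built on the whitespace 
tokenizer might look like the following. The name text_email_ws is made up 
for illustration. Note that any punctuation touching the address (angle 
brackets, a trailing comma) stays attached to the token.

```xml
<!-- Hypothetical field type; the name "text_email_ws" is an assumption -->
<fieldType name="text_email_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Splits on whitespace only, so bob@bob.com stays one token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```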

Or, you could use the "classic" tokenizer, which does keep email addresses 
and URLs:
http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/standard/ClassicTokenizerFactory.html

And, there is a variant of the new standard tokenizer that also preserves 
email addresses and URLs - but its name is too complex for me to recommend 
it with a straight face: UAX29URLEmailTokenizerFactory.

See:
http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizerFactory.html
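
For illustration, a minimal field type using that tokenizer could be 
sketched as below. The name text_email_uax29 is hypothetical. Since whole 
addresses are indexed as single lowercased tokens, a prefix query such as 
emailAddress:bob@* should then have a token to match.

```xml
<!-- Hypothetical field type; the name "text_email_uax29" is an assumption -->
<fieldType name="text_email_uax29" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Emits email addresses and URLs as single tokens -->
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```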


-- Jack Krupansky

-----Original Message----- 
From: balaji.gandhi
Sent: Friday, November 09, 2012 8:38 AM
To: solr-user@lucene.apache.org
Subject: Re: Using AnalyzingQueryParser - Solr 4.0


Re: Using AnalyzingQueryParser - Solr 4.0

Posted by "balaji.gandhi" <ji...@gmail.com>.

Hi Jack,

We have an email field defined like this:

        <fieldType name="text_email" class="solr.TextField" positionIncrementGap="100">
            <analyzer>
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.PatternReplaceFilterFactory"
                        pattern="\." replacement=" DOT " replace="all"/>
                <filter class="solr.PatternReplaceFilterFactory"
                        pattern="@" replacement=" AT " replace="all"/>
                <filter class="solr.WordDelimiterFilterFactory"
                        generateWordParts="1" generateNumberParts="1"
                        catenateWords="0" catenateNumbers="0"
                        catenateAll="0" splitOnCaseChange="0"/>
            </analyzer>
            <analyzer type="multiterm">
                <tokenizer class="solr.KeywordTokenizerFactory"/>
            </analyzer>
        </fieldType>

A query like [emailAddress : bob*] would match bob@bob.com, but queries
which include any special characters like [bob@], [bob@*] and [bob@bob.*]
will not match any email addresses.

Yes, I tried the multiterm analyzer and it does not fix the issue. Any thoughts?

Thanks,
Balaji




Re: Using AnalyzingQueryParser - Solr 4.0

Posted by Jack Krupansky <ja...@basetechnology.com>.

There isn't a QParserPlugin for that query parser in Solr. You would have 
to develop one yourself.
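
If you did write such a plugin, registering it would presumably look like 
any other custom query parser in solrconfig.xml. In this sketch both the 
parser name "analyzing" and the class com.example.solr.AnalyzingQParserPlugin 
are hypothetical; no such class ships with Solr 4.0, so it would have to be 
written (extending QParserPlugin and wrapping Lucene's AnalyzingQueryParser) 
and placed on Solr's classpath.

```xml
<!-- solrconfig.xml: hypothetical registration of a custom parser -->
<queryParser name="analyzing" class="com.example.solr.AnalyzingQParserPlugin"/>
```

It could then be selected per request with defType=analyzing or with the 
local-params prefix {!analyzing}.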

But why do you think you need that query parser? The standard query 
parsers and analyzers for Solr are now "multi-term aware", which permits, 
for example, some combinations of case filtering and wildcards.
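
To make "multi-term aware" concrete: when a field type has no explicit 
multiterm analyzer, Solr derives one from the multi-term-aware filters in 
the regular chain, and you can also spell it out yourself. A sketch, with a 
made-up field type name:

```xml
<!-- Hypothetical field type; the name "text_lower_ws" is an assumption -->
<fieldType name="text_lower_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Applied to wildcard, prefix and range terms instead of the chain above -->
  <analyzer type="multiterm">
    <!-- Keeps the partial term whole, then lowercases it -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this in place, a query term such as Bob* is lowercased to bob* before 
it is matched against the index.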

-- Jack Krupansky
