Posted to java-user@lucene.apache.org by Rupinder Singh Mazara <rs...@ebi.ac.uk> on 2004/10/19 17:23:15 UTC

Null or no analyzer

Hi All

  I have a question regarding the selection of Analyzers during query parsing.


  I have three fields in my index: db_id, full_text, subject.
  All three are indexed; however, while indexing I told Lucene to
index db_id and subject but not to tokenize them.

  I want to provide a single search box in my application for searching
documents.
  Some queries can look like  "motor cross rally"; this will get fed to
QueryParser to do the relevant parsing.

  However, if the user enters  John Kerry  subject:"Elections 2004"  I want to
make sure that no analyzer is used for the subject field. How can that be
done?

  This is because I expect the users to know the subject from a list of
controlled vocabularies, and I am searching for
documents that have the exact subject. I tried using the
"PerFieldAnalyzerWrapper", but how do I get hold of an Analyzer that
does nothing but pass the text through to the Searcher?




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Re: Null or no analyzer

Posted by sergiu gordea <gs...@ifit.uni-klu.ac.at>.

Erik Hatcher wrote:

> Query parsing/expansion is the holy grail.  There are so many ways to 
> do this sort of thing that I'm mostly of the opinion it is a 
> per-project customization to get it tuned for the needs of that project.
>
> Nutch has done some nice things with query parsing/expansion and 
> extensibility.
>
> I'm all for a more extensible base to work from, no question.
>
> I'm personally not fond of MultiFieldQueryParser - I much prefer 
> aggregate fields that are indexed (not stored) to be used for 
> queries.  Blindly expanding queries across fields doesn't seem that 
> useful to me.

In my case it is very useful, because my search has constraints like:

1) has an attachment in xxx file format
2) has type xxx
3) was created by xxx
4) search in attachments or not

So I cannot make this customization without indexing into several fields
and searching across several fields.
Building the query string by adding "field:keyword" pairs is just a
hard-to-maintain way of reinventing the wheel. So in my case
MultiFieldQueryParser is very useful, because I have to add some boolean
clauses after I create the base query.
Last month I refactored the method that creates the search query
for our "extended search" functionality.
It was a method with 200 lines of structural code (no query parser used).
Using boolean clauses and MultiFieldQueryParser helped me a lot, and
the result was a method with fewer, easily maintainable
lines of code.

 Of course this is what my project needs, but I think that almost all
Lucene indexes contain more than 2-3 fields.

 So, once again, MultiFieldQueryParser is an elegant solution.

  Sergiu


Re: Null or no analyzer

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Oct 21, 2004, at 5:38 AM, sergiu gordea wrote:
> Erik Hatcher wrote:
>
>> I don't like the idea of users having to know how a field was indexed 
>> though.  That seems to defeat the purpose of a general-purpose 
>> QueryParser.
>>
>>     Erik
>
> I agree with that, but maybe Lucene should provide some subclasses of 
> QueryParser that deal with these problems.
> I'm just a Lucene user, not a Lucene developer, but I have had to 
> implement an extension of MultiFieldQueryParser
> to fix some unwanted behaviour that I already discussed on the 
> mailing list. The problems that users face with creating the right 
> query strings (with the special case of untokenized fields), together
> with the MultiFieldQueryParser problems, MultiSearcher problems ... I 
> think that all together they suggest the idea of creating a
> QueryParser class hierarchy.
>
>  What do you think about that?

Query parsing/expansion is the holy grail.  There are so many ways to 
do this sort of thing that I'm mostly of the opinion it is a 
per-project customization to get it tuned for the needs of that 
project.

Nutch has done some nice things with query parsing/expansion and 
extensibility.

I'm all for a more extensible base to work from, no question.

I'm personally not fond of MultiFieldQueryParser - I much prefer 
aggregate fields that are indexed (not stored) to be used for queries.  
Blindly expanding queries across fields doesn't seem that useful to me.

	Erik


Re: Null or no analyzer

Posted by sergiu gordea <gs...@ifit.uni-klu.ac.at>.

Erik Hatcher wrote:

> I don't like the idea of users having to know how a field was indexed 
> though.  That seems to defeat the purpose of a general-purpose 
> QueryParser.
>
>     Erik

I agree with that, but maybe Lucene should provide some subclasses of 
QueryParser that deal with these problems.
I'm just a Lucene user, not a Lucene developer, but I have had to 
implement an extension of MultiFieldQueryParser
to fix some unwanted behaviour that I already discussed on the mailing 
list. 
The problems that users face with creating the right query strings 
(with the special case of untokenized fields), together
with the MultiFieldQueryParser problems, MultiSearcher problems ... I think 
that all together they suggest the idea of creating a
QueryParser class hierarchy.

  What do you think about that?

  All the best,

 Sergiu



Re: Null or no analyzer

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

I don't like the idea of users having to know how a field was indexed 
though.  That seems to defeat the purpose of a general-purpose 
QueryParser.

	Erik

On Oct 21, 2004, at 2:38 AM, Morus Walter wrote:

> Erik Hatcher writes:
>
>> however perhaps it should be.  Or perhaps there are other options to
>> solve this recurring dilemma folks have with Field.Keyword indexed
>> fields and QueryParser?
>>
> I think one could introduce a special syntax in query parser for
> keyword fields. Query parser wouldn't analyze them at all in this case.
> Something like
> field#Keyword
> or
> field#"keyword containing blanks"
>
> I haven't thought through all consequences for
> field#(keywordA keywordB otherfield:noKeyword)
> but I think it should be doable.
>
> Doesn't make query parser simpler, on the other hand.
>
> Morus

Re: Null or no analyzer

Posted by Morus Walter <mo...@tanto.de>.

Erik Hatcher writes:

> however perhaps it should be.  Or perhaps there are other options to 
> solve this recurring dilemma folks have with Field.Keyword indexed 
> fields and QueryParser?
> 
I think one could introduce a special syntax in query parser for
keyword fields. Query parser wouldn't analyze them at all in this case.
Something like 
field#Keyword
or
field#"keyword containing blanks"

I haven't thought through all consequences for
field#(keywordA keywordB otherfield:noKeyword)
but I think it should be doable.

Doesn't make query parser simpler, on the other hand.

Morus
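
The proposed syntax can be sketched outside Lucene to see what a parser
would have to do. The following is a minimal plain-Java illustration, not a
patch to QueryParser: the class name KeywordSyntaxSketch, its methods, and
the "field=value" output format are all hypothetical. It pulls
field#Keyword and field#"keyword containing blanks" clauses out of the
query string so that their values can bypass analysis, and leaves the rest
for the ordinary analyzed path. The grouped form field#(...) is deliberately
not handled, since its semantics are left open above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the field#keyword idea; not part of Lucene.
class KeywordSyntaxSketch {
    // Matches field#Keyword or field#"keyword containing blanks".
    private static final Pattern KEYWORD_CLAUSE =
            Pattern.compile("(\\w+)#(?:\"([^\"]*)\"|(\\S+))");

    // Returns "field=value" pairs whose values must not be analyzed.
    static List<String> keywordClauses(String query) {
        List<String> clauses = new ArrayList<>();
        Matcher m = KEYWORD_CLAUSE.matcher(query);
        while (m.find()) {
            // group(2) is the quoted form, group(3) the bare form.
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            clauses.add(m.group(1) + "=" + value);
        }
        return clauses;
    }

    // Returns the text left over for the normal, analyzed part of the parser.
    static String remainder(String query) {
        return KEYWORD_CLAUSE.matcher(query).replaceAll(" ")
                .trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String query = "John Kerry subject#\"Elections 2004\" db_id#A17";
        System.out.println(keywordClauses(query)); // [subject=Elections 2004, db_id=A17]
        System.out.println(remainder(query));      // John Kerry
    }
}
```

In a real parser each extracted pair would presumably become a single-term
query on that field, with the value used verbatim as the term text.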


Re: Null or no analyzer

Posted by Sergiu Gordea <gs...@ifit.uni-klu.ac.at>.

Rupinder Singh Mazara wrote:

>hi
>
>the basic problem here is that there are data sources which contain
>a) id, b) text, c) title, d) authors AND e) subject heading
>
>text, title and authors need to be tokenized
>
>the subject heading can be one or more words,
>
the subject must also be tokenized, otherwise you cannot get any 
results that don't match the Term exactly

 so ... for example, let's assume you have the following titles:
"George Trash Elections"
"George Trash"

if you search for "George Trash" and your title is not tokenized you 
will get just the second document (I hope I'm
not making any mistake when I say that; anyway, it can easily be tested).

>anyone searching such a data source is expected to know the subject headings,
>if the user is trying to find all articles that contain the phrases
>"John Kerry" and "George Bush" and that are also classified as "Election
>2004"
>it is possible that there are other documents that are classified as "Nation
>Service Records"
>or "Tax Returns" etc...
>
how is this represented in the GUI: as a select box, or as an input field?
If it is a select box and each subject is a unique domain concept,
you can use a non-tokenized string, or even a numerical
representation, but I think that is not your case.
In the case of an input field, again I suggest you tokenize the string.
>so the objective is to find documents that contain the above-mentioned
>phrases as well as one
>of the subject classifiers, so as to pull out the most meaningful
>documents
>
no problem ... once again, use
+subject:"my searched subject"

>the subject classifiers pertain to domain knowledge, and it is possible that
>2 or more
>subject classification headings are composed of the same set of words, but
>the sequence
>in which they appear can drastically alter the meaning, hence tokenizing the
>subject field
>is not exactly a healthy solution.
>
tokenization doesn't change the word order; if you use a 
PhraseQuery you will get the correct results

+title:"George Bush"
doesn't return documents with the title
"Bush George"

>also such search tools are meant for people who know / understand this
>classification system
>
:)) It is a general truth that the results are better when people 
know what they are searching for :)

>Taxonomy of animals can be taken as one such example,
>
>hope this helps define the problem
>
I cannot see anything special in your problem.
Before starting to implement a complex solution it will probably be better 
to give the simple one a chance ...
I assure you that you won't lose anything, and even if you decide to 
implement complex solutions you will have
a lot of reusable code.

 so ... Have fun,

  Sergiu

PS: if you can provide an example with a false positive please ... 
provide us the case
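
The claim above that tokenization keeps word order, so a phrase query on
"George Bush" does not match "Bush George", can be checked with a small
model. This is a plain-Java sketch, not Lucene code: the class
PhraseOrderSketch and its methods are hypothetical, and a phrase match is
assumed to mean that the query tokens occur consecutively and in order
among the field's tokens.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of phrase matching on a tokenized field; not Lucene.
class PhraseOrderSketch {
    // Stand-in for an analyzer: lower-case, then split on whitespace.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    // True if the phrase tokens occur consecutively, in order, in the field.
    static boolean phraseMatches(String fieldValue, String phrase) {
        return Collections.indexOfSubList(tokenize(fieldValue), tokenize(phrase)) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(phraseMatches("George Bush", "George Bush"));           // true
        System.out.println(phraseMatches("Bush George", "George Bush"));           // false
        System.out.println(phraseMatches("George Bush Elections", "George Bush")); // true
    }
}
```

The third case is the one to keep in mind for controlled vocabularies: under
this model a phrase also matches a longer heading that merely contains it,
which may count as a false positive when only the exact subject is wanted.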



RE: Null or no analyzer

Posted by Rupinder Singh Mazara <rs...@ebi.ac.uk>.

hi

the basic problem here is that there are data sources which contain
a) id, b) text, c) title, d) authors AND e) subject heading

text, title and authors need to be tokenized

the subject heading can be one or more words;
anyone searching such a data source is expected to know the subject headings.
If the user is trying to find all articles that contain the phrases
"John Kerry" and "George Bush" and that are also classified as
"Election 2004", it is possible that there are other documents that are
classified as "Nation Service Records" or "Tax Returns" etc...

so the objective is to find documents that contain the above-mentioned
phrases as well as one of the subject classifiers, so as to pull out the
most meaningful documents

the subject classifiers pertain to domain knowledge, and it is possible that
2 or more subject classification headings are composed of the same set of
words, but the sequence in which they appear can drastically alter the
meaning, hence tokenizing the subject field is not exactly a healthy
solution.

also such search tools are meant for people who know / understand this
classification system.
Taxonomy of animals can be taken as one such example.

hope this helps define the problem






Re: Null or no analyzer

Posted by sergiu gordea <gs...@ifit.uni-klu.ac.at>.

Erik Hatcher wrote:

>
> On Oct 20, 2004, at 9:55 AM, Aviran wrote:
>
>> AFAIK if the term "Election 2004" will be between quotation marks this 
>> should
>> work fine.
>
>
> No, it won't.  The Analyzer will analyze it, and the 
> WhitespaceAnalyzer would split it into two tokens [Election] and [2004].
>
> This is a tricky situation with no clear *best* way to do this sort of 
> thing.  However, given what I've seen of this thread so far I'd 
> recommend using the PerFieldAnalyzerWrapper and associate the fields 
> indexed as Field.Keyword with a KeywordAnalyzer.  There have been some 
> variants of this posted on the list - it is not included in the API, 
> however perhaps it should be.  Or perhaps there are other options to 
> solve this recurring dilemma folks have with Field.Keyword indexed 
> fields and QueryParser?
>
>     Erik
>
I still don't understand what is wrong with the idea of indexing the 
title in a separate field and searching with a phrase query:
+title:"Elections 2004" ?
I think that the real problem is that the title is not tokenized and the 
title contains more than "Elections 2004".

I think it is worth giving this solution a try.

Or maybe I don't understand the problem correctly ...

All the best,

 Sergiu
 





RE: Null or no analyzer

Posted by Rupinder Singh Mazara <rs...@ebi.ac.uk>.

Hi Erik

 I think the best solution is to have a NullAnalyzer class that
 allows a simple pass-through.

 The query parser can then be passed a PerFieldAnalyzerWrapper that knows
 when to select NullAnalyzer or some other analyzer, based on the
 Field:"data... " Field2:"pp" format. This is something that the query
 parser is already geared up to do.

regards

 Rupinder
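
The arrangement described here, a per-field wrapper that picks a
pass-through analyzer for keyword fields and a tokenizing one elsewhere, can
be modelled in a few lines. This is a plain-Java sketch and does not use the
Lucene API: PerFieldSketch, passThrough and simple are hypothetical names,
and an "analyzer" is reduced to a function from text to a list of tokens.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of per-field analyzer selection; not the Lucene API.
class PerFieldSketch {
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldSketch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Registers a field-specific analyzer that overrides the default.
    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // Analyzes text with the field's own analyzer, or the default if none is set.
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    // The "null analyzer": the whole text comes back as one unchanged token.
    static List<String> passThrough(String text) {
        return List.of(text);
    }

    // Stand-in for a tokenizing analyzer: lower-case, then split on whitespace.
    static List<String> simple(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    public static void main(String[] args) {
        PerFieldSketch wrapper = new PerFieldSketch(PerFieldSketch::simple);
        wrapper.addAnalyzer("subject", PerFieldSketch::passThrough);
        wrapper.addAnalyzer("db_id", PerFieldSketch::passThrough);
        System.out.println(wrapper.analyze("full_text", "John Kerry"));   // [john, kerry]
        System.out.println(wrapper.analyze("subject", "Elections 2004")); // [Elections 2004]
    }
}
```

With the real classes the same shape would presumably apply: the wrapper is
handed to the query parser, and only fields registered with the pass-through
analyzer keep their query text as a single term.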


Re: Null or no analyzer

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Oct 20, 2004, at 9:55 AM, Aviran wrote:
> AFAIK if the term "Election 2004" will be between quotation marks this 
> should
> work fine.

No, it won't.  The Analyzer will analyze it, and the WhitespaceAnalyzer 
would split it into two tokens [Election] and [2004].

This is a tricky situation with no clear *best* way to do this sort of 
thing.  However, given what I've seen of this thread so far I'd 
recommend using the PerFieldAnalyzerWrapper and associate the fields 
indexed as Field.Keyword with a KeywordAnalyzer.  There have been some 
variants of this posted on the list - it is not included in the API, 
however perhaps it should be.  Or perhaps there are other options to 
solve this recurring dilemma folks have with Field.Keyword indexed 
fields and QueryParser?

	Erik


RE: Null or no analyzer

Posted by Aviran <am...@infosciences.com>.

AFAIK, if the term "Election 2004" is between quotation marks this should
work fine.

Aviran
http://aviran.mordos.com


RE: Null or no analyzer

Posted by Morus Walter <mo...@tanto.de>.

Aviran writes:
> You can use WhitespaceAnalyzer
> 
Can he? If "Elections 2004" is one token in the subject field (keyword), 
this will fail, since WhitespaceAnalyzer will tokenize that to `Elections' 
and `2004'.
So I guess he has to write an identity analyzer himself unless there is
one provided (which doesn't seem to be the case).
The only alternatives are not using query parser or extending query parser
for a keyword syntax, as far as I can see.

Morus
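
The failure described above can be reproduced with a small model. This is a
plain-Java sketch rather than Lucene code: KeywordMatchSketch and its
methods are hypothetical, the keyword field is modelled as a set of whole
values, and a query is assumed to match only if every query token is itself
an indexed term.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical model of an untokenized (keyword) field; not Lucene code.
class KeywordMatchSketch {
    // Each whole subject value is stored as a single indexed term.
    static final Set<String> SUBJECT_TERMS = Set.of("Elections 2004", "Tax Returns");

    // Models what a whitespace analyzer does to the query text.
    static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Models an identity analyzer: one token, text unchanged.
    static List<String> identityTokens(String text) {
        return List.of(text);
    }

    // A query matches only if every query token is an indexed term.
    static boolean matches(List<String> queryTokens) {
        return SUBJECT_TERMS.containsAll(queryTokens);
    }

    public static void main(String[] args) {
        // Split into "Elections" and "2004": neither is an indexed term.
        System.out.println(matches(whitespaceTokens("Elections 2004"))); // false
        // Kept whole: equals the indexed term exactly.
        System.out.println(matches(identityTokens("Elections 2004")));   // true
    }
}
```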

RE: Null or no analyzer

Posted by Aviran <am...@infosciences.com>.

You can use WhitespaceAnalyzer

Aviran
http://aviran.mordos.com
