Posted to java-user@lucene.apache.org by Xavier To <to...@courrier.uqam.ca> on 2007/02/05 20:09:06 UTC

Re : Re: Re : Re: Re : Re: Problem with a search engine

Thanks ! 

I thought that would be the case too, but it's not. "2003" is simply stored in the "contents" field like everything else. "contents" is the only indexed field, so anything that is searched for should be found there. The number problem is not restricted to dates, either: if I search for "4wd", the engine ignores the number and searches for the rest, even though the query is "4wd" both before and after the QueryParser manipulation...

Xavier Tô
Bacc. en Informatique et Génie Logiciel
to.xavier@courrier.uqam.ca
(450)434-8905

----- Original Message -----
From: Chiradeep Vittal <ra...@yahoo.com>
Date: Monday, February 5, 2007 1:57 pm
Subject: Re: Re : Re: Re : Re: Problem with a search engine

> Perhaps the numbers (dates?) are being indexed in a separate field? 
> Lucene will only search the default field with the queries you have 
> shown. If, for instance the year was being stored in the "year" 
> field, then your query should be 
> report AND year:2003
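
To make the default-field point concrete, here is a toy sketch in plain Java. It does not use Lucene, and `DefaultFieldDemo` and `resolve` are names invented for this illustration. It only shows the idea that a clause without a `field:` prefix is looked up in the default field (assumed here to be `contents`), while `year:2003` targets the `year` field.

```java
import java.util.ArrayList;
import java.util.List;

// Toy clause resolver for illustration only; this is not Lucene's QueryParser.
class DefaultFieldDemo {

    // Resolve each whitespace-separated clause to "field:term", using
    // defaultField whenever the clause carries no explicit "field:" prefix.
    static List<String> resolve(String query, String defaultField) {
        List<String> clauses = new ArrayList<>();
        for (String part : query.trim().split("\\s+")) {
            if (part.isEmpty() || part.equals("AND") || part.equals("OR")) {
                continue; // boolean operators are simply skipped in this sketch
            }
            clauses.add(part.contains(":") ? part : defaultField + ":" + part);
        }
        return clauses;
    }

    public static void main(String[] args) {
        System.out.println(resolve("report AND 2003", "contents"));
        System.out.println(resolve("report AND year:2003", "contents"));
    }
}
```

With `contents` as the default field, `report AND 2003` resolves to `contents:report` and `contents:2003`, whereas `report AND year:2003` resolves to `contents:report` and `year:2003`.
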
> 
> HTH
> 
> ----- Original Message ----
> From: Xavier To <to...@courrier.uqam.ca>
> To: java-user@lucene.apache.org
> Sent: Monday, February 5, 2007 10:03:51 AM
> Subject: Re : Re: Re : Re: Problem with a search engine
> 
> Thanks for your help !
> 
> Wow, I never expected that many replies. Cool !
> 
> I did try printing the query before and after it gets processed by
> QueryParser. If my query is "2003", it is "2003" both before and
> after. If I enter "report 2003", the query is "report 2003" both
> before and after going through the parser. The problem is that
> documents are indexed with "2003", or so Luke says, but they are
> never found by our search engine. So even if I search for "report AND
> 2003", I only get results for the "report" token and nothing for
> "2003", even though Luke finds 7 documents using the StandardAnalyzer
> (which is the one we're using). I would start a new search engine
> from scratch, using Lucene from the start rather than refactoring the
> Java code to Lucene, but that would require so many modifications
> elsewhere....
> 
> Xavier Tô
> Bacc. en Informatique et Génie Logiciel
> to.xavier@courrier.uqam.ca
> (450)434-8905
> 
> ----- Original Message -----
> From: Erick Erickson <er...@gmail.com>
> Date: Monday, February 5, 2007 12:37 pm
> Subject: Re: Re : Re: Problem with a search engine
> 
> > Have you tried looking at the actual query submitted with
> > Query.toString()? That might give you an insight into what is
> > actually being submitted to Lucene and a place to start.
> > 
> > Also be aware that in QueryParser the default operator is OR, which
> > can produce unexpected results if you assume AND.
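
The difference between an OR default and an AND default can be seen on a toy inverted index. This is a plain-Java sketch, not Lucene code; the terms, document ids and the `search` helper are all invented for the illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index; the terms and document ids are made up.
class DefaultOperatorDemo {

    static final Map<String, Set<Integer>> INDEX = new HashMap<>();
    static {
        INDEX.put("report", new TreeSet<>(Arrays.asList(1, 2, 3)));
        INDEX.put("2003", new TreeSet<>(Arrays.asList(2, 4)));
    }

    // Combine the postings of every term: intersection when requireAll is
    // true (AND), union otherwise (OR).
    static Set<Integer> search(boolean requireAll, String... terms) {
        Set<Integer> result = null;
        for (String term : terms) {
            Set<Integer> postings = INDEX.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(postings);
            } else if (requireAll) {
                result.retainAll(postings);
            } else {
                result.addAll(postings);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        System.out.println("OR:  " + search(false, "report", "2003")); // [1, 2, 3, 4]
        System.out.println("AND: " + search(true, "report", "2003"));  // [2]
    }
}
```

Under OR, a document containing only "report" still matches, which can look as if the other term had been ignored.
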
> > 
> > Best
> > Erick
> > 
> > On 2/5/07, Xavier To <to...@courrier.uqam.ca> wrote:
> > >
> > > Thanks for your help,
> > >
> > > As I stated before, the numbers, whether pure or not, are indexed,
> > > for I can search them with Luke. But supposing what you're saying
> > > was the case, the search for "10-year" should return 4 items
> > > (according to the number of occurrences found by Luke). The problem
> > > is that the number of documents returned is 6, for it ignored the
> > > "10" and searched for "-year".
> > >
> > > Xavier Tô
> > > Bacc. en Informatique et Génie Logiciel
> > > to.xavier@courrier.uqam.ca
> > > (450)434-8905
> > >
> > > ----- Original Message -----
> > > From: Mark Miller <ma...@gmail.com>
> > > Date: Monday, February 5, 2007 11:11 am
> > > Subject: Re: Problem with a search engine
> > >
> > > > StandardAnalyzer does not index pure numbers. It will index
> > > > alphanumeric tokens and numbers that are connected with one of:
> > > > "_"|"-"|"/"|"."|",". If you wish to index pure numbers you might
> > > > want to add another regex to StandardAnalyzer that recognizes a
> > > > series of digits - don't forget to add the new token type to the
> > > > grammar lower in the StandardTokenizer.jj file.
> > > > - Mark
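
The idea of adding a rule for runs of digits can be pictured with a toy regex tokenizer. This is plain `java.util.regex`, not the JavaCC grammar in StandardTokenizer.jj, and both patterns are assumptions made up for the illustration. It makes no claim about what StandardAnalyzer actually emits; elsewhere in this thread the numbers are reported to be present in the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy rule sets for illustration; neither is Lucene's real grammar.
class DigitRuleDemo {

    // Hypothetical rule set 1: a token must contain at least one letter.
    static final Pattern WITHOUT_NUMBERS =
            Pattern.compile("[0-9]*[a-z][a-z0-9]*");

    // Hypothetical rule set 2: the same rule plus an alternative that
    // accepts a run of digits on its own.
    static final Pattern WITH_NUMBERS =
            Pattern.compile("[0-9]*[a-z][a-z0-9]*|[0-9]+");

    // Lowercase the text and collect every match of the given rules.
    static List<String> tokenize(Pattern rules, String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = rules.matcher(text.toLowerCase());
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Report 2003 4wd";
        System.out.println(tokenize(WITHOUT_NUMBERS, text)); // [report, 4wd]
        System.out.println(tokenize(WITH_NUMBERS, text));    // [report, 2003, 4wd]
    }
}
```
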
> > > >
> > > > On 2/5/07, Xavier To <to...@courrier.uqam.ca> wrote:
> > > > >
> > > > > Thanks for taking time to answer me. The problem is that I'm not
> > > > > allowed to post code due to a confidentiality contract that I was
> > > > > required to sign.
> > > > > I'll try to see if I can get a special permission to post code
> > > > > since I'm wasting so much time trying to find the answer to this.
> > > > >
> > > > > I tried looking at each place where the query is touched, and the
> > > > > numbers are still present in the query. I don't know if it's the
> > > > > analyzer, but if it was, wouldn't the numbers be cut out of the
> > > > > index completely? As I said in my 1st post, they are "findable"
> > > > > with Lukeall. If I read right, the FrenchAnalyzer included in
> > > > > Lucene is supposed to be based on StandardAnalyzer, so I really
> > > > > fail to see what is going wrong. Might it be the fact that the
> > > > > tokenizer used is StringTokenizer and not TokenStream? The numbers
> > > > > are tokenized, and in the returned query they are present....
> > > > > I really don't know where they get zapped out of existence...
> > > > >
> > > > > Thanks again for helping.
> > > > >
> > > > > Xavier Tô
> > > > > Bacc. en Informatique et Génie Logiciel
> > > > > to.xavier@courrier.uqam.ca
> > > > > (450)434-8905
> > > > >
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > >
> > > > > Hard to tell without seeing any code.  Perhaps numbers are being
> > > > > removed from the query string during search.
> > > > > Make sure the same or at least "compatible" Analyzer is used
> > > > > during both indexing and querying.
> > > > > Grab the code from Lucene in Action .... hm, lucenebook.com may be
> > > > > down at the moment, but that's where you can get the code
> > > > > normally.  The code includes some classes that let you run a query
> > > > > string through a set of Analyzers and see how each of them behaves
> > > > > and what it does to a query.
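
The "run the same text through each analysis chain and compare" idea can be sketched without Lucene at all. The two methods below are hypothetical stand-ins, not Lucene Analyzers: one splits on whitespace with `java.util.StringTokenizer`, the other keeps only runs of letters. Nothing here is claimed to be what the engine in this thread actually does; it only shows how a mismatch between index-time and query-time tokenization would make numbers vanish from a search.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy stand-ins for two analysis chains; neither is a Lucene Analyzer.
class TokenizerComparison {

    // Hypothetical index-time chain: lowercase, then split on whitespace only.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text.toLowerCase());
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // Hypothetical query-time chain: lowercase, then keep runs of letters
    // only, so digits and punctuation silently disappear.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("\\p{L}+").matcher(text.toLowerCase());
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] samples = {"2003", "4wd", "10-year", "report 2003"};
        for (String s : samples) {
            System.out.println(s + " -> index: " + whitespaceTokens(s)
                    + ", query: " + letterTokens(s));
        }
    }
}
```

For `4wd` the first chain yields `[4wd]` and the second `[wd]`, so in this toy setup the indexed term could never be matched by the query term, even though the raw query string still contains the number.
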
> > > > >
> > > > > Otis
> > > > >
> > > > > ----- Original Message ----
> > > > > From: "To, Xavier" <Xa...@axa-canada.com>
> > > > > To: java-user@lucene.apache.org
> > > > > Sent: Wednesday, January 31, 2007 12:21:27 AM
> > > > > Subject: Problem with a search engine
> > > > >
> > > > >
> > > > > Hi, I recently started an internship and I've been asked to fix
> > > > > their search engine so numbers are searched. At first, I thought
> > > > > it was the Analyzer that wasn't working right, but we're using
> > > > > StandardAnalyzer and the numbers are indexed (I checked with
> > > > > Lukeall). Then I thought they are not tokenized during the search,
> > > > > but they are. They just seem to be ignored for some reason. Has
> > > > > anyone experienced something similar? If so, how can I fix this?
> > > > > It's probably something that would jump in my face if it was
> > > > > alive, but I just can't see it. Can anyone help me? It would be
> > > > > very much appreciated.
> > > > >
> > > > >
> > > > > Xavier Tô
> > > > > Stagiaire
> > > > > Développement - Maintenance & Évolution
> > > > > AXA Canada Tech
> > > > > 2020, rue University, bureau 700
> > > > > Montréal (Québec) H3A 2A5
> > > > > Tél. :  (514) 282-6817, poste 2224
> > > > > Téléc. :  (514) 282-6017
> > > > > Courriel : Xavier.To@axa-canada.com <Xa...@axa-canada.com>
> > > > >   _____
> > > > >
> > > > > "This e-mail message is confidential, for the exclusive use of the
> > > > > addressee and its contents shall not constitute a commitment by
> > > > > AXA, except as otherwise specifically provided in writing by AXA.
> > > > > Any unauthorized disclosure, use or dissemination, either whole or
> > > > > partial, is prohibited. If you are not the intended recipient of
> > > > > the message, please notify the sender immediately."
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > > >
> > > >
> > >
> > >
> >
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org