Posted to dev@lucenenet.apache.org by Andy Berryman <to...@gmail.com> on 2007/02/14 22:50:30 UTC

Looking for some help with Programmatic vs Parse Query Building

In my application, I was previously building queries as a string and I'm
having to convert over to the API because of the need to use the Wildcard
Query.  I'm running into a few searching issues and they all seem to center
around the fact that the field is of *TEXT* type which means it is Analyzed
when indexed.

Assume that my field name is *Title* and it is of *TEXT* type.  Also assume
that I am using the StandardAnalyzer.

I have a document stored in the index that had the original text of "I was
on the cat-walk".  During the index process, I know that the stop words are
removed, the text is lowercased, and certain characters (such as the hyphen)
are stripped.  So basically, the end result was that the terms ... "i",
"cat", and "walk" ... were stored in the index.
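
As a rough illustration of what analysis does to that text, here is a small dependency-free C# sketch. It is not the StandardAnalyzer: the ToyAnalysis class and its three-word stop list are made up for this example, and the real analyzer's tokenization rules are more involved. It only shows why a single term containing the whole original string cannot match anything in the index.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ToyAnalysis
{
    // Hypothetical stop list for illustration only; not Lucene's actual list.
    private static readonly HashSet<string> StopWords =
        new HashSet<string> { "was", "on", "the" };

    public static List<string> Analyze(string text)
    {
        List<string> terms = new List<string>();
        StringBuilder current = new StringBuilder();

        // The appended space flushes the final token.
        foreach (char c in text + " ")
        {
            if (char.IsLetterOrDigit(c))
            {
                // Lowercase while collecting the token.
                current.Append(char.ToLowerInvariant(c));
            }
            else if (current.Length > 0)
            {
                // Any other character (space, hyphen, ...) ends the token.
                string token = current.ToString();
                current.Clear();
                if (!StopWords.Contains(token))
                {
                    terms.Add(token);
                }
            }
        }
        return terms;
    }

    public static void Main()
    {
        // prints "i cat walk"
        Console.WriteLine(string.Join(" ", Analyze("I was on the cat-walk")));
    }
}
```

The index holds the three separate terms, so a query has to be built from those same terms rather than from the raw input string.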

My previous code was doing the simplest case to get the Query by just
building ... *Title:"I was on the cat-walk"* ... and passing that into the
Parse method.  Since the analyzer is part of that method call, it was doing
all of the necessary stripping within the query for me and thus the search
was working just fine.  It was returning the Query ... *Title:"i cat walk"*.

With the new code, I'm now building the query like this ... TermQuery tq =
new TermQuery(new Term("Title", "I was on the cat-walk")) ... And this is
NOT working.  The reason is that no analysis is being done on the string
being searched, so the whole string is looked up as a single term.  I can
certainly write a simple loop to strip the stop words, but I don't really
know what to do about the special characters.

The main problem I'm looking into is that my end-users are unable to search
for just "cat-walk" and get results.  But if they search for "cat walk",
they get the result you would expect.

Hopefully someone out there has tackled this issue before and can show me an
example of how to do this without having to re-invent the wheel.

Thanks
Andy

Re: Looking for some help with Programmatic vs Parse Query Building

Posted by Andy Berryman <to...@gmail.com>.

For anyone curious, here is the solution I arrived at ...

=====================================================
   // Runs the text through the analyzer and returns the resulting terms.
   public static string[] AnalyzeTerms(Lucene.Net.Analysis.Analyzer analyzer, string terms)
   {
       System.Collections.Generic.List<string> returnObj =
           new System.Collections.Generic.List<string>();
       Lucene.Net.Analysis.Token t = null;
       Lucene.Net.Analysis.TokenStream ts =
           analyzer.TokenStream(new System.IO.StringReader(terms));
       // Next() returns null once the stream is exhausted.
       while ((t = ts.Next()) != null)
       {
           returnObj.Add(t.TermText());
       }
       return returnObj.ToArray();
   }
=====================================================

Not completely obvious how to do that kind of thing "out-of-the-box", but
eventually I got to it.  :-)
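
For completeness, the analyzed terms might then be turned into a query along the following lines. This is only a sketch: QueryHelper and BuildPhraseQuery are hypothetical names, and it assumes that adding the terms one after another to a PhraseQuery is close enough to what the Parse method was producing (Title:"i cat walk").

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class QueryHelper
{
    // Hypothetical helper: builds a query from terms that have
    // already been run through the analyzer.
    public static Query BuildPhraseQuery(string field, string[] analyzedTerms)
    {
        // A single term needs no phrase; a plain TermQuery is enough.
        if (analyzedTerms.Length == 1)
        {
            return new TermQuery(new Term(field, analyzedTerms[0]));
        }

        // Otherwise add each analyzed term, in order, to a phrase query.
        PhraseQuery phrase = new PhraseQuery();
        foreach (string term in analyzedTerms)
        {
            phrase.Add(new Term(field, term));
        }
        return phrase;
    }
}
```

Given an analyzer instance, a call such as QueryHelper.BuildPhraseQuery("Title", AnalyzeTerms(analyzer, "cat-walk")) would then search for "cat" followed by "walk" instead of the single unanalyzed term "cat-walk".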

Andy


On 2/15/07, Jokin Cuadrado <jo...@gmail.com> wrote:
>
> Why don't you use the StandardAnalyzer?
>
> As far as I know, you can create an instance of the analyzer and get
> the tokens from the query text.
>
> Look at how it is done in the QueryParser and reproduce it.
>
> --
> Jokin.

Re: Looking for some help with Programmatic vs Parse Query Building

Posted by Jokin Cuadrado <jo...@gmail.com>.

Why don't you use the StandardAnalyzer?

As far as I know, you can create an instance of the analyzer and get
the tokens from the query text.

Look at how it is done in the QueryParser and reproduce it.

-- 
Jokin.


Re: Looking for some help with Programmatic vs Parse Query Building

Posted by Andy Berryman <to...@gmail.com>.

I noticed that my message appeared in raw text with some "*" in it.  That
was due to the rich text editor I was using to type the email.  Please
ignore those.

Andy


