Posted to user@lucenenet.apache.org by Floyd Wu <fl...@gmail.com> on 2009/03/04 09:31:36 UTC

Question about StandardAnalyzer.cs

Hi all,
My problem is this: I have a field that is set to be Indexed & Stored.
The indexed value is Z123456.
But when I use StandardAnalyzer to search this field, it seems that
StandardAnalyzer transforms my query text "Z123456" into "z123456". After
walking through the source code, I found the following lines:
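  public override TokenStream TokenStream(System.String fieldName, System.IO.TextReader reader)
  {
    // Split the input text into tokens.
    StandardTokenizer tokenStream = new StandardTokenizer(reader, replaceInvalidAcronym);
    tokenStream.SetMaxTokenLength(maxTokenLength);
    // Normalize tokens (e.g. strip possessive 's and dots in acronyms).
    TokenStream result = new StandardFilter(tokenStream);
    // Lowercase every token: this is what turns "Z123456" into "z123456".
    result = new LowerCaseFilter(result);
    // Drop stop words such as "the" and "a".
    result = new StopFilter(result, stopSet);
    return result;
  }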

Why is LowerCaseFilter used here? If I comment out this line, will I run into any
potential problems?
I think my "Z123456" is transformed into "z123456" by this filter.
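The following is a minimal sketch of what the quoted chain does to a field value. It is a toy Java model, not Lucene.Net's actual classes, and it rests on several assumptions:

- Splitting on runs of non-letter/digit characters stands in for StandardTokenizer's grammar.
- StandardFilter is omitted.
- STOP_WORDS is a small hypothetical stop set; Lucene's real English stop list is longer.

The sketch shows "Z123456" becoming "z123456". It also shows one side effect of removing the lowercasing step in this toy. The stop set is lowercase and the check is case-sensitive, so capitalized stop words such as "The" are no longer dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model of the StandardAnalyzer chain quoted above; NOT Lucene.Net's classes.
// Assumptions: splitting on non-letter/digit runs stands in for StandardTokenizer,
// StandardFilter is omitted, and STOP_WORDS is a small hypothetical stop set.
class ToyStandardAnalyzer {
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "of", "the");

    // tokenizer -> (optional) lowercasing -> case-sensitive stop-word removal
    static List<String> analyze(String text, boolean lowercase) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // split yields a leading "" when text starts with a separator
            }
            String token = lowercase ? raw.toLowerCase(Locale.ROOT) : raw;
            if (STOP_WORDS.contains(token)) {
                continue; // stop set is lowercase, so "The" only matches after lowercasing
            }
            tokens.add(token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Z123456", true));             // [z123456]
        System.out.println(analyze("Z123456", false));            // [Z123456]
        System.out.println(analyze("The author Z123456", true));  // [author, z123456]
        System.out.println(analyze("The author Z123456", false)); // [The, author, Z123456]
    }
}
```

The boolean flag plays the role of keeping or commenting out the LowerCaseFilter line.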

Re: Question about StandardAnalyzer.cs

Posted by Jokin Cuadrado <jo...@gmail.com>.

That means you are indexing and searching for the same term, so Lucene finds it.
Run the test with Z123456 and post the code, and we will tell you where the
fault is.

On Thu, Mar 5, 2009 at 10:25 AM, Floyd Wu <fl...@gmail.com> wrote:

> To simplify the question, I did another test. I created the index from the
> original document, but this time I changed "Z123456" to "z123456" before
> putting it into the Lucene index. When I fired the query, I got what I
> wanted. What does it mean?



-- 
Jokin
Sent from: Barcelona Catalonia Spain.
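One way to read Floyd's experiment in light of this answer is a toy Java model. It is not Lucene's API; `index` and `search` are hypothetical helpers. The idea is that a term query matches only if the analyzed query term equals a term that is literally in the index:

- If the index holds the raw "Z123456" while the query side lowercases to "z123456", nothing matches.
- Indexing "z123456" by hand makes the terms agree.
- Running the same lowercasing analysis at index time also makes the terms agree.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy inverted index (term -> doc ids); NOT Lucene's API. index() and search()
// are hypothetical helpers used only to illustrate term matching.
class AnalysisMismatchDemo {
    // Query-side analysis: lowercase, like StandardAnalyzer's LowerCaseFilter.
    static String analyzeQuery(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    // Build a one-document index; optionally lowercase the value before indexing.
    static Map<String, Set<Integer>> index(String value, int docId, boolean lowercaseAtIndexTime) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        String term = lowercaseAtIndexTime ? value.toLowerCase(Locale.ROOT) : value;
        postings.computeIfAbsent(term, t -> new HashSet<>()).add(docId);
        return postings;
    }

    // A term query matches only if the analyzed query term is literally in the index.
    static Set<Integer> search(Map<String, Set<Integer>> postings, String queryText) {
        return postings.getOrDefault(analyzeQuery(queryText), Set.of());
    }

    public static void main(String[] args) {
        System.out.println(search(index("Z123456", 1, false), "Z123456")); // [] : raw term vs lowercased query
        System.out.println(search(index("z123456", 1, false), "Z123456")); // [1]: value lowercased by hand
        System.out.println(search(index("Z123456", 1, true), "Z123456"));  // [1]: same analysis on both sides
    }
}
```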

Re: Question about StandardAnalyzer.cs

Posted by Floyd Wu <fl...@gmail.com>.

To simplify the question, I did another test. I created the index from the
original document, but this time I changed "Z123456" to "z123456" before
putting it into the Lucene index. When I fired the query, I got what I
wanted. What does it mean?




2009/3/5 Jokin Cuadrado <jo...@gmail.com>

> Could you expand the code a bit more? At least I want to see where you
> instantiate the analyzer, where you open the writer, what term you use as
> the key to update the document, and how you create the document fields.
> Also, to rule out other kinds of problems and isolate this one, you can
> do something like this (in pseudocode):
>
> Create a new index
> Add 1 document (with just 1 indexed, stored and tokenized field containing
> "Z123456")
> Close the index
> Open the index
> Search for the document
> Close
>
> Then test whether it works. If it doesn't, post your code and we will see
> what is happening.

Re: Question about StandardAnalyzer.cs

Posted by Jokin Cuadrado <jo...@gmail.com>.

Could you expand the code a bit more? At least I want to see where you
instantiate the analyzer, where you open the writer, what term you use as
the key to update the document, and how you create the document fields.
Also, to rule out other kinds of problems and isolate this one, you can
do something like this (in pseudocode):

Create a new index
Add 1 document (with just 1 indexed, stored and tokenized field containing
"Z123456")
Close the index
Open the index
Search for the document
Close

Then test whether it works. If it doesn't, post your code and we will see
what is happening.


On Thu, Mar 5, 2009 at 8:58 AM, Floyd Wu <fl...@gmail.com> wrote:

> Hi Jokin,
>
> Thanks for your reply. I'm very sure that I used the analyzer (compiled from
> SVN trunk) to index my document. Here is the code snippet:
>
>     m_Writer.UpdateDocument(
>         term,
>         LuceneDocumentConverter.ToDocument(content),
>         Analyzer);
>
> It's pretty simple, and I pass the analyzer in.
> I don't know why it doesn't work.



-- 
Jokin
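The isolation test described in this post can be sketched as a self-contained toy in Java. This is not Lucene code. An in-memory map stands in for the index, `lowercasing` stands in for a StandardAnalyzer-like analyzer, and `caseKeeping` stands in for an analyzer without LowerCaseFilter; both are hypothetical. The close/reopen steps are not modeled. In real Lucene they matter, because a reader only sees changes that were committed before it was opened.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Toy version of the isolation test; NOT Lucene's API. Analyzers are hypothetical.
class IsolationTest {
    // Whitespace tokenizer that lowercases (StandardAnalyzer-like for this purpose).
    static List<String> lowercasing(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Same tokenizer without lowercasing (like commenting out LowerCaseFilter).
    static List<String> caseKeeping(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    static boolean runTest(Function<String, List<String>> indexAnalyzer,
                           Function<String, List<String>> queryAnalyzer,
                           String fieldValue, String queryText) {
        // "Create a new index": term -> ids of documents containing it.
        Map<String, Set<Integer>> postings = new HashMap<>();
        // "Add 1 document" with a single indexed, tokenized field (doc id 0).
        for (String term : indexAnalyzer.apply(fieldValue)) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(0);
        }
        // Close/reopen is not modeled in this toy.
        // "Search document": every analyzed query term must hit doc 0.
        List<String> queryTerms = queryAnalyzer.apply(queryText);
        if (queryTerms.isEmpty()) return false;
        for (String term : queryTerms) {
            if (!postings.getOrDefault(term, Set.of()).contains(0)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(runTest(IsolationTest::lowercasing, IsolationTest::lowercasing, "Z123456", "Z123456")); // true
        System.out.println(runTest(IsolationTest::lowercasing, IsolationTest::lowercasing, "Z123456", "z123456")); // true
        System.out.println(runTest(IsolationTest::caseKeeping, IsolationTest::lowercasing, "Z123456", "Z123456")); // false
    }
}
```

Passing the same analyzer for both arguments mirrors the advice in the thread. The third call in `main` shows the kind of index/query mismatch Jokin suspected.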

Re: Question about StandardAnalyzer.cs

Posted by Floyd Wu <fl...@gmail.com>.

Hi Jokin,

Thanks for your reply. I'm very sure that I used the analyzer (compiled from
SVN trunk) to index my document. Here is the code snippet:

    m_Writer.UpdateDocument(
        term,
        LuceneDocumentConverter.ToDocument(content),
        Analyzer);

It's pretty simple, and I pass the analyzer in.
I don't know why it doesn't work.
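In this thread Jokin asks which term is used as the update key. A toy Java sketch shows why that can matter. It assumes update-by-term semantics like Lucene's IndexWriter.UpdateDocument: delete every document containing the exact key term, then add the new document. The key term is not run through the analyzer, while the field value is. `UpdateByTermDemo` and its methods are hypothetical. The thread does not show what `term` holds in Floyd's code, so this is a possible side issue, not a diagnosis of the search failure.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model of update-by-key semantics; NOT Lucene's API.
// Assumption: the key term is matched exactly (unanalyzed); the field value is analyzed.
class UpdateByTermDemo {
    // docId -> indexed terms of the key field (after lowercasing analysis).
    private final Map<Integer, Set<String>> docs = new LinkedHashMap<>();
    private int nextId = 0;

    static Set<String> lowercaseAnalyze(String value) {
        return Set.of(value.toLowerCase(Locale.ROOT));
    }

    void updateDocument(String keyTerm, String fieldValue) {
        // Delete step: exact term match, no analysis applied to keyTerm.
        docs.values().removeIf(terms -> terms.contains(keyTerm));
        // Add step: the field value IS analyzed.
        docs.put(nextId++, lowercaseAnalyze(fieldValue));
    }

    int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        UpdateByTermDemo raw = new UpdateByTermDemo();
        raw.updateDocument("Z123456", "Z123456");
        raw.updateDocument("Z123456", "Z123456"); // key "Z123456" never equals indexed "z123456"
        System.out.println(raw.size());           // 2: the old copy was not replaced

        UpdateByTermDemo normalized = new UpdateByTermDemo();
        normalized.updateDocument("z123456", "Z123456");
        normalized.updateDocument("z123456", "Z123456");
        System.out.println(normalized.size());    // 1
    }
}
```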

2009/3/5 Jokin Cuadrado <jo...@gmail.com>

> First of all, a field's stored value is different from its indexed
> terms; which of the two are you telling us about? If it works when you
> remove the lowercase filter, I'm pretty sure you are not lowercasing at
> index-writing time: either you are not using StandardAnalyzer there, or
> you have used a version without the lowercase filter. Could you post a
> snippet of the index-creation code?

Re: Question about StandardAnalyzer.cs

Posted by Jokin Cuadrado <jo...@gmail.com>.

First of all, a field's stored value is different from its indexed
terms; which of the two are you telling us about? If it works when you
remove the lowercase filter, I'm pretty sure you are not lowercasing at
index-writing time: either you are not using StandardAnalyzer there, or
you have used a version without the lowercase filter. Could you post a
snippet of the index-creation code?


On 3/5/09, Floyd Wu <fl...@gmail.com> wrote:
> Hi Michael,
> I'm sure that I use StandardAnalyzer when indexing. The problem is that I need
> to get a search result when I query "Z123456" against my indexed field
> "author_id", and Luke-0.8.1 currently shows this field's value in the index
> as "Z123456".
>
> I've been stuck on this for a month. Please help.
> Thanks


-- 
Jokin
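Jokin's first point can be made concrete with a toy Java model of a field that is both stored and indexed. It is not Lucene's classes; `Doc` and `makeDoc` are hypothetical. The stored value is kept verbatim, while the indexed terms are whatever the analyzer produced. A view of stored fields can therefore show "Z123456" even when the term a query must match is "z123456".

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model, NOT Lucene's API: one document whose field is both stored and indexed.
class StoredVsIndexed {
    static final class Doc {
        final Map<String, String> stored;             // value kept verbatim for display
        final Map<String, List<String>> indexedTerms; // analyzer output used for matching

        Doc(Map<String, String> stored, Map<String, List<String>> indexedTerms) {
            this.stored = stored;
            this.indexedTerms = indexedTerms;
        }
    }

    // Hypothetical helper: store the raw value, index its (optionally lowercased) term.
    static Doc makeDoc(String field, String value, boolean lowercaseAtIndexTime) {
        String term = lowercaseAtIndexTime ? value.toLowerCase(Locale.ROOT) : value;
        return new Doc(Map.of(field, value), Map.of(field, List.of(term)));
    }

    public static void main(String[] args) {
        Doc doc = makeDoc("author_id", "Z123456", true);
        System.out.println(doc.stored.get("author_id"));       // Z123456
        System.out.println(doc.indexedTerms.get("author_id")); // [z123456]
    }
}
```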

Re: Question about StandardAnalyzer.cs

Posted by Floyd Wu <fl...@gmail.com>.

Hi Michael,
I'm sure that I use StandardAnalyzer when indexing. The problem is that I need
to get a search result when I query "Z123456" against my indexed field
"author_id", and Luke-0.8.1 currently shows this field's value in the index
as "Z123456".

I've been stuck on this for a month. Please help.
Thanks



2009/3/5 Michael Mitiaguin <mi...@gmail.com>

> As mentioned in this thread, could you re-check that you explicitly use
> StandardAnalyzer when indexing?
> I must admit, though, that I am still using 2.0.4.
>
>  writer = new IndexWriter(indexdir, new StandardAnalyzer(), true);
>
> In Luke, if you select plugins > Analyzer tool > StandardAnalyzer,
> it also lowercases:
> Original text : Z123456  tokens found : z123456
>
> On Thu, Mar 5, 2009 at 2:45 PM, Floyd Wu <fl...@gmail.com> wrote:
>
> > I'm sure the application and Luke use the same analyzer, StandardAnalyzer.
> > But I can't find "Z123456" and I don't know why. As long as I comment
> > out this line in StandardAnalyzer.cs:
> > result = new LowerCaseFilter(result);
> > the result is what I want.
> >
> > 2009/3/4 Jokin Cuadrado <jo...@gmail.com>
> >
> > > In Luke you can use other analyzers as well, so try the keyword
> > > analyzer, for example. But as regards your application, you must use
> > > the same analyzer when you build your index and when you query it.
> > >
> > > On Wed, Mar 4, 2009 at 10:50 AM, Floyd Wu <fl...@gmail.com> wrote:
> > >
> > > > But the current situation is that I can't get any result for "Z123456"
> > > > when I type either "Z123456" or "z123456".
> > > >
> > > > I'm using StandardAnalyzer, and according to Luke the indexed value is
> > > > "Z123456".
> > > > How can I fix this problem?
> > > >
> > > > 2009/3/4 Jokin Cuadrado <jo...@gmail.com>
> > > >
> > > > > The rationale behind the lowercase filter is that it will match
> > > > > when you search for either Z123456 or z123456, so searches are
> > > > > case-insensitive.
> > > > > However, as with any filter, you must use the same analyzer when
> > > > > indexing your documents. Are you doing that?
> > > > >
> > > > > On Wed, Mar 4, 2009 at 9:31 AM, Floyd Wu <fl...@gmail.com>
> wrote:
> > > > >
> > > > > > Hi all,
> > > > > > My problem is I have a field and the field is set to be  Indexed
> &
> > > > > Stored.
> > > > > > The index value is Z123456.
> > > > > > But when I using StandardAnalyzer to search this field, it seems
> > >  that
> > > > > > StandarAnalyzer will transaform my query text "Z123456" to
> > "z123456".
> > > > > After
> > > > > > walk through source code, I found following lines:
> > > > > >  public override TokenStream TokenStream(System.String fieldName,
> > > > > > System.IO.TextReader reader)
> > > > > >  {
> > > > > >   StandardTokenizer tokenStream = new StandardTokenizer(reader,
> > > > > > replaceInvalidAcronym);
> > > > > >   tokenStream.SetMaxTokenLength(maxTokenLength);
> > > > > >   TokenStream result = new StandardFilter(tokenStream);
> > > > > >   result = new LowerCaseFilter(result);
> > > > > >   result = new StopFilter(result, stopSet);
> > > > > >   return result;
> > > > > >  }
> > > > > >
> > > > > > Why using LoweCasefilter() here? If I comment out this line, will
> I
> > > > have
> > > > > > any
> > > > > > potential problems?
> > > > > > I think my "Z123456" to "z123456" is transformed by this filter.
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Jokin
> > > > > Sent from: Sant cugat del valles  Spain.
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Jokin
> > >
> >
>

Re: Question about StandardAnalyzer.cs

Posted by Michael Mitiaguin <mi...@gmail.com>.

As mentioned in this thread, could you re-check that you explicitly use
StandardAnalyzer when indexing?
I must admit, though, that I am still using 2.0.4:

 writer = new IndexWriter(indexdir, new StandardAnalyzer(), true);

In Luke, if you select plugins > Analyzer tool > StandardAnalyzer, it also
lowercases:
Original text: Z123456  tokens found: z123456

On Thu, Mar 5, 2009 at 2:45 PM, Floyd Wu <fl...@gmail.com> wrote:

> I'm sure the application and Luke use the same analyzer, StandardAnalyzer.
> But I can't find anything when searching for "Z123456" and I don't know why.
> As long as I comment out this line in StandardAnalyzer.cs:
>   result = new LowerCaseFilter(result);
> the results are what I want.

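Michael's Luke check can be reproduced roughly in code. The sketch below is plain C# with no Lucene.Net dependency: it approximates the chain quoted in this thread (StandardTokenizer, StandardFilter, LowerCaseFilter, StopFilter) with a regex tokenizer, ToLowerInvariant and a small hypothetical stop list. It is not StandardTokenizer's real grammar or the real default stop set, only enough to show where the lowercase step happens.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stop list: a small subset standing in for StandardAnalyzer's default set.
var stopWords = new HashSet<string> { "a", "an", "and", "the", "of", "to" };

// Rough model of the StandardAnalyzer chain: tokenize, lowercase, remove stop words.
List<string> Analyze(string text) =>
    Regex.Matches(text, @"[\p{L}\p{Nd}]+")           // crude tokenizer: runs of letters and digits
         .Cast<Match>()
         .Select(m => m.Value.ToLowerInvariant())   // the LowerCaseFilter step
         .Where(t => !stopWords.Contains(t))        // the StopFilter step
         .ToList();

Console.WriteLine(string.Join(",", Analyze("Z123456")));
Console.WriteLine(string.Join(",", Analyze("The author Z123456")));
```

Running it prints z123456 and then author,z123456; the first line agrees with the "tokens found: z123456" result Michael reports from Luke.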
Re: Question about StandardAnalyzer.cs

Posted by Floyd Wu <fl...@gmail.com>.

I'm sure the application and Luke use the same analyzer, StandardAnalyzer.
But I can't find anything when searching for "Z123456" and I don't know why.
As long as I comment out this line in StandardAnalyzer.cs:
  result = new LowerCaseFilter(result);
the results are what I want.



2009/3/4 Jokin Cuadrado <jo...@gmail.com>

> In Luke you can use other analyzers as well, for example the keyword
> analyzer. But as regards your application, you must use the same analyzer
> when you build your index and when you query it.

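To see what commenting out the LowerCaseFilter line trades away, here is a small self-contained model in plain C# (not Lucene.Net; the whitespace tokenizer and the in-memory HashSet standing in for the index are simplifying assumptions). The same analysis is applied to the field value and to the query, once with lowercasing and once without.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified analyzer: split on whitespace, optionally lowercase each token.
IEnumerable<string> Analyze(string text, bool lowercase) =>
    text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
        .Select(t => lowercase ? t.ToLowerInvariant() : t);

// Index one field value and test a query, using the SAME analysis on both sides.
bool Matches(string fieldValue, string query, bool lowercase)
{
    var indexTerms = new HashSet<string>(Analyze(fieldValue, lowercase));
    return Analyze(query, lowercase).Any(indexTerms.Contains);
}

foreach (var lower in new[] { true, false })
    foreach (var q in new[] { "Z123456", "z123456" })
        Console.WriteLine($"lowercase={lower} query={q} match={Matches("Z123456", q, lower)}");
```

Three of the four combinations match; the one that fails is lowercase=False with the query z123456. In other words, dropping the filter makes the exact-case query work but turns every search on that field case-sensitive.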
Re: Question about StandardAnalyzer.cs

Posted by Jokin Cuadrado <jo...@gmail.com>.

In Luke you can use other analyzers as well, for example the keyword
analyzer. But as regards your application, you must use the same analyzer
when you build your index and when you query it.

On Wed, Mar 4, 2009 at 10:50 AM, Floyd Wu <fl...@gmail.com> wrote:

> But the current situation is that I get no results for "Z123456", whether
> I type "Z123456" or "z123456".
>
> I'm using StandardAnalyzer, and Luke shows the indexed value as "Z123456".
> How can I fix this problem?



-- 
Jokin

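The keyword analyzer Jokin mentions emits the entire field value as a single token and does not lowercase it, so in Luke it should show the raw Z123456 term. The sketch contrasts that behaviour with a lowercasing analyzer; both functions are hypothetical plain-C# models, not the Lucene.Net KeywordAnalyzer or StandardAnalyzer classes.

```csharp
using System;
using System.Linq;

// Model of a keyword analyzer: the whole field value becomes one unchanged token.
string[] KeywordAnalyze(string text) => new[] { text };

// Model of a lowercasing analyzer: split on whitespace, then lowercase each token.
string[] LowercaseAnalyze(string text) =>
    text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
        .Select(t => t.ToLowerInvariant())
        .ToArray();

foreach (var input in new[] { "Z123456", "Z123456 draft" })
{
    Console.WriteLine($"{input}: keyword=[{string.Join("|", KeywordAnalyze(input))}] " +
                      $"lowercasing=[{string.Join("|", LowercaseAnalyze(input))}]");
}
```

The keyword model keeps "Z123456 draft" as one token with its original case, which is why it suits exact-match ID fields, while the lowercasing model yields z123456 and draft.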
Re: Question about StandardAnalyzer.cs

Posted by Floyd Wu <fl...@gmail.com>.

But the current situation is that I get no results for "Z123456", whether I
type "Z123456" or "z123456".

I'm using StandardAnalyzer, and Luke shows the indexed value as "Z123456".
How can I fix this problem?



2009/3/4 Jokin Cuadrado <jo...@gmail.com>

> The rationale behind the lowercase filter is that a search for either
> Z123456 or z123456 will match, so searches are case-insensitive.
> However, as with any filter, you must use the same analyzer when indexing
> your documents. Are you doing that?

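One way to get exactly this symptom (Luke shows the term Z123456, yet neither Z123456 nor z123456 finds it) is for the field value to reach the index without going through StandardAnalyzer, for example if it was indexed as an un-tokenized field, while the query text is analyzed and therefore lowercased. That is an assumption; the thread does not show the indexing code. In the model below, the hypothetical NormalizeQuery stands in for the query-side analysis and two HashSets stand in for the index.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the query side: StandardAnalyzer lowercases the term.
string NormalizeQuery(string text) => text.Trim().ToLowerInvariant();

// Exact term lookup against a set of indexed terms.
bool Search(ISet<string> indexTerms, string userQuery) =>
    indexTerms.Contains(NormalizeQuery(userQuery));

// Case 1: value stored verbatim (never analyzed), as Luke appears to show.
var rawIndex = new HashSet<string> { "Z123456" };
// Case 2: value lowercased at index time, matching the query-side analysis.
var analyzedIndex = new HashSet<string> { "Z123456".ToLowerInvariant() };

foreach (var q in new[] { "Z123456", "z123456" })
    Console.WriteLine($"{q}: raw={Search(rawIndex, q)} analyzed={Search(analyzedIndex, q)}");
```

Under these assumptions, both spellings miss the raw index and both hit the lowercased one, so the fix is to make index-time and query-time analysis agree rather than to edit StandardAnalyzer.cs.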
Re: Question about StandardAnalyzer.cs

Posted by Jokin Cuadrado <jo...@gmail.com>.

The rationale behind the lowercase filter is that a search for either Z123456
or z123456 will match, so searches are case-insensitive. However, as with any
filter, you must use the same analyzer when indexing your documents. Are you
doing that?

On Wed, Mar 4, 2009 at 9:31 AM, Floyd Wu <fl...@gmail.com> wrote:

> Hi all,
> My problem is that I have a field that is set to be Indexed & Stored.
> The indexed value is Z123456.
> But when I use StandardAnalyzer to search this field, it seems that
> StandardAnalyzer transforms my query text "Z123456" to "z123456". After
> walking through the source code, I found the following lines:
>  public override TokenStream TokenStream(System.String fieldName, System.IO.TextReader reader)
>  {
>   StandardTokenizer tokenStream = new StandardTokenizer(reader, replaceInvalidAcronym);
>   tokenStream.SetMaxTokenLength(maxTokenLength);
>   TokenStream result = new StandardFilter(tokenStream);
>   result = new LowerCaseFilter(result);
>   result = new StopFilter(result, stopSet);
>   return result;
>  }
>
> Why is LowerCaseFilter used here? If I comment out this line, will I run
> into any potential problems?
> I think my "Z123456" is transformed to "z123456" by this filter.
>



-- 
Jokin
Sent from: Sant cugat del valles  Spain.
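Jokin's rule (one analyzer for both indexing and querying) can be illustrated with a toy inverted index in plain C#. The analyzer delegate, the Dictionary-based index, and the AddDocument and Query helpers are all hypothetical stand-ins, not Lucene.Net APIs; the point is only that a single analysis function feeds both paths.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One analysis function, shared by indexing and querying.
Func<string, IEnumerable<string>> analyzer =
    text => text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                .Select(t => t.ToLowerInvariant());

// Hypothetical in-memory inverted index: term -> document ids.
var index = new Dictionary<string, List<int>>();

void AddDocument(int id, string fieldValue)
{
    foreach (var term in analyzer(fieldValue))
    {
        if (!index.TryGetValue(term, out var ids))
            index[term] = ids = new List<int>();
        ids.Add(id);
    }
}

// Analyze the query with the same function, then look up each term.
List<int> Query(string text) =>
    analyzer(text).SelectMany(t => index.TryGetValue(t, out var ids) ? ids : new List<int>())
                  .Distinct()
                  .ToList();

AddDocument(1, "Z123456");
Console.WriteLine(string.Join(",", Query("Z123456"))); // prints 1
Console.WriteLine(string.Join(",", Query("z123456"))); // prints 1
```

Because the stored term and both queries pass through the same lowercasing, the search is case-insensitive, which is the behaviour the LowerCaseFilter is there to provide.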