
Posted to user@lucenenet.apache.org by Floyd Wu <fl...@gmail.com> on 2011/09/06 11:31:21 UTC

[Lucene.Net] How to index/search a file name

Hi everyone,

I have a question that has been bothering me for a long time. I need to
index file names and make them searchable by partial file name.

Example: 2009&2010Q2_ABCD_Report.xls (the file name)

When I run these queries:

filename:ABCD    no match returned

filename:2010Q2_ABCD     match

filename:Report*    match

I'm using StandardAnalyzer with Lucene.Net 2.9.3. The filename field is
currently tokenized, indexed and stored.

What I want is for Lucene.Net to find a match when the user types any part
of the file name (strings like 2009, 2010Q2, ABCD, Report, xls or
Report.xls).

Please help with this, or kindly point me to a way to solve it.

Floyd
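A minimal sketch of one possible approach (an assumption, not something proposed in this thread): treat every run of letters and digits in the file name as its own lowercase token, so each fragment listed above becomes a separately searchable term. The plain C# below has no Lucene.Net dependency and only illustrates the splitting rule; FileNameSplitter and Split are hypothetical names. In Lucene.Net the same rule would need to live in a custom tokenizer, and the same analyzer would have to be used both when indexing and when parsing queries.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class FileNameSplitter
{
    // Hypothetical helper: returns each run of letters/digits in the input,
    // lowercased. Any other character (&, _, ., space, ...) ends the current run.
    public static List<string> Split(string fileName)
    {
        var parts = new List<string>();
        var current = new StringBuilder();
        foreach (char c in fileName)
        {
            if (char.IsLetterOrDigit(c))
            {
                current.Append(char.ToLowerInvariant(c));
            }
            else if (current.Length > 0)
            {
                parts.Add(current.ToString());
                current.Clear();
            }
        }
        // Flush the last run (e.g. the file extension).
        if (current.Length > 0)
        {
            parts.Add(current.ToString());
        }
        return parts;
    }

    public static void Main()
    {
        // Prints: 2009 2010q2 abcd report xls
        Console.WriteLine(string.Join(" ", Split("2009&2010Q2_ABCD_Report.xls")));
    }
}
```

Applied to a query, the same rule would turn Report.xls into the two terms report and xls, which the query parser would typically treat as a phrase, so it should still match the indexed file name.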

Re: [Lucene.Net] How to index/search a file name

Posted by Gustavo Poll <gk...@gmail.com>.

Just to give some feedback, in case anyone is interested:

The ModifiedStandardAnalyzer class seems to work perfectly as an
accent-insensitive StandardAnalyzer... A small difference occurred with the
last token (ß came out as ss), but ß does not belong to the Portuguese
alphabet, so I think there's no problem in ignoring it in my case...

Thanks Digy!

Test results:

(tokenizing the expression  "Name.Surname@gmail.com 123.456 3,5 AT&T João
Avião Calção ğüşıöç%ĞÜŞİÖÇ$ΑΒΓΔΕΖ#АБВГДЕ SSß")

StandardAnalyzer:

[name.surname@gmail.com] [123.456] [3,5] [at&t] [joão] [avião] [calção]
[güsıöç] [güsiöç] [aß?de?] [??????] [ssß]

ModifiedStandardAnalyzer: (accent insensitive)

[name.surname@gmail.com] [123.456] [3,5] [at&t] [joao] [aviao] [calcao]
[gusioc] [gusioc] [aß?de?] [??????] [ssss]

Thanx
Gustavo Poll


Re: [Lucene.Net] How to index/search a file name

Posted by Gustavo Poll <gk...@gmail.com>.

thanks, I'll do it...


RE: [Lucene.Net] How to index/search a file name

Posted by Digy <di...@gmail.com>.

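That can be a starting point (just play around a bit with the tokenizers &
filters):

    using Lucene.Net.Analysis;
    using Lucene.Net.Analysis.Standard;

    // StandardTokenizer + StandardFilter + LowerCaseFilter, as in StandardAnalyzer,
    // plus ASCIIFoldingFilter; unlike StandardAnalyzer, there is no StopFilter.
    public class ModifiedStandardAnalyzer : Analyzer
    {
        public override TokenStream TokenStream(System.String fieldName, System.IO.TextReader reader)
        {
            // second argument: replaceInvalidAcronym
            StandardTokenizer tokenStream = new StandardTokenizer(reader, true);
            TokenStream result = new StandardFilter(tokenStream);
            result = new LowerCaseFilter(result);
            // maps accented characters to ASCII equivalents (e.g. ã -> a, ç -> c)
            result = new ASCIIFoldingFilter(result);
            return result;
        }
    }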

 

DIGY
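For comparison, a minimal plain C# sketch of a different accent-stripping technique (this is not how ASCIIFoldingFilter works): decompose the text to Unicode normalization form D and drop the combining marks. AccentFolding and RemoveDiacritics are hypothetical names. Unlike ASCIIFoldingFilter, which according to the test results in this thread also turns ı into i and ß into ss, this approach leaves characters that have no Unicode decomposition, such as ı and ß, unchanged.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class AccentFolding
{
    // Hypothetical helper: strips diacritics by decomposing to NFD and
    // dropping non-spacing (combining) marks. This is NOT the mapping-table
    // approach used by Lucene's ASCIIFoldingFilter.
    public static string RemoveDiacritics(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        // Recompose whatever is left (a no-op for plain ASCII).
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    public static void Main()
    {
        // Writes the string "Joao Aviao Calcao gusıoc SSß"
        // (how ı and ß display depends on the console encoding).
        Console.WriteLine(RemoveDiacritics("João Avião Calção ğüşıöç SSß"));
    }
}
```

If ı or ß must be folded too, an explicit mapping step (like the one ASCIIFoldingFilter applies) would have to be added on top.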

 


Re: [Lucene.Net] How to index/search a file name

Posted by Gustavo Poll <gk...@gmail.com>.

Thanks again... OK, it is not (UnaccentedWordAnalyzer also splits on punctuation):

standard analyzer:

[name.surname@gmail.com] [123.456] [3,5] [at&t] [güsıöç] [güsiöç] [aß?de?]
[??????] [ssß]

UnaccentedWordAnalyzer:

[name] [surname] [gmail] [com] [123] [456] [3] [5] [at] [t] [gusioc]
[gusioc] [aß?de?] [??????] [ssss]


StandardAnalyzer would be perfect for my application if it were
accent-insensitive... Can anyone please tell me the easiest way to code such
an analyzer (an accent-insensitive StandardAnalyzer)?

I've heard it is not a good idea to write a class that inherits from
StandardAnalyzer, because StandardAnalyzer is supposed to be a final class.
Is that right?

I'd appreciate any help...
Gustavo Poll





RE: [Lucene.Net] How to index/search a file name

Posted by Digy <di...@gmail.com>.

A function is worth a thousand words :)

        using System;
        using System.IO;
        using Lucene.Net.Analysis;
        using Lucene.Net.Analysis.Standard;

        // Prints the tokens each analyzer produces for the same input string
        void Test()
        {
            Analyzer[] analyzers = new Analyzer[] { new StandardAnalyzer(), new Lucene.Net.Analysis.Ext.UnaccentedWordAnalyzer() };

            string input = "Name.Surname@gmail.com 123.456 3,5 AT&T ğüşıöç%ĞÜŞİÖÇ$ΑΒΓΔΕΖ#АБВГДЕ SSß";

            foreach (Analyzer analyzer in analyzers)
            {
                TokenStream ts = analyzer.TokenStream("", new StringReader(input));

                // Next()/TermText() is the older token API: deprecated in 2.9, but still available
                Lucene.Net.Analysis.Token t = ts.Next();
                while (t != null)
                {
                    Console.Write("[" + t.TermText() + "] ");
                    t = ts.Next();
                }

                Console.WriteLine(); Console.WriteLine();
            }
        }

DIGY


Re: [Lucene.Net] How to index/search a file name

Posted by Gustavo Poll <gk...@gmail.com>.

Thanks DIGY, I'm interested in that too... Let me see if I understood:

UnaccentedWordAnalyzer is like StandardAnalyzer, but accent-insensitive?

Thanks!
Gustavo Poll



Re: [Lucene.Net] How to index/search a file name

Posted by Floyd Wu <fl...@gmail.com>.

Hi Digy,

Many thanks.

I'll try this later. By the way, would you recommend upgrading to 2.9.4g or
staying on 2.9.3? I've been using 2.9.3 for a long time and haven't had any
big problems.

Floyd



Re: [Lucene.Net] How to index/search a file name

Posted by digy digy <di...@gmail.com>.

- Just copy and paste it into your project :) and use it instead of
StandardAnalyzer.
- Yes, it is from 2.9.4.

DIGY


Re: [Lucene.Net] How to index/search a file name

Posted by Floyd Wu <fl...@gmail.com>.

Hi Digy,

Thank you. But how do I apply this to my current project? Is it compatible
with Lucene.Net 2.9.x?

Floyd



Re: [Lucene.Net] How to index/search a file name

Posted by digy digy <di...@gmail.com>.

That may help

UnaccentedWordAnalyzer @
https://svn.apache.org/repos/asf/incubator/lucene.net/trunk/src/contrib/Core/Analysis/Ext/Analysis.Ext.cs


DIGY
