Posted to user@lucenenet.apache.org by Floyd Wu <fl...@gmail.com> on 2011/09/06 11:31:21 UTC
[Lucene.Net] How to index/search a file name
Hi everyone,
I have a question that has been bothering me for a long time. I need to index
file names and make them searchable by partial file name.
Example: 2009&2010Q2_ABCD_Report.xls (the file name)
When I run these queries:
filename:ABCD returns no match
filename:2010Q2_ABCD matches
filename:Report* matches
I'm using StandardAnalyzer with Lucene.Net 2.9.3. The filename field is
currently set to tokenized/indexed/stored.
What I want is for Lucene.Net to match when the user types any part of the
file name (strings like 2009, 2010Q2, ABCD, Report, xls, or Report.xls).
Please help with this, or kindly point me to a way to solve it.
Floyd
Re: [Lucene.Net] How to index/search a file name
Posted by Gustavo Poll <gk...@gmail.com>.
Just to give some feedback, in case anyone is interested: the
ModifiedStandardAnalyzer class seems to work exactly like StandardAnalyzer,
but accent-insensitive. A small difference occurred with the last character,
but it does not belong to the Portuguese alphabet, so I don't think ignoring
it is a problem in my case.
Thanks, Digy!
Test results:
(tokenizing the expression "Name.Surname@gmail.com 123.456 3,5 AT&T João
Avião Calção ğüşıöç%ĞÜŞİÖÇ$ΑΒΓΔΕΖ#АБВГДЕ SSß")
StandardAnalyzer:
[name.surname@gmail.com] [123.456] [3,5] [at&t] [joão] [avião] [calção]
[güsıöç] [güsiöç] [aß?de?] [??????] [ssß]
ModifiedStandardAnalyzer: (accent insensitive)
[name.surname@gmail.com] [123.456] [3,5] [at&t] [joao] [aviao] [calcao]
[gusioc] [gusioc] [aß?de?] [??????] [ssss]
Thanks,
Gustavo Poll
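For readers who want to see what "accent insensitive" means at the character level, here is a minimal sketch in plain C#, with no Lucene.Net dependency. It is not how ASCIIFoldingFilter is implemented; it swaps in Unicode normalization instead: decompose each character, then drop the combining marks. The class and method names (AccentFolding, Fold) are made up for illustration. Unlike the filter output above, this approach leaves characters without a canonical decomposition, such as ß, untouched.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper, not part of Lucene.Net.
public static class AccentFolding
{
    // Strips accents by decomposing to NFD and removing combining marks.
    public static string Fold(string text)
    {
        // "ã" becomes "a" followed by U+0303 (combining tilde), and so on.
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Keep everything except the combining (non-spacing) marks.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        // Recompose whatever is left into the usual composed form.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

Calling AccentFolding.Fold("João Avião Calção").ToLowerInvariant() gives "joao aviao calcao", which lines up with the Portuguese tokens in the ModifiedStandardAnalyzer output.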
Re: [Lucene.Net] How to index/search a file name
Posted by Gustavo Poll <gk...@gmail.com>.
thanks, I'll do it...
RE: [Lucene.Net] How to index/search a file name
Posted by Digy <di...@gmail.com>.
That can be a starting point (just play around a little with the tokenizers and filters):

    using Lucene.Net.Analysis;
    using Lucene.Net.Analysis.Standard;

    public class ModifiedStandardAnalyzer : Analyzer
    {
        public override TokenStream TokenStream(System.String fieldName, System.IO.TextReader reader)
        {
            // Same tokenizer and first filter that StandardAnalyzer uses
            StandardTokenizer tokenStream = new StandardTokenizer(reader, true);
            TokenStream result = new StandardFilter(tokenStream);
            // Lowercase the tokens, then fold accented characters to ASCII
            result = new LowerCaseFilter(result);
            result = new ASCIIFoldingFilter(result);
            return result;
        }
    }
DIGY
Re: [Lucene.Net] How to index/search a file name
Posted by Gustavo Poll <gk...@gmail.com>.
Thanks again... OK, so it is not.
StandardAnalyzer:
[name.surname@gmail.com] [123.456] [3,5] [at&t] [güsıöç] [güsiöç] [aß?de?]
[??????] [ssß]
UnaccentedWordAnalyzer:
[name] [surname] [gmail] [com] [123] [456] [3] [5] [at] [t] [gusioc]
[gusioc] [aß?de?] [??????] [ssss]
StandardAnalyzer would be perfect for my application if it were accent
insensitive. Can anyone please tell me the easiest way to code such an
analyzer (an accent-insensitive StandardAnalyzer)?
I have heard it is not a good idea to write a class that inherits from
StandardAnalyzer, because StandardAnalyzer should be a final class. Is that right?
I'd appreciate any help.
Gustavo Poll
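The UnaccentedWordAnalyzer output above suggests that it breaks text at every character that is not a letter or a digit ("123.456" becomes [123] [456], "AT&T" becomes [at] [t]). Assuming that is the rule, the sketch below applies the same kind of splitting to Floyd's file name in plain C#, without Lucene.Net, to show which terms would end up in the index. FileNameTokens and Split are hypothetical names, and this is only an approximation of the analyzer, not its actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper that mimics letter-or-digit word splitting.
public static class FileNameTokens
{
    // Splits on every non-letter/non-digit character and lowercases the parts.
    public static List<string> Split(string fileName)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (char c in fileName)
        {
            if (char.IsLetterOrDigit(c))
            {
                current.Append(char.ToLowerInvariant(c));
            }
            else if (current.Length > 0)
            {
                // A separator such as '&', '_' or '.' ends the current token.
                tokens.Add(current.ToString());
                current.Length = 0;
            }
        }
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
        }
        return tokens;
    }
}
```

For "2009&2010Q2_ABCD_Report.xls" this yields 2009, 2010q2, abcd, report and xls, so a query for ABCD has its own term to hit. Floyd's results (2010Q2_ABCD matched, ABCD did not) suggest that StandardAnalyzer kept "2010Q2_ABCD" together as a single term, which would explain the missing match.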
RE: [Lucene.Net] How to index/search a file name
Posted by Digy <di...@gmail.com>.
A function is worth a thousand words :)

    using System;
    using System.IO;
    using Lucene.Net.Analysis;
    using Lucene.Net.Analysis.Standard;

    void Test()
    {
        Analyzer[] analyzers = new Analyzer[] { new StandardAnalyzer(), new Lucene.Net.Analysis.Ext.UnaccentedWordAnalyzer() };
        string input = "Name.Surname@gmail.com 123.456 3,5 AT&T ğüşıöç%ĞÜŞİÖÇ$ΑΒΓΔΕΖ#АБВГДЕ SSß";
        foreach (Analyzer analyzer in analyzers)
        {
            // Run the same input through each analyzer and print every token it emits
            TokenStream ts = analyzer.TokenStream("", new StringReader(input));
            Lucene.Net.Analysis.Token t = ts.Next();
            while (t != null)
            {
                Console.Write("[" + t.TermText() + "] ");
                t = ts.Next();
            }
            Console.WriteLine(); Console.WriteLine();
        }
    }
DIGY
Re: [Lucene.Net] How to index/search a file name
Posted by Gustavo Poll <gk...@gmail.com>.
Thanks DIGY, I'm interested in that too. Let me see if I understood:
UnaccentedWordAnalyzer is like Standard Analyzer, but accent insensitive?
Thanks!
Gustavo Poll
Re: [Lucene.Net] How to index/search a file name
Posted by Floyd Wu <fl...@gmail.com>.
Hi Digy,
Many thanks.
I'll try this later. By the way, do you recommend upgrading to 2.9.4g or
staying on 2.9.3?
I've been using 2.9.3 for a long time and it seems to have no big problems.
Floyd
Re: [Lucene.Net] How to index/search a file name
Posted by digy digy <di...@gmail.com>.
- Just copy and paste it into your project :) and use it instead of
StandardAnalyzer.
- Yes, it is from 2.9.4.
DIGY
Re: [Lucene.Net] How to index/search a file name
Posted by Floyd Wu <fl...@gmail.com>.
Hi Digy,
Thank you. But how do I apply this to my current project? Is it compatible
with lucene.net-2.9.x?
Floyd
Re: [Lucene.Net] How to index/search a file name
Posted by digy digy <di...@gmail.com>.
That may help
UnaccentedWordAnalyzer @
https://svn.apache.org/repos/asf/incubator/lucene.net/trunk/src/contrib/Core/Analysis/Ext/Analysis.Ext.cs
DIGY