Posted to java-user@lucene.apache.org by Bill Snyder <ws...@gmail.com> on 2006/04/14 16:37:06 UTC

Syntax help

Hello,

We are using Lucene to facilitate searching of our application's log files. I
am noticing some inconsistencies in result sets when searching on certain
fields.

One field we index is the file path. I am using a simple query like
"location:Z:\logs\someLogFile.log". However, I can never get path searches
like this to come back with any results. I tried escaping the backslashes and
the colon, but nothing seems to work. Am I missing something in my syntax?

We also index the file name. However, on file names that have mixed case or
multiple extensions (logfile.D20060303.T234234) I cannot get results either.
Weird.

I haven't worked with Lucene very long, so I expect I am missing something
simple here.

If you need more info, let me know!
Many Thanks!

--Bill

Re: Syntax help

Posted by karl wettin <ka...@snigel.net>.
On 14 Apr 2006, at 16:37, Bill Snyder wrote:
>
> One field we index is the file path. I am using a simple query like
> "location:Z:\logs\someLogFile.log". However, I can never get path searches
> like this to come back with any results. I tried escaping the backslashes and
> the colon, but nothing seems to work. Am I missing something in my syntax?

Can you open your index with Luke and see what the index looks like?

If it looks right, what does the code that retrieves the field value look
like? If not, what does the code that sets the field value look like?

If everything seems fine, do some debugging and report what values you send
to Lucene and what you get back.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Syntax help

Posted by Bill Snyder <ws...@gmail.com>.
Thanks! OK, how do I get the file separator to be part of the term? Luke
shows the parsed query as ignoring the file separator.

so location:Z\:\\/install/logs\\jetspeedservices.log

becomes location:"z install logs jetspeedservices.log"

--Bill




Re: Syntax help

Posted by Rajesh Munavalli <fi...@gmail.com>.
On 4/14/06, Bill Snyder <ws...@gmail.com> wrote:
>
> AHA!  I am using the Search tab and have entered the query:
>
> location:Z:\install\logs\archive.log.D20060406.T141958
>
> the query details says the query was parsed to
>
> location:z
>
> so if I escape the colon I see the new parsed query as
>
> location:"z installlogsarchive.log.d20060406.t141958"
>
> So, Lucene does not store the file path exactly?! It converts it all to
> lower case! Is there some property I should turn on?

In the StandardAnalyzer, the LowerCaseFilter converts everything into lower
case. You can skip that step.

> Plus, it is not storing the backslash. Should I be escaping these in the
> index before storing them? It seems so.

Yes

> -Bill


Re: Syntax help

Posted by Bill Snyder <ws...@gmail.com>.
On 4/14/06, Erick Erickson <er...@gmail.com> wrote:
>
> Something that took me a while to get was that the analyzer is important
> BOTH in the indexing phase and in the searching phase (assuming you're
> using the QueryParser). For your experiment, you probably want to use the
> WhitespaceAnalyzer. See page 119 of "Lucene in Action".
>
> The other three most-common analyzers divide text at nonletter characters,
> which will do bad things to your path names.....
>
> Also note that you can use the PerFieldAnalyzerWrapper to use, say, the
> WhitespaceAnalyzer on the file-path field and other analyzers on other
> fields, you're not locked into using the same analyzer for all fields.
>
> Best
> Erick
>
>
> BTW, I really recommend a copy of "Lucene in Action"......


PerFieldAnalyzerWrapper looks like what I want!

I've heard nothing but good things about the book and will have to pick it
up!

Thanks for the help everyone!
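
For readers following along, here is a rough plain-Java sketch of the per-field idea: pick the tokenizing behaviour by field name, with a fallback for everything else. This is not Lucene code and does not use PerFieldAnalyzerWrapper itself; the class PerFieldDemo, its methods, and the field names are hypothetical stand-ins chosen only to illustrate the concept.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only: maps field names to tokenizing functions,
// mimicking the idea of using a different analyzer per field.
final class PerFieldDemo {
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();
    private final Function<String, List<String>> fallback;

    PerFieldDemo(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    // Register a tokenizing function for one specific field.
    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // Use the field-specific function if one is registered, else the fallback.
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, fallback).apply(text);
    }

    // Keeps the whole value as a single, unmodified token.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    // Lowercases and splits at every run of non-letter characters.
    static List<String> lowercaseWords(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        PerFieldDemo demo = new PerFieldDemo(PerFieldDemo::lowercaseWords);
        demo.addAnalyzer("location", PerFieldDemo::keyword);
        System.out.println(demo.analyze("location", "Z:\\logs\\someLogFile.log"));
        System.out.println(demo.analyze("contents", "Disk full, retrying"));
    }
}
```

The path field keeps its value intact while the other field is still split into lowercase words, which is the behaviour the thread is after. The same choice has to be applied when indexing and when parsing queries.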

Re: Syntax help

Posted by Erick Erickson <er...@gmail.com>.
Something that took me a while to get was that the analyzer is important
BOTH in the indexing phase and in the searching phase (assuming you're using
the QueryParser). For your experiment, you probably want to use the
WhitespaceAnalyzer. See page 119 of "Lucene in Action".

The other three most-common analyzers divide text at nonletter characters,
which will do bad things to your path names.....

Also note that you can use the PerFieldAnalyzerWrapper to use, say, the
WhitespaceAnalyzer on the file-path field and other analyzers on other
fields; you're not locked into using the same analyzer for all fields.

Best
Erick


BTW, I really recommend a copy of "Lucene in Action"......
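
To make the point about nonletter characters concrete, here is a small plain-Java sketch that contrasts splitting at every non-letter (with lowercasing) against splitting only at whitespace. It is a simulation under stated assumptions, not Lucene's actual analyzer code; the class PathTokenDemo and its method names are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: two simplified tokenizing strategies.
final class PathTokenDemo {

    // Breaks the text at every non-letter character and lowercases each
    // piece, roughly the effect described for the letter-based analyzers.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Breaks the text only at whitespace and leaves case untouched.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String path = "Z:\\logs\\someLogFile.log";
        System.out.println(letterTokens(path));      // several lowercase pieces
        System.out.println(whitespaceTokens(path));  // one token, path intact
    }
}
```

With the letter-based split, the colon, backslashes, and dots all disappear and the path becomes several lowercase terms. With the whitespace split, the path survives as one term, which is why a query for the exact path can only match in the second case.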

Re: Syntax help

Posted by Bill Snyder <ws...@gmail.com>.
oops, thought that you were just referring to the lowercase...
:)

On 4/14/06, karl wettin <ka...@snigel.net> wrote:
>
>
> On 14 Apr 2006, at 17:22, karl wettin wrote:
> >
> > It is the Analyzer that does that. Try creating your IndexSearcher
> > with a KeywordAnalyzer (I think).
>
> err
>
> It is the Analyzer that does that. Try using a KeywordAnalyzer (I
> think).

Re: Syntax help

Posted by karl wettin <ka...@snigel.net>.
On 14 Apr 2006, at 17:22, karl wettin wrote:
>
> It is the Analyzer that does that. Try creating your IndexSearcher
> with a KeywordAnalyzer (I think).

err

It is the Analyzer that does that. Try using a KeywordAnalyzer (I
think).



Re: Syntax help

Posted by karl wettin <ka...@snigel.net>.
On 14 Apr 2006, at 17:11, Bill Snyder wrote:

>
> so if I escape the colon I see the new parsed query as
>
> location:"z installlogsarchive.log.d20060406.t141958"
>
> So, Lucene does not store the file path exactly?! It converts it all to
> lower case! Is there some property I should turn on?

It is the Analyzer that does that. Try creating your IndexSearcher
with a KeywordAnalyzer (I think).



Re: Syntax help

Posted by Bill Snyder <ws...@gmail.com>.
AHA!  I am using the Search tab and have entered the query:

location:Z:\install\logs\archive.log.D20060406.T141958

the query details says the query was parsed to

location:z

so if I escape the colon I see the new parsed query as

location:"z installlogsarchive.log.d20060406.t141958"

So, Lucene does not store the file path exactly?! It converts it all to lower
case! Is there some property I should turn on?

Plus, it is not storing the backslash. Should I be escaping these in the
index before storing them? It seems so.

-Bill
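
As a side note on the escaping step, here is a minimal plain-Java sketch of backslash-escaping the characters that the query syntax treats as special, so that a colon or backslash in a path is not read as syntax. The helper escapeQueryChars and the class QueryEscapeDemo are hypothetical names for this illustration; Lucene's QueryParser has its own static escape method, which is the better choice in real code if your version provides it.

```java
// Hypothetical illustration only: escapes query-syntax characters by hand.
final class QueryEscapeDemo {

    // Characters with special meaning in the classic query syntax:
    // + - & | ! ( ) { } [ ] ^ " ~ * ? : and the backslash itself.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    // Puts a backslash in front of every special character.
    static String escapeQueryChars(String raw) {
        StringBuilder out = new StringBuilder(raw.length() * 2);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String path = "Z:\\install\\logs\\archive.log";
        // The field separator colon stays unescaped; only the value is escaped.
        System.out.println("location:" + escapeQueryChars(path));
    }
}
```

Escaping only gets the text past the query parser. The analyzer still runs on the escaped value afterwards, so it can lowercase the text and drop the separators anyway, which matches what Luke shows in this thread.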


Re: Syntax help

Posted by Bill Snyder <ws...@gmail.com>.
Oh, cool. Look at that. A neat tool made with thinlets. I had not heard of
this...I'll see if it helps me figure out what's going on.

--Bill

On 4/14/06, Rajesh Munavalli <fi...@gmail.com> wrote:
>
> It would be helpful to download Luke (http://www.getopt.org/luke/) and
> analyze what's getting indexed. Have you tried that?

Re: Syntax help

Posted by Rajesh Munavalli <fi...@gmail.com>.
It would be helpful to download Luke (http://www.getopt.org/luke/) and
analyze what's getting indexed. Have you tried that?
