Posted to java-user@lucene.apache.org by Robert Brown <rb...@redbaritone.com> on 2004/07/05 18:21:58 UTC

Underscore character and case issue

I traverse a series of files under a parent directory (similar to the 
demo sample) and store the filename in a Document Keyword field called 
'Filename'.  I am using the StandardAnalyzer for both building the index 
and searching the index.

I have two things I am trying to understand:

1. A search does not find files whose names contain capital letters.  If I 
search for a known file in the index (N3151.txt) with the search string 
'N3151.txt', it is not found.  As a workaround, I am storing the original 
filename in a separate "Unstored" field and converting the filename to 
lowercase for the Filename field.  It behaves as if the search string is 
lowercased but the indexed value is not.  Do I have to call 
.toLowerCase() explicitly on a field during indexing, or am I building my 
index incorrectly?
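The case mismatch can be reproduced with a simplified model, sketched below under stated assumptions: a Keyword field is indexed verbatim as a single term, while QueryParser passes the query text through the analyzer. The class and method names (KeywordCaseDemo, analyzeQuery, indexKeyword, found) are hypothetical, not Lucene's API, and analyzeQuery stands in for StandardAnalyzer by lowercasing only; it ignores any splitting the real analyzer may also do.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model only (not Lucene classes): a Keyword field is indexed verbatim as
// one term, while the query string is passed through an analyzer first.
class KeywordCaseDemo {
    // Hypothetical stand-in for the query-side analyzer. It models only the
    // lowercasing step; real StandardAnalyzer behaviour is more involved.
    static String analyzeQuery(String text) {
        return text.toLowerCase(Locale.ENGLISH);
    }

    // Builds the term set for a Keyword-style field, optionally lowercasing
    // the value first (the workaround described in the question).
    static Set<String> indexKeyword(String filename, boolean lowercase) {
        Set<String> terms = new HashSet<String>();
        terms.add(lowercase ? filename.toLowerCase(Locale.ENGLISH) : filename);
        return terms;
    }

    // True if the analyzed query term exactly equals an indexed term.
    static boolean found(Set<String> indexedTerms, String userQuery) {
        return indexedTerms.contains(analyzeQuery(userQuery));
    }

    public static void main(String[] args) {
        // Verbatim "N3151.txt" vs analyzed query "n3151.txt": no match.
        System.out.println(found(indexKeyword("N3151.txt", false), "N3151.txt")); // false
        // Lowercased at index time, so both sides agree: match.
        System.out.println(found(indexKeyword("N3151.txt", true), "N3151.txt"));  // true
    }
}
```

Under this model, lowercasing the value before creating the Keyword field makes the index side agree with the query side. In real Lucene code, building a TermQuery on the Filename field directly, rather than going through QueryParser, is another way to keep the analyzer out of the lookup, since a TermQuery uses its term as given.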

2. A few of my filenames contain underscores (e.g. readme_v32.txt), and 
I am having a hard time finding API documentation that covers this 
character.  I cannot find such a file when I type its name exactly, but 
I can see it with a wildcard search string (e.g. readme*.txt or 
r*v32.txt).  How should I handle the underscore?  Is this a weight 
problem, something to do with QueryParserConstants, or something 
entirely different?

Thanks for your help out there!

				R


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Underscore character and case issue

Posted by Robert Brown <rb...@redbaritone.com>.
> Luke runs just fine with 1.3.1.  If you're using Windows, try highlighting
> it with Windows Explorer, right-clicking on it, choosing the "Open with.."
> menu option and selecting "javaw".

That's the ticket... now why didn't I think of that?  Thank you!

				R



Re: Underscore character and case issue

Posted by Terry Steichen <te...@net-frame.com>.
Luke runs just fine with 1.3.1.  If you're using Windows, try selecting 
lukeall.jar in Windows Explorer, right-clicking it, choosing the 
"Open With..." menu option and selecting "javaw".

Regards,

Terry


Re: Underscore character and case issue

Posted by Andrzej Bialecki <ab...@getopt.org>.
Robert Brown wrote:

> F:\Apache\Lucene\AddOns\Luke\v0.5>java -fullversion
> java full version "1.3.1_10-b03"
> 
> F:\Lucene\AddOns\Luke\v0.5>

I have never tested it with any JDK below 1.4...

-- 
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)



Re: Underscore character and case issue

Posted by Robert Brown <rb...@redbaritone.com>.
>> I traverse a series of files under a parent directory (similar to the 
>> demo sample) and store the filename in a Document Keyword field called 
>> 'Filename'.  I am using the StandardAnalyzer for both building the 
>> index and searching the index.
>
> ... and here lies your problem. StandardAnalyzer lowercases the tokens, 
> and strips most of the non-letters from tokens. I suggest using Luke 
> (http://www.getopt.org/luke) to look inside your index, and see the 
> terms as they ended up in the index, and to try out some other analyzers 
> to see which is the most appropriate in your case.

I tried Luke at the beginning but could not get it to run.

Is this perhaps an SDK version problem?

F:\Apache\Lucene\AddOns\Luke\v0.5>java -jar lukeall.jar
Exception in thread "main" java.lang.NoSuchMethodError
         at org.getopt.luke.ClassFinder.findClassesThatExtend(Unknown Source)
         at org.getopt.luke.ClassFinder.getInstantiableSubclasses(Unknown Source)
         at org.getopt.luke.Luke.<init>(Unknown Source)
         at org.getopt.luke.Luke.main(Unknown Source)

F:\Apache\Lucene\AddOns\Luke\v0.5>java -classpath luke.jar;lucene.jar org.getopt.luke.Luke
Exception in thread "main" java.lang.NoSuchMethodError
         at org.getopt.luke.ClassFinder.findClassesThatExtend(Unknown Source)
         at org.getopt.luke.ClassFinder.getInstantiableSubclasses(Unknown Source)
         at org.getopt.luke.Luke.<init>(Unknown Source)
         at org.getopt.luke.Luke.main(Unknown Source)

F:\Apache\Lucene\AddOns\Luke\v0.5>java -fullversion
java full version "1.3.1_10-b03"

F:\Lucene\AddOns\Luke\v0.5>

				R



Re: Underscore character and case issue

Posted by Andrzej Bialecki <ab...@getopt.org>.
Robert Brown wrote:
> I traverse a series of files under a parent directory (similar to the 
> demo sample) and store the filename in a Document Keyword field called 
> 'Filename'.  I am using the StandardAnalyzer for both building the index 
> and searching the index.

... and here lies your problem. StandardAnalyzer lowercases the tokens, 
and strips most of the non-letters from tokens. I suggest using Luke 
(http://www.getopt.org/luke) to look inside your index, and see the 
terms as they ended up in the index, and to try out some other analyzers 
to see which is the most appropriate in your case.
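One way to follow this suggestion before reindexing is to see how different tokenization rules treat the filenames in question. The sketch below uses two toy tokenizers written from scratch, assumed to approximate Lucene's WhitespaceAnalyzer (split on whitespace, keep case) and SimpleAnalyzer (split on any non-letter, then lowercase). They are not Lucene's classes, and the names AnalyzerComparison, whitespaceTokens and letterLowerTokens are hypothetical; Luke or Lucene's own analyzers remain the authoritative way to check which terms actually end up in the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy tokenizers for experimentation only; NOT Lucene classes.
//  - whitespaceTokens: split on whitespace, keep case (like WhitespaceAnalyzer)
//  - letterLowerTokens: split on any non-letter, lowercase (like SimpleAnalyzer)
class AnalyzerComparison {
    static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<String>();
        for (String t : text.trim().split("\\s+")) {
            if (t.length() > 0) {
                out.add(t); // skip the empty piece produced by blank input
            }
        }
        return out;
    }

    static List<String> letterLowerTokens(String text) {
        List<String> out = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Underscores, digits and dots all end the current token.
                out.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String name : Arrays.asList("readme_v32.txt", "N3151.txt")) {
            System.out.println(name + " -> whitespace: " + whitespaceTokens(name)
                    + ", letters+lowercase: " + letterLowerTokens(name));
        }
        // readme_v32.txt -> whitespace: [readme_v32.txt], letters+lowercase: [readme, v, txt]
        // N3151.txt -> whitespace: [N3151.txt], letters+lowercase: [n, txt]
    }
}
```

Under the letter-only rules, readme_v32.txt breaks into readme, v and txt, so an exact filename no longer corresponds to a single term. The whitespace rules keep it whole but leave matching case-sensitive, so the query side would have to preserve case as well.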

-- 
Best regards,
Andrzej Bialecki
