Posted to java-user@lucene.apache.org by lu...@nitwit.de on 2004/03/05 12:15:40 UTC

Storing numbers

Hi!

I want to store numbers (id) in my index:

	long id = 1069421083284;
	doc.add(Field.UnStored("in", String.valueOf(id)));	

But searching for "id:1069421083284" doesn't return any hits.

Well, did I misunderstand something? UnStored means the number is stored but not 
indexed (analyzed), doesn't it? Anyway, Field.Text doesn't work either.

TIA
Timo

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Storing numbers

Posted by lu...@nitwit.de.
On Friday 05 March 2004 12:27, Morus Walter wrote:
> > 	doc.add(Field.UnStored("in", String.valueOf(id)));
> >
> > But searching for "id:1069421083284" doesn't return any hits.
>
> If your field is named 'in' you shouldn't search in 'id'. Right?
>
> Well, indexing and analyzing are different things.
> UnStored means, the number is not stored (as the name says) but indexed.
> And IIRC it's analyzed before indexing. Shouldn't make a difference for
> a single number.
>
> What I'd use in this case is an unstored keyword (given that you really
> don't want to have the id returned from lucene, which is the consequence of
> not storing).

Sorry, typo :-)

I have several docs in the index and each doc has an id. I just want 
to find a particular doc by its id. 

	doc.add(Field.UnIndexed("id", String.valueOf(id)));

doesn't work either. And as I mentioned, not even Field.Text works...



Re: Storing numbers

Posted by Morus Walter <mo...@tanto.de>.
lucene@nitwit.de writes:
> Hi!
> 
> I want to store numbers (id) in my index:
> 
> 	long id = 1069421083284;
> 	doc.add(Field.UnStored("in", String.valueOf(id)));	
> 
> But searching for "id:1069421083284" doesn't return any hits.

If your field is named 'in' you shouldn't search in 'id'. Right?

> 
> Well, did I misunderstand something? UnStored is the number is stored but not 
> index (analyzed), isn't it? Anyway, Field.Text doesn't work either.
> 
Well, indexing and analyzing are different things.
UnStored means, the number is not stored (as the name says) but indexed.
And IIRC it's analyzed before indexing. Shouldn't make a difference for
a single number.

What I'd use in this case is an unstored keyword (given that you really don't
want to have the id returned from lucene, which is the consequence of
not storing).
I'm not sure if there's a method to create such a field, but you can do it
by setting the flags directly.

HTH
	Morus



RE: Storing numbers

Posted by Olga Dadasheva <ol...@harvard.edu>.
Try this link and scroll to top:
http://www.sys-con.com/story/?storyid=37296&DE=1#RES

Thank you, Tim - excellent article.



-----Original Message-----
From: lucene@nitwit.de [mailto:lucene@nitwit.de]
Sent: Wednesday, March 10, 2004 10:23 AM
To: Lucene Users List
Subject: Re: Storing numbers


On Tuesday 09 March 2004 20:51, Timothy Stone wrote:
> Michael Giles wrote:
> > Tim,
> >
> > Looks like you can only access it with a subscription.  :(  Sounds good,
> > though.
> >
> Really? I don't have a subscription. Got to it via the archives actually
> now that I think about it:
>
> Try Volume 7, Issue 12.

I also need a subscription for:
http://www.sys-con.com/story/search.cfm?pub=1&ss=lucene



Re: Storing numbers

Posted by lu...@nitwit.de.
On Tuesday 09 March 2004 20:51, Timothy Stone wrote:
> Michael Giles wrote:
> > Tim,
> >
> > Looks like you can only access it with a subscription.  :(  Sounds good,
> > though.
> >
> Really? I don't have a subscription. Got to it via the archives actually
> now that I think about it:
>
> Try Volume 7, Issue 12.

I also need a subscription for: 
http://www.sys-con.com/story/search.cfm?pub=1&ss=lucene



Re: Storing numbers

Posted by Timothy Stone <ci...@petmystone.com>.
Michael Giles wrote:

> Tim,
> 
> Looks like you can only access it with a subscription.  :(  Sounds good, 
> though.
> 
> -Mike

Really? I don't have a subscription. Got to it via the archives actually 
now that I think about it:

Try Volume 7, Issue 12.

Sorry about that bad URL. But Sys-Con must set a cookie (yep) following 
the sub splash. Try the link again. I just deleted my cookie, got a 
sub-splash and then tried the archive again and it worked.

Odd, but it works. Get it before sys-con is on to us. :)

Tim

> 
> At 02:39 PM 3/9/2004, you wrote:
> 
>> lucene@nitwit.de wrote:
>>
>>> Hi!
>>> I want to store numbers (id) in my index:
>>>         long id = 1069421083284;
>>>         doc.add(Field.UnStored("in", String.valueOf(id)));
>>> But searching for "id:1069421083284" doesn't return any hits.
>>> Well, did I misunderstand something? UnStored is the number is stored 
>>> but not index (analyzed), isn't it? Anyway, Field.Text doesn't work 
>>> either.
>>> TIA
>>> Timo
>>
>>
>> Craig Walls wrote an excellent article in JDJ at the end of 2002 
>> regarding Lucene (not shown in any of the resources BTW). He documents 
>> using Lucene along side a database as well as provides two classes 
>> (and others unrelated) that extend the functionality of the 
>> StopAnalyzer to include numbers and or alpha numerics.
>>
>> Check out the article at: 
>> http://www.sys-con.com/story/print.cfm?storyid=37296
>>
>> HTH,
>> Tim




Re: Storing numbers

Posted by Michael Giles <mg...@visionstudio.com>.
Tim,

Looks like you can only access it with a subscription.  :(  Sounds good, 
though.

-Mike

At 02:39 PM 3/9/2004, you wrote:
>lucene@nitwit.de wrote:
>
>>Hi!
>>I want to store numbers (id) in my index:
>>         long id = 1069421083284;
>>         doc.add(Field.UnStored("in", String.valueOf(id)));
>>But searching for "id:1069421083284" doesn't return any hits.
>>Well, did I misunderstand something? UnStored is the number is stored but 
>>not index (analyzed), isn't it? Anyway, Field.Text doesn't work either.
>>TIA
>>Timo
>
>Craig Walls wrote an excellent article in JDJ at the end of 2002 regarding 
>Lucene (not shown in any of the resources BTW). He documents using Lucene 
>along side a database as well as provides two classes (and others 
>unrelated) that extend the functionality of the StopAnalyzer to include 
>numbers and or alpha numerics.
>
>Check out the article at: http://www.sys-con.com/story/print.cfm?storyid=37296
>
>HTH,
>Tim

Re: Storing numbers

Posted by Timothy Stone <ci...@petmystone.com>.
lucene@nitwit.de wrote:

> Hi!
> 
> I want to store numbers (id) in my index:
> 
> 	long id = 1069421083284;
> 	doc.add(Field.UnStored("in", String.valueOf(id)));	
> 
> But searching for "id:1069421083284" doesn't return any hits.
> 
> Well, did I misunderstand something? UnStored is the number is stored but not 
> index (analyzed), isn't it? Anyway, Field.Text doesn't work either.
> 
> TIA
> Timo

Craig Walls wrote an excellent article in JDJ at the end of 2002 
regarding Lucene (not shown in any of the resources BTW). He documents 
using Lucene alongside a database and also provides two classes (plus 
others that are unrelated) that extend the functionality of the 
StopAnalyzer to include numbers and/or alphanumerics.

Check out the article at: 
http://www.sys-con.com/story/print.cfm?storyid=37296

HTH,
Tim



Re: Storing numbers

Posted by Claude Devarenne <cl...@library.ucsf.edu>.
Hi,

I thought it was the StopAnalyzer that weeds out numbers.  I believe 
StandardAnalyzer keeps them in: I ran into the same issue using the jsp demo 
code, replaced StopAnalyzer with StandardAnalyzer, and numbers became 
searchable.  This assumes you index with a StandardAnalyzer too.
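
A rough illustration of why a letter-only analyzer loses numeric ids. This is 
not Lucene code: it is a plain-Java imitation, assuming StopAnalyzer splits 
its input into lowercase runs of letters and discards everything else. The 
class and method names are made up for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a letter-only tokenizer; not part of Lucene.
final class LetterOnlyTokens {

    // Collects lowercase runs of letters; digits and punctuation only
    // act as separators and never become tokens themselves.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A purely numeric id yields no tokens, so nothing would be indexed.
        System.out.println(tokenize("1069421083284"));
        // Mixed text keeps only the words.
        System.out.println(tokenize("Doc 42 rocks"));
    }
}
```

Under that assumption the id never reaches the index as a term, which would 
explain the empty result regardless of how the field is stored.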

Claude

At 06:42 AM 3/5/2004 -0800, Otis Gospodnetic wrote:
>Either store it as a Keyword Field, which does not get Analyzed, or use
>that per-field Analyzer wrapper class.
>Your problem is most likely that you are using something like
>StandardAnalyzer that, I believe, throws out numbers from its input
>before indexing (i.e. your numbers are not getting indexed in the first
>place).  Try with Field.Keyword.
>
>Otis
>
>--- lucene@nitwit.de wrote:
> > Hi!
> >
> > I want to store numbers (id) in my index:
> >
> >       long id = 1069421083284;
> >       doc.add(Field.UnStored("in", String.valueOf(id)));
> >
> > But searching for "id:1069421083284" doesn't return any hits.
> >
> > Well, did I misunderstand something? UnStored is the number is stored
> > but not
> > index (analyzed), isn't it? Anyway, Field.Text doesn't work either.
> >
> > TIA
> > Timo




Re: Storing numbers

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Mar 7, 2004, at 6:27 AM, lucene@nitwit.de wrote:
> On Fri, 5 Mar 2004 19:18:04 -0500, Erik Hatcher 
> <er...@ehatchersolutions.com>
> wrote:
>
>> Thanks for the idea for a good example for the upcoming Lucene in 
>> Action
>> book... it's been added!
>
> Thanks for mentioning me in the book ;)

Well, I actually already had a comment in the book about why you'd 
override getRangeQuery, and it said this:

   * handle number ranges by padding to match how numbers were indexed

You did give me the incentive to flesh this out into an example.

I also created a variant of this to parse range queries like this 
field:[1/1/04 TO 12/31/04] into YYYYMMDD syntax so it becomes 
field:[20040101 TO 20041231].  This is very handy when dealing with 
dates in a typically more sensible YYYYMMDD format and allowing users 
to deal with them naturally also.
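
A minimal sketch of that kind of date translation, not the code from the 
book. It assumes user input in M/d/yy form and uses java.time (Java 8 or 
later); the class and method names are hypothetical. In a real setup the 
conversion would be called from a getRangeQuery override.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper; the names are illustrative and not from Lucene.
final class DateTermSketch {

    // What users type, e.g. 1/1/04 (two-digit years are read as 20yy).
    private static final DateTimeFormatter USER =
        DateTimeFormatter.ofPattern("M/d/yy", Locale.ROOT);

    // What is assumed to be in the index: fixed-width, sortable as text.
    private static final DateTimeFormatter INDEX =
        DateTimeFormatter.ofPattern("yyyyMMdd", Locale.ROOT);

    // Throws DateTimeParseException if the input is not a valid M/d/yy date.
    static String toIndexForm(String userDate) {
        return LocalDate.parse(userDate, USER).format(INDEX);
    }

    // Rebuilds the range expression in its indexed form.
    static String toIndexRange(String field, String from, String to) {
        return field + ":[" + toIndexForm(from) + " TO " + toIndexForm(to) + "]";
    }

    public static void main(String[] args) {
        System.out.println(toIndexRange("field", "1/1/04", "12/31/04"));
    }
}
```

Because YYYYMMDD strings all have the same width, their textual order 
matches chronological order, which is what a range over text terms needs.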

> What about boolean fields? It's certainly not a good idea to use 
> "true" or
> "false" strings...

What about them?  It all depends on how you want users to be able to 
query based on that flag.  Do you want them to say field:true?  
field:on?  field:yes?  How you translate things in QueryParser is up to 
you - and this may of course have some impact on how you index.  You 
could use "0" and "1" instead, and do the translation in a QueryParser 
subclass if you like.
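
One way the "0"/"1" translation could look, as a sketch only. The accepted 
spellings (true/on/yes and false/off/no) and the names below are 
assumptions for illustration; the same function would have to be used at 
index time and inside the QueryParser subclass.

```java
import java.util.Locale;

// Hypothetical helper; not part of Lucene.
final class BooleanTermSketch {

    // Maps the spellings a user might type onto the single form that is indexed.
    static String toIndexForm(String userValue) {
        switch (userValue.trim().toLowerCase(Locale.ROOT)) {
            case "true":
            case "on":
            case "yes":
            case "1":
                return "1";
            case "false":
            case "off":
            case "no":
            case "0":
                return "0";
            default:
                throw new IllegalArgumentException("not a boolean flag: " + userValue);
        }
    }

    public static void main(String[] args) {
        System.out.println(toIndexForm("Yes"));
        System.out.println(toIndexForm("off"));
    }
}
```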

> BTW, isn't it slow to treat everything as strings?

Ummm, yeah.... Lucene is real slow!  :)

You tell us.... is it slow with your data and environment?  If so, give 
us some more details on the scenario.

	Erik




Re: Storing numbers

Posted by lu...@nitwit.de.
On Fri, 5 Mar 2004 19:18:04 -0500, Erik Hatcher <er...@ehatchersolutions.com> 
wrote:

> Thanks for the idea for a good example for the upcoming Lucene in Action  
> book... it's been added!

Thanks for mentioning me in the book ;)

What about boolean fields? It's certainly not a good idea to use "true" or 
"false" strings...

BTW, isn't it slow to treat everything as strings?

Timo



Re: Storing numbers

Posted by Doug Cutting <cu...@apache.org>.
Erik Hatcher wrote:
>   private static final DecimalFormat formatter =
>       new DecimalFormat("00000"); // make this as wide as you need

For ints, ten digits is probably safest.  Since Lucene uses prefix 
compression on the term dictionary, you don't pay a penalty at search 
time for long shared prefixes.
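
A sketch of what the wider padding might look like, assuming non-negative 
values only (negative numbers would need a different encoding to sort 
correctly as text). Ten digits covers Integer.MAX_VALUE (2147483647); the 
19-digit variant for longs is an extrapolation for ids like the one in this 
thread. All names are hypothetical.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper; not part of Lucene.
final class WidePadding {

    static final int INT_WIDTH = 10;   // digits in Integer.MAX_VALUE
    static final int LONG_WIDTH = 19;  // digits in Long.MAX_VALUE

    // Builds a pattern of 'width' zeros and formats n with it. A new
    // DecimalFormat per call sidesteps its lack of thread safety.
    private static String pad(long n, int width) {
        if (n < 0) {
            throw new IllegalArgumentException("negative values are not handled here");
        }
        char[] zeros = new char[width];
        Arrays.fill(zeros, '0');
        DecimalFormat format = new DecimalFormat(
            new String(zeros), DecimalFormatSymbols.getInstance(Locale.ROOT));
        return format.format(n);
    }

    static String padInt(int n) {
        return pad(n, INT_WIDTH);
    }

    static String padLong(long n) {
        return pad(n, LONG_WIDTH);
    }

    public static void main(String[] args) {
        System.out.println(padInt(37));
        System.out.println(padLong(1069421083284L));
    }
}
```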

Doug



Re: Storing numbers

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Mar 5, 2004, at 4:16 PM, Erik Hatcher wrote:
> Another quite cool option is to subclass QueryParser, and override 
> getRangeQuery.  Do the padding there.  This will allow users to type 
> in normal looking numbers, and the padding happens automatically.  
> You'll need to be sure that numbers padded during indexing matches 
> what getRangeQuery does (oh, say through a common function :).

Ok, here is a solution to storing integers and being able to use 
QueryParser cleanly.  First a utility to pad the numbers:

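import java.text.DecimalFormat;

public class NumberUtils {
   // One '0' per digit gives a fixed-width, zero-padded result.
   // DecimalFormat is not thread-safe: guard it if pad() is called concurrently.
   private static final DecimalFormat formatter =
       new DecimalFormat("00000"); // make this as wide as you need

   // Pads n with leading zeros, e.g. 37 becomes "00037".
   public static String pad(int n) {
     return formatter.format(n);
   }
}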

Index the relevant fields using the pad function:

       doc.add(Field.Keyword("id", NumberUtils.pad(i)));

Create a custom QueryParser subclass:

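import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.RangeQuery;

public class CustomQueryParser extends QueryParser {
   public CustomQueryParser(String field, Analyzer analyzer) {
     super(field, analyzer);
   }

   // Called by the parser for range expressions such as id:[37 TO 346].
   protected Query getRangeQuery(String field, Analyzer analyzer,
                                 String part1, String part2,
                                 boolean inclusive)
       throws ParseException {
     if ("id".equals(field)) {
       try {
         // Pad both endpoints the same way the field was padded at index time.
         int num1 = Integer.parseInt(part1);
         int num2 = Integer.parseInt(part2);
         return new RangeQuery(new Term(field, NumberUtils.pad(num1)),
                               new Term(field, NumberUtils.pad(num2)),
                               inclusive);
       } catch (NumberFormatException e) {
         // Report non-numeric endpoints as an ordinary parse failure.
         throw new ParseException(e.getMessage());
       }
     }

     // All other fields keep the default behavior.
     return super.getRangeQuery(field, analyzer, part1, part2,
         inclusive);
   }
}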

Only the "id" field is treated specially, but your logic may vary.

Then use the custom QueryParser:

     CustomQueryParser parser =
         new CustomQueryParser("field", analyzer);

     Query query = parser.parse("id:[37 TO 346]");

     assertEquals("padded", "id:[00037 TO 00346]",
                            query.toString("field"));

Thanks for the idea for a good example for the upcoming Lucene in 
Action book... it's been added!

	Erik




Re: Storing numbers

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Another quite cool option is to subclass QueryParser, and override 
getRangeQuery.  Do the padding there.  This will allow users to type in 
normal looking numbers, and the padding happens automatically.  You'll 
need to be sure that numbers padded during indexing match what 
getRangeQuery does (oh, say through a common function :).

In fact, this is a great example for LIA.  I'll add it!  And I'll post 
the code back here in a day or so after I write it.

	Erik


On Mar 5, 2004, at 12:34 PM, lucene@nitwit.de wrote:

> On Friday 05 March 2004 18:01, Erik Hatcher wrote:
>> "000000000001" for example.  Be sure all numbers have the same width
>> and zero padded.
>
> And what about a range like 100 TO 1000?




Re: Storing numbers

Posted by Stephane James Vaucher <va...@cirano.qc.ca>.
On Fri, 5 Mar 2004 lucene@nitwit.de wrote:

> On Friday 05 March 2004 18:01, Erik Hatcher wrote:
> > "000000000001" for example.  Be sure all numbers have the same width
> > and zero padded.
>
> And what about a range like 100 TO 1000?

You mean 0100 To 1000 or 000000000000100 to 000000000001000 ;)

sv




Re: Storing numbers

Posted by lu...@nitwit.de.
On Friday 05 March 2004 18:01, Erik Hatcher wrote:
> "000000000001" for example.  Be sure all numbers have the same width
> and zero padded.

And what about a range like 100 TO 1000?



Re: Storing numbers

Posted by Stephane James Vaucher <va...@cirano.qc.ca>.
Weird idea, how about transforming your long into a Date and using a
DateFilter to use a ranged query?

sv

On Fri, 5 Mar 2004, Erik Hatcher wrote:

> Terms in Lucene are text.  If you want to deal with number ranges, you
> need to pad them.
>
> "000000000001" for example.  Be sure all numbers have the same width
> and zero padded.
>
> Lucene use lexicographical ordering, so you must be sure things collate
> in this way.
>
> 	Erik
>
> On Mar 5, 2004, at 11:46 AM, lucene@nitwit.de wrote:
>
> > On Friday 05 March 2004 15:42, Otis Gospodnetic wrote:
> >> Try with Field.Keyword.
> >
> > Ok, works.
> >
> > Another problem: Range searches don't work.
> >
> > 	"id:(1 TO 1069421083284)"
> >
> > does return only 1 hit - 1069421083284.




Re: Storing numbers

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Terms in Lucene are text.  If you want to deal with number ranges, you 
need to pad them.

"000000000001" for example.  Be sure all numbers have the same width 
and are zero padded.

Lucene uses lexicographical ordering, so you must be sure things collate 
in this way.
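
To see the effect without Lucene, here is a small plain-Java sketch in which 
String's natural ordering stands in for term order. The pad helper and its 
width of 13 (just enough for the id from this thread) are assumptions for 
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Illustration only: String comparison stands in for Lucene's term order.
final class LexicographicOrderDemo {

    // Hypothetical helper: zero-pads a non-negative long to 13 digits.
    static String pad(long n) {
        return String.format(Locale.ROOT, "%013d", n);
    }

    // Returns a sorted copy; Strings are compared character by character.
    static List<String> sorted(List<String> terms) {
        List<String> copy = new ArrayList<>(terms);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Unpadded: textual order differs from numeric order.
        System.out.println(sorted(Arrays.asList("20", "3", "100")));
        // Padded to a common width: textual and numeric order agree.
        System.out.println(sorted(Arrays.asList(pad(20), pad(3), pad(100))));
        // As text, "2" sorts after "1069421083284", so it falls outside
        // a range that ends at that id.
        System.out.println("2".compareTo("1069421083284") > 0);
    }
}
```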

	Erik

On Mar 5, 2004, at 11:46 AM, lucene@nitwit.de wrote:

> On Friday 05 March 2004 15:42, Otis Gospodnetic wrote:
>> Try with Field.Keyword.
>
> Ok, works.
>
> Another problem: Range searches don't work.
>
> 	"id:(1 TO 1069421083284)"
>
> does return only 1 hit - 1069421083284.




Re: Storing numbers

Posted by lu...@nitwit.de.
On Friday 05 March 2004 15:42, Otis Gospodnetic wrote:
> Try with Field.Keyword.

Ok, works.

Another problem: Range searches don't work.

	"id:(1 TO 1069421083284)"

does return only 1 hit - 1069421083284.



Re: Storing numbers

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Either store it as a Keyword Field, which does not get Analyzed, or use
that per-field Analyzer wrapper class.
Your problem is most likely that you are using something like
StandardAnalyzer that, I believe, throws out numbers from its input
before indexing (i.e. your numbers are not getting indexed in the first
place).  Try with Field.Keyword.

Otis

--- lucene@nitwit.de wrote:
> Hi!
> 
> I want to store numbers (id) in my index:
> 
> 	long id = 1069421083284;
> 	doc.add(Field.UnStored("in", String.valueOf(id)));	
> 
> But searching for "id:1069421083284" doesn't return any hits.
> 
> Well, did I misunderstand something? UnStored is the number is stored
> but not 
> index (analyzed), isn't it? Anyway, Field.Text doesn't work either.
> 
> TIA
> Timo

