Posted to java-user@lucene.apache.org by David Black <bl...@apple.com> on 2004/04/01 21:48:39 UTC

Nested category strategy

Hey All,

I'm trying to figure out the best approach to something.

Each document I index has an array of categories which looks like the 
following example....

/Science/Medicine/Serology/blood gas
/Biology/Fluids/Blood/

etc.

Anyway, there are a couple of things I'm trying to deal with.

1. The array size is undefined, so I can't just shove these into a 
single field.  I could explode them into multiple fields on the fly, 
like category_1, category_2, etc.

2. A search like "category:/Science/Medicine/*" needs to return all 
items within that category.

Thanks in advance to anyone who can give me some help here.

Thanks


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Nested category strategy

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Apr 1, 2004, at 2:48 PM, David Black wrote:
> Each document I index has an array of categories which looks like the 
> following example....
>
> /Science/Medicine/Serology/blood gas
> /Biology/Fluids/Blood/
>
> etc.
>
> Anyway, there are a couple of things I'm trying to deal with.
>
> 1. The array size is undefined, so I can't just shove these into a 
> single field.  I could explode them into multiple fields on the fly, 
> like category_1, category_2, etc.

Yes, you could use a single field.... and add them individually.

    // same field name, added once per category value
    doc.add(Field.Keyword("category", category1));
    doc.add(Field.Keyword("category", category2));

Or am I missing something about your scenario?

> 2. A search like "category:/Science/Medicine/*" needs to return all 
> items within that category.

Keep in mind the issues you may encounter using QueryParser with such a 
field selector.  Choose your analyzer(s) carefully!
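
To illustrate why the analyzer choice matters here, the following is a rough sketch in plain Java with no Lucene dependency. The class TokenizationSketch and its methods are hypothetical; letterTokens only approximates an analyzer that splits on non-letters and lowercases, which is an assumption about the analyzer in use. Under that assumption the slashes never reach the index, so a prefix such as "/Science/Medicine/" has no term to match, whereas a value kept whole as a keyword does match.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; this is not Lucene code.
class TokenizationSketch {

    // Approximates an analyzer that splits on non-letters and lowercases.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // True if any indexed term begins with the given prefix.
    static boolean anyStartsWith(List<String> terms, String prefix) {
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String category = "/Science/Medicine/Serology/blood gas";
        String prefix = "/Science/Medicine/";

        // Terms as a letter-splitting analyzer would produce them.
        List<String> analyzed = letterTokens(category);
        // The single term produced when the value is kept whole.
        List<String> keyword = Collections.singletonList(category);

        System.out.println("analyzed terms: " + analyzed);
        System.out.println("prefix matches analyzed terms: " + anyStartsWith(analyzed, prefix));
        System.out.println("prefix matches keyword term: " + anyStartsWith(keyword, prefix));
    }
}
```

The same mismatch can show up on the query side if the query text is run through an analyzer that differs from the one used at index time.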

	Erik


RE: Nested category strategy

Posted by Stephane James Vaucher <va...@cirano.qc.ca>.
Another possibility is to add all combinations in a single field.

doc.add(Field.Keyword("category", "/Science/"));
doc.add(Field.Keyword("category", "/Science/Medicine"));
doc.add(Field.Keyword("category", "/Science/Foo"));
doc.add(Field.Keyword("category", "/Biology"));

Your wildcard search should work, and you shouldn't have the problem with
a search "/Science/*".
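
Generating those combinations by hand gets tedious, so here is a minimal sketch in plain Java of one way to expand a path into itself plus every ancestor. CategoryPaths and withAncestors are hypothetical names, not part of Lucene, and the sketch assumes "/" is the only separator and that a trailing slash carries no meaning. Each returned value would then be added to the same category field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: expands a category path into
// itself plus every ancestor path, so each can be indexed as its own value.
class CategoryPaths {

    static List<String> withAncestors(String path) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) {
                continue; // skip empty pieces from leading or doubled slashes
            }
            current.append('/').append(segment);
            result.add(current.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Each printed value would be added to the "category" field.
        for (String value : withAncestors("/Science/Medicine/Serology/blood gas")) {
            System.out.println(value);
        }
    }
}
```

With every ancestor stored as its own value, an exact match on "/Science/Medicine" would also find documents filed deeper in that branch, at the cost of a few extra terms per document.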

HTH,
sv

On Thu, 1 Apr 2004, Tate Avery wrote:

>
> Could you put them all into a tab-delimited string and store that as a
> single field, then use a TabTokenizer on the field to search?
>
> And, if you need to, do a .split("\t") on the field value in order to break
> them back up into individual categories.


RE: Nested category strategy

Posted by Tate Avery <ta...@nstein.com>.
Could you put them all into a tab-delimited string and store that as a
single field, then use a TabTokenizer on the field to search?

And, if you need to, do a .split("\t") on the field value in order to break
them back up into individual categories.
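
A minimal sketch of the join and split round trip, in plain Java. TabDelimitedCategories is a hypothetical name, and the sketch assumes no category ever contains a tab character. It only covers packing and unpacking the stored string; the tokenizer used for searching is a separate piece and is not shown.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: packs categories into one tab-delimited string
// and unpacks them again. Assumes no category contains a tab.
class TabDelimitedCategories {

    static String join(List<String> categories) {
        return String.join("\t", categories);
    }

    static List<String> split(String stored) {
        return Arrays.asList(stored.split("\t"));
    }

    public static void main(String[] args) {
        List<String> categories = Arrays.asList(
                "/Science/Medicine/Serology/blood gas",
                "/Biology/Fluids/Blood/");

        String stored = join(categories);  // value to store in the single field
        List<String> restored = split(stored);

        System.out.println(restored.equals(categories));
    }
}
```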



