You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Rohit Lodha <ro...@gmail.com> on 2005/08/22 08:27:19 UTC

Hierarchical Documents

Hi All,

Currently, Documents cannot contain other documents. I have a Graph of 
Objects (Documents) to search in.

I could flatten them and search but...

Is there any nice way to do it?

Rohit

Re: Hierarchical Documents

Posted by "orens (sent by Nabble.com)" <li...@nabble.com>.
Hi All, 

In the article they mention source code for their work, but I couldn't find it. Did anyone implement the ideas mentiond in the article?

Thanks,
Orens
--
Sent from the Lucene - Java Users forum at Nabble.com:
http://www.nabble.com/Hierarchical-Documents-t242604.html#a728669

Re: Hierarchical Documents

Posted by Dan Funk <fu...@BATTELLE.ORG>.
People indexing XML documents tend to deal with the same kind of problem,
 there is an excellent article at the URL below showing how they handled 
some fairly
complex hierarchical queries.

http://www.idealliance.org/papers/xmle02/dx_xmle02/papers/03-02-08/03-02-08.html

Rohit Lodha wrote:

>Hi All,
>
>Currently, Documents cannot contain other documents. I have a Graph of 
>Objects (Documents) to search in.
>
>I could flatten them and search but...
>
>Is there any nice way to do it?
>
>Rohit
>
>  
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Hierarchical Documents

Posted by Pa...@saaconsultants.com.




I have been struggling with this sort of problem for some time and still
haven't got an ideal solution.

Initially I was going to go for the approach Erik has suggested for similar
reasons - it allowed me to search within categories and within sub
categories of those categories very simply. Unfortunately in my system the
categorisation is dynamic. If a user wants to rename a category then all
the documents within those categories or any of the subcategories would
have to be reindexed. Similarly, if the user moves a category from one
place to another then again this can result in lots of reindexing.

The reindexing problem can be avoided by using identifiers for the
categories. I simply index a list of the ids for the categories the
document belongs to when indexing the document. This allows me to
dynamically change and restructure the category hierarchy without impacting
the index. The downside is that for searching I need to map the category
path to the category id. This I do using a database look-up prior to
searching Lucene. To carry out sub category searching I will be looking up
all the sub categories in the database using a wildcard and then probably
building either Boolean expression (for small numbers of subcategories) or
a Lucene filter combined with the ConstantScoreQuery (from the sandbox) for
large number of subcategories - the user having to understand that
searching lots of subcategories may take some time. In my case the
subcategory searching is not the norm so this isn't too much of a problem.

There's some useful articles here
http://www.intelligententerprise.com/001020/celko1_1.jhtml?_requestid=36367
and here http://www.dbazine.com/oracle/or-articles/tropashko4 that discuss
tree structures in SQL that I found useful.

Regards

PaulI.

Erik Hatcher <er...@ehatchersolutions.com> wrote on 23/08/2005 10:34:16:

>
> On Aug 22, 2005, at 2:27 AM, Rohit Lodha wrote:
> > Currently, Documents cannot contain other documents. I have a Graph of
> > Objects (Documents) to search in.
> >
> > I could flatten them and search but...
> >
> > Is there any nice way to do it?
>
> I have used a technique of encoding a hierarchical path (like "/
> category/subcategory/sub-subcategory") into an indexed untokenized
> field.  This allows for flat or hierarchical queries.  A hierarchical
> PrefixQuery of "/category" returns all documents from "/category"
> downward, for example.
>
> I suspect others have done more interesting techniques for this sort
> of thing - I'd love to hear about them.
>
>      Erik
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Hierarchical Documents

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Aug 22, 2005, at 2:27 AM, Rohit Lodha wrote:
> Currently, Documents cannot contain other documents. I have a Graph of
> Objects (Documents) to search in.
>
> I could flatten them and search but...
>
> Is there any nice way to do it?

I have used a technique of encoding a hierarchical path (like "/ 
category/subcategory/sub-subcategory") into an indexed untokenized  
field.  This allows for flat or hierarchical queries.  A hierarchical  
PrefixQuery of "/category" returns all documents from "/category"  
downward, for example.

I suspect others have done more interesting techniques for this sort  
of thing - I'd love to hear about them.

     Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org