Posted to java-user@lucene.apache.org by LucasMeadows <me...@yahoo.com> on 2010/01/09 01:04:53 UTC

Indexing pages and chapters of a book

I have a large number of text files (books) that I am trying to make
searchable with Lucene 2.3.2.

I would like search results to display the page and chapter in which a match
with the search term occurred.

My question is whether it is possible to add structural data (xml perhaps)
to the files so that they can be indexed in a way that captures the
relationship of the terms to the pages and chapters that contain them.

Many thanks in advance!
-- 
View this message in context: http://old.nabble.com/Indexing-pages-and-chapters-of-a-book-tp27084145p27084145.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Indexing pages and chapters of a book

Posted by Erick Erickson <er...@gmail.com>.
Sure, you can add any data you want to any document,
probably stored but not indexed in this case. It could even
be a serialized Java object, an XML packet, or a
stringized map. Or... whatever suits your fancy. If it's
stored but not indexed, it'll make your index larger but have
a negligible impact on search performance.
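
As a rough sketch of the "stringized" option, the helper below packs the
character offsets at which each page starts into a single string and
unpacks it again. The class name PageOffsetCodec and the comma-separated
layout are assumptions made for illustration, not anything Lucene
prescribes. In Lucene 2.3 a string like this could presumably be added to
the document as a field created with Field.Store.YES and Field.Index.NO.

```java
// Hypothetical helper (not part of Lucene): turns page start offsets into
// a string that can be kept in a stored, non-indexed field, and back.
final class PageOffsetCodec {
    private PageOffsetCodec() {}

    // Entry i is the character offset at which page i + 1 begins,
    // so {0, 1834, 3702} becomes "0,1834,3702".
    static String encode(int[] pageStarts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < pageStarts.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(pageStarts[i]);
        }
        return sb.toString();
    }

    // Reverses encode(); an empty string means no page boundaries were stored.
    static int[] decode(String stored) {
        if (stored.length() == 0) {
            return new int[0];
        }
        String[] parts = stored.split(",");
        int[] pageStarts = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            pageStarts[i] = Integer.parseInt(parts[i]);
        }
        return pageStarts;
    }
}
```

Chapter boundaries could be kept the same way in a second stored field.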

The trick is getting token offsets to put in your metadata,
so that a match can be mapped back to its page and chapter.
You'll have to get the term positions and store them, but
it's doable.
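
One way the lookup side might work, assuming you have recorded the
character offset at which each page starts and you know the start offset
of the matching token: a binary search over the sorted page starts gives
the page number. PageLocator is a hypothetical name, and the same class
would work for chapter starts.

```java
import java.util.Arrays;

// Hypothetical helper (not part of Lucene): maps a character offset in the
// book text to the 1-based page that contains it.
final class PageLocator {
    // Sorted ascending; pageStarts[i] is where page i + 1 begins.
    private final int[] pageStarts;

    PageLocator(int[] pageStarts) {
        if (pageStarts.length == 0) {
            throw new IllegalArgumentException("at least one page start is required");
        }
        this.pageStarts = pageStarts.clone();
    }

    int pageOf(int offset) {
        if (offset < pageStarts[0]) {
            throw new IllegalArgumentException("offset lies before the first page: " + offset);
        }
        int idx = Arrays.binarySearch(pageStarts, offset);
        if (idx < 0) {
            // Not an exact page start: binarySearch returned -(insertionPoint) - 1,
            // and the containing page is the one just before the insertion point.
            idx = -idx - 2;
        }
        return idx + 1;
    }
}
```

With page starts of 0, 1834 and 3702, new PageLocator(starts).pageOf(2000)
falls on the second page, since 2000 lies between 1834 and 3702.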

HTH
Erick
