Posted to java-user@lucene.apache.org by syedfa <fa...@gmail.com> on 2008/04/07 19:13:00 UTC

Indexing and Searching from within a single Document

Dear Fellow Java/Lucene developers:

I am writing an application where a user is able to search for keywords
within a single book.  When the user conducts a search, he/she should
receive a set of results that show the sentence/phrase within the book where
the keyword is found.  Unfortunately, all of the examples that I have for
searching using Lucene discuss the concept of searching multiple documents,
instead of within a single document.  I have written an application that
creates an index of this book, but I now want to search it.  In the result
set, I would like the keyword(s) to be highlighted using Lucene's
Highlighter feature.  Once the user clicks on the hit they are looking for
in the result list, the application should take them directly to the
section of the book where that keyword is found.  The book that I have
indexed is in XML format.

My question is, how would I write an application that allows me to search a
single document and present the user with a set of results that lists the
portions of text from the book that contain the user's keyword, instead of
presenting a list of document titles where that keyword is found?

Any help would be greatly appreciated.  Thanks to all who reply.

Sincerely,
Fayyaz
-- 
View this message in context: http://www.nabble.com/Indexing-and-Searching-from-within-a-single-Document-tp16537558p16537558.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Indexing and Searching from within a single Document

Posted by Karl Wettin <ka...@gmail.com>.
You want to break your book down into multiple documents, perhaps one per
paragraph or so?
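
As a rough illustration of the one-document-per-paragraph idea, here is a
minimal sketch that uses only the Java standard library, not Lucene's API.
The class and method names (ParagraphSearch, Hit, splitIntoParagraphs,
search) are hypothetical, and splitting on blank lines is an assumption
about how the book text is laid out. With Lucene, each paragraph would
instead become its own Document, and the paragraph number would be kept
with it so the application can jump to the right place in the book.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: treats each paragraph of a book as its own searchable unit. */
final class ParagraphSearch {

    /** One search hit: which paragraph matched, and its text with the keyword marked. */
    static final class Hit {
        final int paragraphNumber; // 1-based position in the book, used to jump to the passage
        final String snippet;      // paragraph text with the first match wrapped in <b>...</b>

        Hit(int paragraphNumber, String snippet) {
            this.paragraphNumber = paragraphNumber;
            this.snippet = snippet;
        }
    }

    /** Splits the book text into paragraphs on blank lines, dropping empty pieces. */
    static List<String> splitIntoParagraphs(String bookText) {
        List<String> paragraphs = new ArrayList<String>();
        for (String piece : bookText.split("\\n\\s*\\n")) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                paragraphs.add(trimmed);
            }
        }
        return paragraphs;
    }

    /** Case-insensitive search for the first occurrence of keyword; returns -1 if absent. */
    static int indexOfIgnoreCase(String text, String keyword) {
        for (int i = 0; i + keyword.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, keyword, 0, keyword.length())) {
                return i;
            }
        }
        return -1;
    }

    /** Returns one hit per paragraph that contains the keyword. */
    static List<Hit> search(List<String> paragraphs, String keyword) {
        List<Hit> hits = new ArrayList<Hit>();
        for (int i = 0; i < paragraphs.size(); i++) {
            String text = paragraphs.get(i);
            int at = indexOfIgnoreCase(text, keyword);
            if (at >= 0) {
                int end = at + keyword.length();
                String snippet = text.substring(0, at)
                        + "<b>" + text.substring(at, end) + "</b>"
                        + text.substring(end);
                hits.add(new Hit(i + 1, snippet));
            }
        }
        return hits;
    }
}
```

Each hit carries the paragraph number, so the result list can show the
marked-up snippet and link straight to that paragraph. A real index would
replace the linear scan in search().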

I hope this helps.

     karl





Re: Lucene vs. Database indexing (RE: Indexing and Searching from within a single Document)

Posted by Chris Lu <ch...@gmail.com>.
I agree with Nick that Jing's task doesn't really need Lucene if it mostly
involves range searches. A database is good at range search.

But a query like 'Select * from table where Name like "%mymymy%"' is not
a scalable solution for a database, since a pattern with a leading wildcard
typically cannot use an ordinary index. For that kind of search, using
Lucene makes a lot of sense.

The easiest way is to create a Lucene index, and apply range search on
the index.

-- 
Chris Lu




Lucene vs. Database indexing (RE: Indexing and Searching from within a single Document)

Posted by "Duan, Nick" <ND...@mcdonaldbradley.com>.
I think this should be a new thread since it's a different problem.

Based on your description, I don't see any compelling reason for you to
use Lucene just for indexing purposes, since, as you indicated, you are not
indexing text docs.  Claiming that the database lacks performance is
neither accurate nor objective.  Your search queries have only the name
field and the two number fields in the where clause, so you can index those
fields using the database's own indexing mechanism, i.e. the
"create index ..." statement, if these fields are not already declared as
primary keys.  You should see a dramatic performance improvement in your
search queries.

ND 



RE: Indexing and Searching from within a single Document

Posted by Dino Korah <dc...@gmail.com>.
Lucene is a library for indexing data.
It's up to you to drive it the way you want.
Think about the search results: how would your users like to see the
information that the search page brings up? Do they want page numbers,
section numbers, or perhaps even sentence numbers?
Break the book up accordingly, and store the positional information
(chapter, section number, paragraph, sentence number, etc.) along with each
fragment of the book to make a document.

Hope that helps!



Re: Indexing and Searching from within a single Document

Posted by Mathieu Lecarme <ma...@garambrogne.net>.
jing_gao@agilent.com wrote:
> The need is:
> I have millions of entries in database, each entry is in such format (more or less)
>
> ID	Name	Description	start (number)	stop(number)
>
>
> Currently my application uses the database to do search, queries are in the following format:
>
> Select * from table where Name like "%mymymy%"
>
> Select * from table where start >5 and stop <50000
>
>
>
> I would like very much to use Lucene for such search, for the reason:
> 1. database performance is not ideal;
> 2. data is growing to be too big, I want to move to file system,
> 3. Currently everything is on server, user access through a web application. I want to provide rich client tool, in which case I would rather not to bother with database installations on client machine. Database is my last option;
> 4. Lucene sounds very cool, I want to use a different technology than database, which we are very familiar with already.
>
>
>
> I read the book, played with the demo. My question is:
> As you see, I'm not indexing or querying out documents, I'm interested in one row of data. If I want to use Lucene, how should I do it? Do I have to store my data as documents? Since I have millions and millions of rows in database, if I store each row as a document, it'd be millions of documents.
>   
Each row is a Document, and each column is a Field. A Field like
Description can be stored (and even compressed).
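
To make the row-to-document mapping concrete, here is a small sketch that
uses only the Java standard library as a stand-in for a real index; it is
not Lucene's API. The names RowIndex, Row, findByNameWord and findByRange
are hypothetical. It keeps each row as one record with named fields, builds
a word-to-IDs lookup over Name, and answers the range query with a plain
scan. Note one assumption that changes behavior: the lookup matches whole
words, whereas LIKE "%mymymy%" matches any substring.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical stand-in for an index in which each database row is one document. */
final class RowIndex {

    /** One row of the table; each column becomes a named field. */
    static final class Row {
        final int id;
        final String name;
        final String description;
        final long start;
        final long stop;

        Row(int id, String name, String description, long start, long stop) {
            this.id = id;
            this.name = name;
            this.description = description;
            this.start = start;
            this.stop = stop;
        }
    }

    // Stored rows, keyed and ordered by ID.
    private final Map<Integer, Row> rowsById = new TreeMap<Integer, Row>();
    // Inverted index: lower-cased word from Name -> IDs of the rows containing it.
    private final Map<String, TreeSet<Integer>> idsByNameWord =
            new HashMap<String, TreeSet<Integer>>();

    /** Stores the row and indexes every word of its Name field. */
    void add(Row row) {
        rowsById.put(row.id, row);
        for (String word : row.name.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (word.isEmpty()) {
                continue;
            }
            TreeSet<Integer> ids = idsByNameWord.get(word);
            if (ids == null) {
                ids = new TreeSet<Integer>();
                idsByNameWord.put(word, ids);
            }
            ids.add(row.id);
        }
    }

    /** Whole-word, case-insensitive lookup on Name; results are ordered by ID. */
    List<Row> findByNameWord(String word) {
        List<Row> result = new ArrayList<Row>();
        TreeSet<Integer> ids = idsByNameWord.get(word.toLowerCase(Locale.ROOT));
        if (ids != null) {
            for (Integer id : ids) {
                result.add(rowsById.get(id));
            }
        }
        return result;
    }

    /** Same condition as "start > minStart and stop < maxStop", done as a simple scan. */
    List<Row> findByRange(long minStart, long maxStop) {
        List<Row> result = new ArrayList<Row>();
        for (Row row : rowsById.values()) {
            if (row.start > minStart && row.stop < maxStop) {
                result.add(row);
            }
        }
        return result;
    }
}
```

The word lookup touches only the rows that contain the word, which is the
property that makes an inverted index attractive for text search; the range
query here still visits every row.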

M.



RE: Indexing and Searching from within a single Document

Posted by ji...@agilent.com.
Hi,

I have a similar question, and I have not heard back from anyone yet.


Dear Lucene experts,

I'm currently evaluating options for our search tool.

The need is:
I have millions of entries in a database; each entry has (more or less) the following format:

ID	Name	Description	start (number)	stop(number)


Currently my application uses the database for searching, with queries of the following form:

Select * from table where Name like "%mymymy%"

Select * from table where start >5 and stop <50000



I would very much like to use Lucene for this kind of search, for the following reasons:
1. Database performance is not ideal.
2. The data is growing too big, and I want to move it to the file system.
3. Currently everything is on the server, and users access it through a web application. I want to provide a rich client tool, in which case I would rather not bother with database installations on client machines. A database is my last option.
4. Lucene sounds very cool, and I want to use a different technology than databases, which we are already very familiar with.



I have read the book and played with the demo. My question is:
As you can see, I'm not indexing or querying documents; I'm interested in single rows of data. If I want to use Lucene, how should I do it? Do I have to store my data as documents? Since I have millions and millions of rows in the database, storing each row as a document would mean millions of documents.


What do you suggest?

Thank you!
Jing



RE: Indexing and Searching from within a single Document

Posted by DKorah <dc...@gmail.com>.
Lucene is a library for indexing data.
It's up to you to drive it the way you want.
Think about the search results: how would your users like to see the
information that the search page brings up? Do they want page numbers,
section numbers, or perhaps even sentence numbers?
Break the book up accordingly, and store the positional information
(chapter, section number, paragraph, sentence number, etc.) along with each
fragment of the book to make a document. Don't forget to add metadata that
identifies the book.
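
Since the splitting is left to the application, a separator-based breakup
is easy to do before indexing. The following is a minimal sketch in plain
Java, not a Lucene feature; BlockSplitter, Block and the "%%%" separator in
the usage below are hypothetical choices. Each block keeps a book
identifier and its position, which is the kind of metadata described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: cuts one file into blocks that can each be indexed as a document. */
final class BlockSplitter {

    /** One block of the book plus the metadata needed to find it again. */
    static final class Block {
        final String bookId;   // identifies which book the block came from
        final int blockNumber; // 1-based position of the block within the book
        final String text;     // the block's content, trimmed

        Block(String bookId, int blockNumber, String text) {
            this.bookId = bookId;
            this.blockNumber = blockNumber;
            this.text = text;
        }
    }

    /** Splits content on a literal separator and numbers the non-empty blocks. */
    static List<Block> split(String bookId, String content, String separator) {
        List<Block> blocks = new ArrayList<Block>();
        int number = 0;
        // Pattern.quote makes the separator literal, so "%%%" or "//" need no escaping.
        for (String piece : content.split(Pattern.quote(separator))) {
            String text = piece.trim();
            if (!text.isEmpty()) {
                number++;
                blocks.add(new Block(bookId, number, text));
            }
        }
        return blocks;
    }
}
```

For example, BlockSplitter.split("my-book", text, "%%%") yields one Block
per separated section; each one would then be added to the index as its
own document, with bookId and blockNumber stored alongside the text.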

Hope that helped.


jing_gao wrote:
> 
> Hi,
> 
> Some indexing tools offer a configurable option where you put separators
> (such as "//" or "%%%") in a single document, and the indexing engine
> treats each block as a separate document.
> Does Lucene have this kind of functionality?
> 
> Thanks!
> Jing
> 


RE: Indexing and Searching from within a single Document

Posted by ji...@agilent.com.
Hi,

Some indexing tools offer a configurable option where you put separators (such as "//" or "%%%") in a single document, and the indexing engine treats each block as a separate document.
Does Lucene have this kind of functionality?

Thanks!
Jing
