Posted to java-user@lucene.apache.org by Ilya Zavorin <iz...@caci.com> on 2011/11/23 21:18:27 UTC

Lucene on Android: indexing, searching and highlighting

Hello everyone,

I need to write a Lucene-based search and retrieval app for Android. Unfortunately, I am new to both Android development and Lucene, so I am going up two learning curves at the same time.

My app needs to do the following:
1. I have a collection of docs that I index
2. I have a set of queries I run against the docs/index
3. I need to find all "good" docs where at least one of the queries occurs.
4. I also need to find where in each of the good docs a hit occurs, i.e. I need to "highlight" all the occurring queries in each of the docs. So maybe I need to compute pairs of pointers with each pair showing where a hit starts and ends, or something similar.
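
To make the "pairs of pointers" idea in item 4 concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: the class name HitOffsets and its method are hypothetical, and it matches raw case-insensitive substrings, whereas Lucene's highlighter works on analyzed tokens. It only shows the kind of (start, end) output that step 4 is after.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of Lucene: plain substring matching only.
class HitOffsets {

    /**
     * Returns one {start, end} pair (end exclusive) per occurrence of any term.
     * Offsets index into the original text; matching ignores case.
     * Pairs are grouped term by term, in the order the terms were given.
     */
    static List<int[]> find(String text, List<String> terms) {
        List<int[]> hits = new ArrayList<int[]>();
        for (String term : terms) {
            int len = term.length();
            if (len == 0) {
                continue; // an empty term would match everywhere
            }
            for (int i = 0; i + len <= text.length(); i++) {
                // regionMatches(true, ...) compares case-insensitively
                if (text.regionMatches(true, i, term, 0, len)) {
                    hits.add(new int[] {i, i + len});
                }
            }
        }
        return hits;
    }
}
```

A document would count as "good" (item 3) when the returned list is non-empty, and each pair gives the span to mark up or to expand when collecting surrounding text.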

I will be using Lucene 3.4.0. It also looks like for #4 I will need to use highlighting capabilities.

My main question is whether I should expect any performance problems at any of the indexing/searching/highlighting steps? Can I use the lucene and highlighting jars (lucene-core-3.4.0.jar and lucene-highlighter-3.4.0.jar) "out of the box"?

Also, is there any sample code that would show how Lucene components should be invoked on Android?

Thank you,

Ilya Zavorin



Design qs: search for multiple terms in document collection

Posted by Ilya Zavorin <iz...@caci.com>.
I am trying to make some high- (and not so high) level design decisions for my app, which is supposed to check a collection of documents against a set of terms/queries. Basically, I need to perform a triage of sorts: find only those docs in the collection that contain at least one term from the term list. For those docs, I also need to find where in the document each occurrence is, since I then need to collect a small amount of surrounding text for a more detailed analysis.

Clearly, I will need to index the document collection using indexing classes of Lucene. This is pretty straightforward.

Then I will need to use the highlighting classes. In some sample code I found online, a query is first searched for and hits are returned. Then docids are extracted for the hits and the query is highlighted. Some questions:

Q1: Does Lucene perform essentially the same searching operation twice, first to find hits, then to highlight? If so, does this mean that if I expect most of the docs in my collection to contain at least one of the search terms, it might be faster for me to skip searching and simply go over all docs, applying highlighting? Then for those docs where no hits occurred I would simply get an empty list of relevant fragments. 

Q2: Is the same scoring mechanism used during search and during highlighting? That is, can I be sure that if I get a hit during search, the corresponding document indeed contains my query, which will then be found during highlighting?

Q3: Are there any mechanisms in Lucene that would facilitate merging of highlighting results for two different queries against a single document? 
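
To illustrate what "merging" in Q3 could mean, here is a minimal sketch that assumes the highlighting results for each query have already been reduced to {start, end} character spans over the same document text. That representation and the SpanMerger class are assumptions made for illustration, not anything Lucene provides. The sketch combines two such lists and fuses overlapping or touching spans.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, NOT part of Lucene: merges two lists of {start, end} spans.
class SpanMerger {

    /** Returns the union of both lists, sorted by start, with overlaps fused. */
    static List<int[]> merge(List<int[]> first, List<int[]> second) {
        List<int[]> all = new ArrayList<int[]>(first);
        all.addAll(second);
        Collections.sort(all, new Comparator<int[]>() {
            public int compare(int[] x, int[] y) {
                return x[0] < y[0] ? -1 : (x[0] > y[0] ? 1 : 0);
            }
        });

        List<int[]> merged = new ArrayList<int[]>();
        for (int[] span : all) {
            int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && span[0] <= last[1]) {
                // Overlaps or touches the previous span: extend it.
                last[1] = Math.max(last[1], span[1]);
            } else {
                // Copy so the caller's arrays are never modified.
                merged.add(new int[] {span[0], span[1]});
            }
        }
        return merged;
    }
}
```

With merged spans in hand, the surrounding text for each region only has to be collected once per document, regardless of how many queries hit it.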

Q4: I did some small tests of highlighting and noticed that some of the fragments returned for a query contained highlighted text that was quite far from the original query. For instance, I was looking for a 3-word term and it highlighted a sequence of only 2 of these 3 words. How can I control how close highlighted fragments should be to the original query?



Thanks much,

Ilya Zavorin



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene on Android: indexing, searching and highlighting

Posted by Ian Lea <ia...@gmail.com>.
As far as I'm aware recent versions of lucene, including the
highlighter, should work out of the box.

I'd guess that highlighting would be the most resource intensive and
therefore troublesome bit.

I'm not aware of any sample code showing lucene working on Android,
but from my very limited experience of Android development I don't
expect there to be any issues in invoking it.  There are likely to be
some lifecycle complications - you don't want Android killing your app
when it is part way through an index update just because the app lost
focus.

If it was me, outside Android I'd write some self-contained classes
wrapping lucene and doing what is needed, keeping the lucene activity
short and sweet, and the whole thing as small and simple as possible,
test them to destruction, then drop them into Android and see what
happens.
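
One way to read the suggestion above is as a small interface that the Android code talks to, with the Lucene calls hidden behind it. The sketch below is only an assumption about how that could look: the names DocSearch and InMemoryDocSearch are hypothetical, and the in-memory class is a naive stand-in for testing callers on a desktop JVM, not a Lucene-backed implementation (which would sit behind the same interface and is not shown).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical interface: the only search surface the Android code would see.
interface DocSearch {
    void add(String docId, String text);
    List<String> docsContaining(String term);
}

// Naive stand-in for tests outside Android; plain substring matching, NOT Lucene.
class InMemoryDocSearch implements DocSearch {
    private final Map<String, String> docs = new LinkedHashMap<String, String>();

    public void add(String docId, String text) {
        docs.put(docId, text.toLowerCase(Locale.ROOT));
    }

    public List<String> docsContaining(String term) {
        List<String> ids = new ArrayList<String>();
        String needle = term.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : docs.entrySet()) {
            if (entry.getValue().contains(needle)) {
                ids.add(entry.getKey()); // insertion order is preserved
            }
        }
        return ids;
    }
}
```

Keeping the interface this narrow means each call can be short, which should make it easier to deal with the lifecycle issue mentioned above.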


Good luck!

--
Ian.


