You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Luis Rodrigo Aguado <lr...@isoco.com> on 2006/11/24 12:31:05 UTC
Hit.getDocument performance
Hi all,
I am having a performance bottleneck that is driving me crazy. Maybe
anyone there has a clue of the source...
I am working with an index of 2400 pdf files. For each of them, I
index the contents, and I store the filename and the creation date.
Nothing else. The resulting index is about 6Mb.
The application generates several queries for each user input, and,
depending on the queries I launch, it may take up to 10 minutes to get
the results!!! It depends on the number of hits, being around 1500 docs
the highest number of hits I have tested. After a profiling session I
have located the Hits.getDocument as the primary source of time (and
memory) waste.
Is this reasonable? Maybe I did something wrong to create the
index? Are there any workarounds that you imagine?
Thanks in advance!!!
Luis.
Re: Hit.getDocument performance
Posted by Luis Rodrigo Aguado <lr...@isoco.com>.
I have just read in the API doc that going through the Hits returned is
not really adviceable. However, I am not developing the final
application, but a middleware that accesses Lucene, so I would not want
to take the decision to cut the number of docs returned, but let the
application do that. Is there any way to bypass this limitation?
Thanks!
Luis Rodrigo Aguado escribió:
> Hi all,
>
> I am having a performance bottleneck that is driving me crazy.
> Maybe anyone there has a clue of the source...
>
> I am working with an index of 2400 pdf files. For each of them, I
> index the contents, and I store the filename and the creation date.
> Nothing else. The resulting index is about 6Mb.
>
> The application generates several queries for each user input, and,
> depending on the queries I launch, it may take up to 10 minutes to get
> the results!!! It depends on the number of hits, being around 1500
> docs the highest number of hits I have tested. After a profiling
> session I have located the Hits.getDocument as the primary source of
> time (and memory) waste.
>
> Is this reasonable? Maybe I did something wrong to create the
> index? Are there any workarounds that you imagine?
>
> Thanks in advance!!!
>
> Luis.
>
>
> ------------------------------------------------------------------------
>
> No virus found in this incoming message.
> Checked by AVG Free Edition.
> Version: 7.5.430 / Virus Database: 268.14.14/548 - Release Date: 23/11/2006 15:22
>
--
*Luis Rodrigo Aguado*
Innovation and R&D
Research Manager
lrodrigo(at)isoco.com
#T +34913349777
C/Pedro de Valdivia, 10
28006, Madrid, Spain
* *
*iSOCO** *
intelligent software for the networked economy
www.isoco.com <http://www.isoco.com/>
Este mensaje se dirige exclusivamente a su destinatario y puede contener
información privilegiada o confidencial. Si no es vd. el destinatario
indicado, queda notificado de que la utilización, divulgación y/o copia
sin autorización está prohibida en virtud de la legislación vigente. Si
ha recibido este mensaje por error, le rogamos que nos lo comunique
inmediatamente por esta misma vía y proceda a su destrucción.
This message is intended exclusively for its addressee and may contain
information that is CONFIDENTIAL and protected by professional
privilege. If you are not the intended recipient you are hereby notified
that any dissemination, copy or disclosure of this communication is
strictly prohibited by law. If this message has been received in error,
please immediately notify us via e-mail and delete it.