Posted to java-user@lucene.apache.org by Alex <ch...@hotmail.com> on 2008/05/19 20:56:48 UTC

slow FieldCacheImpl.createValue

Hi,
I have a ValueSourceQuery that makes use of a stored field. The field contains roughly 27.27 million untokenized terms.
The average length of each term is 8 digits.
The first search always takes around 5 minutes, and that time is spent in the createValue function in FieldCacheImpl.
The search runs on a RAID 5 array of 15k rpm disks.


Any hints on how to make the FieldCache createValue faster? I have tried a bigger cache size for BufferedIndexReader (8 KB or more),
but createValue still takes 4 to 5 minutes to execute.


Thanks


Re: slow FieldCacheImpl.createValue

Posted by Chris Lu <ch...@gmail.com>.
This should give performance a great boost. Is there any plan to merge it into the
main branch instead of leaving it as a patch?

-- 
Chris Lu

On Tue, May 20, 2008 at 7:37 AM, Jason Rutherglen <
jason.rutherglen@gmail.com> wrote:

> https://issues.apache.org/jira/browse/LUCENE-1278 solves this problem

Re: slow FieldCacheImpl.createValue

Posted by Jason Rutherglen <ja...@gmail.com>.
https://issues.apache.org/jira/browse/LUCENE-1278 solves this problem


RE: slow FieldCacheImpl.createValue

Posted by Alex <ch...@hotmail.com>.
Hi,
Thanks for the reply. Yes, after the first slow search, subsequent searches perform well.

I guess the question is why exactly createValue takes so long, and whether it should take that long (4 to 5 minutes).
Given roughly 27 million terms, each roughly 8 characters long plus a few other bytes for the TermInfo record,
a modern disk can easily read that portion of the index (the .frq portion) in a few seconds. Also,
when I use tools like dstat, I see a bunch of 1 KB reads issued while createValue is running.
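
The back-of-envelope estimate above can be written out as a short Java sketch. The term count, term length, 5 minute duration and 1 KB read size come from this thread; the per-term overhead (8 bytes) and the sequential throughput (100 MB/s) are assumptions picked only for illustration, not measurements of this index or disk array. The last method is pure arithmetic: it shows how much time each read would get if the same bytes were fetched in 1 KB pieces over the observed 5 minutes.

```java
public class FieldCacheScanEstimate {
    // Figures taken from the post.
    static final long TERM_COUNT = 27_270_000L;
    static final int TERM_BYTES = 8;

    // Assumptions for illustration only, not measured values.
    static final int ASSUMED_OVERHEAD_BYTES = 8;          // "a few other bytes" per term
    static final double ASSUMED_SEQ_MB_PER_SEC = 100.0;   // sequential read throughput

    /** Total bytes to read under the assumptions above. */
    public static long totalBytes() {
        return TERM_COUNT * (TERM_BYTES + ASSUMED_OVERHEAD_BYTES);
    }

    /** Seconds needed if those bytes were read in one sequential pass. */
    public static double sequentialSeconds() {
        return totalBytes() / (ASSUMED_SEQ_MB_PER_SEC * 1_000_000.0);
    }

    /** Milliseconds available per read if the observed time were spent on fixed-size reads. */
    public static double impliedMillisPerRead(double observedSeconds, int readBytes) {
        long reads = (totalBytes() + readBytes - 1) / readBytes; // round up
        return observedSeconds * 1000.0 / reads;
    }

    public static void main(String[] args) {
        System.out.println("bytes to read:       " + totalBytes());
        System.out.println("sequential seconds:  " + sequentialSeconds());
        System.out.println("ms per 1 KB read over 5 minutes: "
                + impliedMillisPerRead(300.0, 1024));
    }
}
```

Under these assumptions the sequential pass comes to roughly 4.4 seconds, which matches the "few seconds" expectation, while 5 minutes spread over 1 KB reads leaves well under a millisecond per read. Neither number says where the time actually goes; that would need profiling on the real index.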




> Date: Tue, 20 May 2008 11:02:38 +0530
> From: anshumg@gmail.com
> To: java-user@lucene.apache.org
> Subject: Re: slow FieldCacheImpl.createValue
> 
> Hey Alex,
> I guess you haven't tried warming up the engine before putting it to use.
> It is one of the simpler approaches: warm up the engine first by sending
> it a few searches, and only then put it to use (add it to the serving
> machine loop). You could also do a little preprocessing while initializing
> the daemon rather than waiting for the first search to hit it.
> I hope I have understood the problem correctly; otherwise I will have to
> look into it.
> 
> --
> Anshum



Re: slow FieldCacheImpl.createValue

Posted by Anshum <an...@gmail.com>.
Hey Alex,
I guess you haven't tried warming up the engine before putting it to use.
It is one of the simpler approaches: warm up the engine first by sending
it a few searches, and only then put it to use (add it to the serving
machine loop). You could also do a little preprocessing while initializing
the daemon rather than waiting for the first search to hit it.
I hope I have understood the problem correctly; otherwise I will have to
look into it.
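
A minimal sketch of the warm-up idea, assuming a generic load-once cache as a stand-in for the FieldCache. The class name WarmableCache, the loader function and the field name "price" are all hypothetical; in a real setup the slow load would be whatever the first ValueSourceQuery triggers. The point is only that the expensive load is paid during daemon startup, so the first user search finds the values already in memory.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical load-once cache; a stand-in for the FieldCache, not Lucene code. */
public class WarmableCache {
    private final ConcurrentHashMap<String, int[]> cache = new ConcurrentHashMap<>();
    private final Function<String, int[]> loader;
    private final AtomicInteger loads = new AtomicInteger();

    public WarmableCache(Function<String, int[]> loader) {
        this.loader = loader;
    }

    /** Returns the values for a field, running the slow loader on first use only. */
    public int[] get(String field) {
        return cache.computeIfAbsent(field, f -> {
            loads.incrementAndGet();
            return loader.apply(f);
        });
    }

    /** Call during daemon startup, before the process is put into the serving loop. */
    public void warm(String... fields) {
        for (String field : fields) {
            get(field);
        }
    }

    /** Number of times the slow loader has actually run. */
    public int loadCount() {
        return loads.get();
    }

    public static void main(String[] args) {
        // The lambda stands in for the slow per-field load.
        WarmableCache cache = new WarmableCache(field -> new int[] {1, 2, 3});
        cache.warm("price");   // startup: pay the cost here
        cache.get("price");    // first real search: served from memory
        System.out.println("loads = " + cache.loadCount()); // the loader ran once
    }
}
```

This does not make the load itself any faster; it only moves the wait from the first search to startup.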

--
Anshum




-- 
--
The facts expressed here belong to everybody, the opinions to me.
The distinction is yours to draw............