Posted to java-user@lucene.apache.org by "Aigner, Thomas" <TA...@WescoDist.com> on 2006/12/07 19:14:46 UTC

Reading Performance

Howdy all,

      I have a question about reading many documents and the time it takes
to do this.  I have a loop on the Hits object that reads a record and then
writes it to a file.  When there is only 1 user on the IndexSearcher, this
process of reading, say, 100,000 hits takes around 3 seconds.  This is slow,
but can be acceptable.  When a few more users run searches, the time just to
read from the Hits object becomes well over 10 seconds, sometimes even 30+
seconds.  Is there a better way to read through the hits and do something
with their information?  And yes, I have to read all of them for this
particular task.

for (int i = 0; i < hits.length(); i++)
{
      if (fw == null)
      {
            fw = new BufferedWriter( new FileWriter( searchWriteSpec ), 8196 );
      }

      //Write out records
      String tmpHold = hits.doc(i).get("somefield1") + hits.doc(i).get("somefield2");
      fw.write(tmpHold + "\n");
}

Any ideas on how to speed this up, especially with multiple users?  Each
user gets their own instance of the class that contains the code above.

Thanks,

Tom

Re: Reading Performance

Posted by Andrew Hudson <an...@gmail.com>.
I think I've seen this problem when you use Lucene's built-in delete
mechanism, IndexReader.deleteDocument I believe.  The problem was that it
synchronized on a java.util.BitSet, which totally killed performance when
more than one process was using the same IndexReader.  A better way to do
deletes is to keep your own BitSet, use it in combination with a
HitCollector, and just ignore docs that are deleted.
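
The rough idea is to leave IndexReader.deleteDocument out of the picture and
instead route every query through a HitCollector that consults an
application-managed BitSet of "deleted" doc ids.  A minimal sketch (class
and variable names here are made up for illustration, assuming the Lucene
2.x HitCollector API):

import java.util.BitSet;

import org.apache.lucene.search.HitCollector;

public class SkippingCollector extends HitCollector {

    // Doc ids the application treats as deleted, tracked here instead of
    // being deleted through the IndexReader.
    private final BitSet suppressed;
    private final BitSet results = new BitSet();

    public SkippingCollector(BitSet suppressed) {
        this.suppressed = suppressed;
    }

    public void collect(int doc, float score) {
        if (!suppressed.get(doc)) {   // silently skip docs marked as deleted
            results.set(doc);
        }
    }

    public BitSet getResults() {
        return results;
    }
}

// usage: searcher.search(query, new SkippingCollector(suppressedDocs));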


Re: Reading Performance

Posted by Chris Hostetter <ho...@fucit.org>.
: > on the searching.  I still get really bad times when two or more people
: > ask for data at the same time.  The problem doesn't seem to be in
: > writing the files, it's in getting data from the index when two or more
: > people ask for large recordsets back (I can take all the I/O statements
: > out and still see the performance bottleneck)

I'm not sure why it would be particularly bad when dealing with concurrent
requests, but as a general rule you want to avoid calling doc(n) inside
a HitCollector, particularly given your use case...

: >                            Document doc = is.doc(id);
: >                            if (fw2 == null)
: >                             {
: >                                 fw2 = new BufferedWriter( new
: > FileWriter( "WHERETOWRITEFILE"), 8196 );
: >                             }
: >                             fw2.write(doc.get("field1") + "\n");
: >                             fw2.flush();

...if all you are doing is pulling out a single stored field, then you'll
see much better performance if you index that data in another field
UN_TOKENIZED and read it out of the FieldCache ... alternately you could
use a FieldSelector so that only the stored value of field1 (and not any
other stored fields) is read by the doc(id) call.
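
The FieldCache route could look roughly like this (a sketch only: it assumes
field1 is also indexed UN_TOKENIZED with exactly one value per document, and
the method and variable names are illustrative, not from the original post):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class FieldCacheExport {

    public static void export(IndexSearcher is, Query query, String outFile)
            throws IOException {
        IndexReader reader = is.getIndexReader();
        // One in-memory array lookup per hit instead of one stored-document
        // read per hit; the array is built once per reader and then cached.
        final String[] field1 = FieldCache.DEFAULT.getStrings(reader, "field1");
        final BufferedWriter fw = new BufferedWriter(new FileWriter(outFile), 8192);
        try {
            is.search(query, new HitCollector() {
                public void collect(int id, float score) {
                    try {
                        fw.write(field1[id]);
                        fw.write('\n');
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                }
            });
        } finally {
            fw.close();
        }
    }
}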

: > is.search(query, hc);

I assume you are reusing the same IndexSearcher object for all of your
parallel threads, right? ... otherwise that may explain the behavior you
are seeing as well.
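
The usual shape of that is a single shared searcher, along these lines (the
index path is a placeholder, not anything from this thread):

import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;

public class SearcherHolder {

    private static IndexSearcher searcher;

    // Every request thread calls get() and shares the one searcher;
    // opening a new IndexSearcher per request is far more expensive.
    public static synchronized IndexSearcher get() throws IOException {
        if (searcher == null) {
            searcher = new IndexSearcher("/path/to/index");
        }
        return searcher;
    }
}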


-Hoss



Re: Reading Performance

Posted by Erick Erickson <er...@gmail.com>.
I'm stumped. It seems like it might be time to haul out a profiler. I'm
particularly surprised because I put together a test system that fired a
bunch of threads at a searcher and saw nothing like what you're seeing, even
with 30 simultaneous requests running.

Do you know whether you're I/O bound, CPU bound, or???

It really sounds like there is some sort of synchronization issue, but
whether it's in Lucene or your code or some other system resource
contention, I have no idea.

I'd probably write a small program that spawns off several search threads
simultaneously to try to narrow it down, along the lines of the sketch below
(do something creative so you don't just re-execute the same query and,
perhaps, get skewed results due to caching). That way, you could run it all
on an isolated box and really pin down where the issue lives rather than
having to wade through all the other stuff that's surely running on your
server. I'd really, really, really start with *only* the Lucene search, then
add in other parts of my system until it broke. Of course, if it broke with
just the Lucene part you wouldn't have added much <G>. This approach should
also give you a much easier environment to run a profiler on than, say, a
server.

It should be fairly simple to do just the Lucene part, but the part about
"adding other parts of your system" may be prohibitively expensive. But it's
a thought.....
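
The Lucene-only part of that test could look something like this rough
sketch (the index path, query strings, and thread count are placeholders,
and it assumes the Lucene 2.x Hits/QueryParser API):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class ConcurrentSearchTest {

    public static void main(String[] args) throws Exception {
        // One shared searcher, hit by several threads at once.
        final IndexSearcher searcher = new IndexSearcher("/path/to/index");
        final String[] queries = { "field1:foo", "field1:bar", "field1:baz" };
        Thread[] threads = new Thread[10];
        for (int i = 0; i < threads.length; i++) {
            // Vary the query per thread so caching doesn't skew the timings.
            final String text = queries[i % queries.length];
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        long start = System.currentTimeMillis();
                        Query q = new QueryParser("field1",
                                new StandardAnalyzer()).parse(text);
                        Hits hits = searcher.search(q);
                        System.out.println(hits.length() + " hits in "
                                + (System.currentTimeMillis() - start) + " ms");
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
            threads[i].start();
        }
        for (int i = 0; i < threads.length; i++) {
            threads[i].join();
        }
    }
}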

Of course, Andrew's suggestion might make it all unnecessary...

Not a huge help I know, but it's the best I can do.....

Erick


RE: Reading Performance

Posted by "Aigner, Thomas" <TA...@WescoDist.com>.
I have tried the HitCollector and the time has improved by about 3/4 of a
second on the searching.  I still get really bad times when two or more
people ask for data at the same time.  The problem doesn't seem to be in
writing the files, it's in getting data from the index when two or more
people ask for large recordsets back (I can take all the I/O statements
out and still see the performance bottleneck).

Tom


hc = new HitCollector() {
    BufferedWriter fw2 = null;

    public void collect(int id, float score) {
        try {
            // Loads the whole stored document for every single hit.
            Document doc = is.doc(id);
            if (fw2 == null) {
                fw2 = new BufferedWriter(
                        new FileWriter("WHERETOWRITEFILE"), 8196);
            }
            fw2.write(doc.get("field1") + "\n");
            // Flushing on every hit largely defeats the BufferedWriter.
            fw2.flush();
        } catch (Exception ex) {
            // Swallowed; at minimum this should be logged.
        }
    }
};
is.search(query, hc);




RE: Reading Performance

Posted by "Aigner, Thomas" <TA...@WescoDist.com>.
Thanks Grant and Erik for your suggestions.  I will try both of them and
let you know if I see a marked increase in speed.

Tom




Re: Reading Performance

Posted by Grant Ingersoll <gr...@gmail.com>.
Have you done any profiling to identify hotspots in Lucene versus
your application?

You might look into the FieldSelector code (used in IndexReader) in
the trunk version of Lucene; it can be used to load only the fields you
are interested in when getting the document from disk.  This can be
useful if you have large fields being loaded that you don't
necessarily need (thus skipping them).
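
For reference, the FieldSelector hook looks roughly like this (sketched
against the trunk/2.1-era API, where IndexReader.document(int, FieldSelector)
exists; the field names are just placeholders):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.FieldSelector;
import org.apache.lucene.document.FieldSelectorResult;
import org.apache.lucene.index.IndexReader;

public class TwoFieldLoader {

    private static final FieldSelector ONLY_NEEDED = new FieldSelector() {
        public FieldSelectorResult accept(String fieldName) {
            if ("somefield1".equals(fieldName) || "somefield2".equals(fieldName)) {
                return FieldSelectorResult.LOAD;   // keep these stored fields
            }
            return FieldSelectorResult.NO_LOAD;    // skip everything else on disk
        }
    };

    public static String line(IndexReader reader, int docId) throws Exception {
        Document doc = reader.document(docId, ONLY_NEEDED);
        return doc.get("somefield1") + doc.get("somefield2");
    }
}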

Also, do you need the BufferedWriter construction and check inside
the loop?  It's probably small in comparison to loading, but it seems
like it is only created once, so why have it in the loop?




------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/




Re: Reading Performance

Posted by Erick Erickson <er...@gmail.com>.
Well, the performance isn't bad considering you're executing the *search*
around 1,000 times.......

One of the characteristics of a Hits object is that it's optimized for
getting the top 100 docs or so. To get the next 100 docs it re-executes the
query. Repeatedly <G>. I'd try using a HitCollector or TopDocs instead of a
Hits object....
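
A bare-bones TopDocs version of the loop might look like this (a sketch
assuming you can live with an upper bound on the number of hits you pull
back; 100,000 matches the number in the question, and the field names are
from the original code):

import org.apache.lucene.document.Document;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class TopDocsExport {

    public static void dump(IndexSearcher is, Query query) throws Exception {
        // Runs the query once and keeps the top N, instead of Hits quietly
        // re-executing the search as you iterate past the first ~100 docs.
        TopDocs top = is.search(query, null, 100000);
        ScoreDoc[] hits = top.scoreDocs;
        for (int i = 0; i < hits.length; i++) {
            Document doc = is.doc(hits[i].doc);
            System.out.println(doc.get("somefield1") + doc.get("somefield2"));
        }
    }
}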

Hope this helps
Erick
