Posted to solr-user@lucene.apache.org by "Sharma, Raghvendra" <sr...@corelogic.com> on 2010/10/07 14:12:12 UTC

slow inserts (updatecsv) on solr

Hi,

I am running my Solr instance on an old Pentium D (two cores) with 3 GB RAM, on 64-bit Ubuntu Server.

My schema is a mix of data types: int, float, double and string. I am using a UUID as my unique key, and my schema is pretty wide, 232 columns to be exact. The average load speed I get is around 1600 to 1700 rows per second (when running multiple updatecsv loads in parallel using curl).

All my fields are indexed, because the user might want to search on anything and I have to allow that.

As long as I was loading relatively small files (a few hundred MB), the insert speeds were quite impressive.
However, as soon as I moved to files in the GB range, the load speeds dropped dramatically.

I wonder if I can configure my Solr instance to use more than one thread for loading data in parallel?

Please suggest..


Raghvendra Sharma
Tech Consultant - BI
Direct   +1-714-800-4926
Home   +1-732-395-7111
Mobile  +91-9686-45-1174
sraghvendra@corelogic.com

Block A, Lakeview Building,
5th Floor Bagmane Tech Park,
CV Raman Nagar
Bangalore - 560093

****************************************************************************************** 
This message may contain confidential or proprietary information intended only for the use of the 
addressee(s) named above or may contain information that is legally privileged. If you are 
not the intended addressee, or the person responsible for delivering it to the intended addressee, 
you are hereby notified that reading, disseminating, distributing or copying this message is strictly 
prohibited. If you have received this message by mistake, please immediately notify us by  
replying to the message and delete the original message and any copies immediately thereafter. 

Thank you. 
****************************************************************************************** 
CLLD

Re: slow inserts (updatecsv) on solr

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hello,

Perhaps it's that old Pentium D with 3 GB RAM that simply can't handle both 
reading files that are a few hundred MBs and writing them at the same time?
You could:
* split the big file (man split)
* open two terminals
* from each terminal make that same updatecsv call, but with one half of the 
original file


It may not get your data in any faster if you are hitting the hardware limit, 
but it's doable.
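
For anyone who would rather script the steps above than run them from two terminals, here is a minimal Python sketch. It is only an illustration under a few assumptions: the CSV has a header row and exactly one record per line (no quoted fields with embedded newlines), and Solr's CSV handler is reachable at the URL in SOLR_CSV_URL, which is a guess at a default install and not taken from the original post. The helper names (split_csv, post_chunk, load_parallel) are made up for this example.

```python
import itertools
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumed endpoint for a default local install; adjust host, port and path.
SOLR_CSV_URL = "http://localhost:8983/solr/update/csv"


def split_csv(path, lines_per_chunk, out_dir):
    """Split a CSV into chunk files, repeating the header row in each one.

    Assumes one record per line (no embedded newlines inside quoted fields).
    Returns the list of chunk file paths in order.
    """
    chunk_paths = []
    with open(path, "r", encoding="utf-8", newline="") as src:
        header = src.readline()
        for index in itertools.count():
            # Read the next block of data lines; stop when the file is exhausted.
            lines = list(itertools.islice(src, lines_per_chunk))
            if not lines:
                break
            chunk_path = os.path.join(out_dir, "chunk_%04d.csv" % index)
            with open(chunk_path, "w", encoding="utf-8", newline="") as dst:
                dst.write(header)
                dst.writelines(lines)
            chunk_paths.append(chunk_path)
    return chunk_paths


def post_chunk(chunk_path, url=SOLR_CSV_URL):
    """POST one chunk to Solr and return the HTTP status code.

    The whole chunk is read into memory, so keep chunks reasonably small.
    """
    with open(chunk_path, "rb") as f:
        body = f.read()
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def load_parallel(chunk_paths, post=post_chunk, workers=2):
    """Send chunks concurrently; results come back in chunk order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(post, chunk_paths))
```

A call such as load_parallel(split_csv("big.csv", 500000, "/tmp/chunks"), workers=2) would mirror the two-terminal approach (the file name, chunk size and directory are placeholders, and the directory must already exist). Two workers matches the two cores on the box. The sketch does not issue a commit, so the new documents would still need one before they show up in searches.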

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

