Posted to solr-user@lucene.apache.org by "Sharma, Raghvendra" <sr...@corelogic.com> on 2010/10/07 14:12:12 UTC
slow inserts (updatecsv) on solr
Hi,
I am running my Solr instance on an old Pentium D (two cores) with 3 GB of RAM, on 64-bit Ubuntu Server.
My schema mixes int, float, double and string fields. I use a UUID as the unique key, and the schema is quite wide: 232 columns, to be exact. The average load speed I get is around 1600 to 1700 rows per second (when running multiple updatecsv loads in parallel using curl).
All my fields are indexed, because users may want to search on any of them and I have to allow that.
As long as I was loading relatively small files (a few hundred MB), the insert speeds were quite impressive.
However, as soon as I moved on to files of several GB, the load speed dropped dramatically.
Is there a way to configure my Solr instance to use more than one thread for loading data in parallel?
Any suggestions are welcome.
Raghvendra Sharma
Tech Consultant - BI
Direct +1-714-800-4926
Home +1-732-395-7111
Mobile +91-9686-45-1174
sraghvendra@corelogic.com
Block A, Lakeview Building,
5th Floor Bagmane Tech Park,
CV Raman Nagar
Bangalore - 560093
Re: slow inserts (updatecsv) on solr
Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hello,
Perhaps that old Pentium D with 3 GB RAM simply can't handle both
reading files that are a few hundred MBs and writing them at the same time?
You could:
* split the big file (man split)
* open two terminals
* from each terminal make that same updatecsv call, but with one half of the
original file
It may not get your data in any faster if you are already hitting the hardware
limit, but it's doable.
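
The split-and-load steps above can be sketched in Python. This is only a rough sketch under a few assumptions: split_csv, post_csv and load_in_parallel are hypothetical helper names, the URL assumes the stock example server on localhost:8983, and splitting on line boundaries assumes that no field contains an embedded newline. Each part repeats the header line, since by default the CSV handler takes the field names from the first line of the input.

```python
import itertools
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Assumption: default example server; adjust host, port and path to your setup.
SOLR_CSV_URL = "http://localhost:8983/solr/update/csv"


def split_csv(src, parts, out_dir):
    """Split src into `parts` files on line boundaries, repeating the header in each."""
    src, out_dir = Path(src), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"{src.stem}.part{i}.csv" for i in range(parts)]
    with src.open("r", encoding="utf-8", newline="") as fin:
        header = fin.readline()
        outs = [p.open("w", encoding="utf-8", newline="") for p in paths]
        try:
            for out in outs:
                out.write(header)
            # Deal the rows out round-robin so the parts end up about the same size.
            for out, line in zip(itertools.cycle(outs), fin):
                out.write(line)
        finally:
            for out in outs:
                out.close()
    return paths


def post_csv(path, url=SOLR_CSV_URL):
    """Upload one part with curl and return curl's exit code (0 means success)."""
    cmd = [
        "curl", "--silent", "--show-error", "--fail", url,
        "--data-binary", f"@{path}",
        "-H", "Content-type: text/plain; charset=utf-8",
    ]
    return subprocess.run(cmd, check=False).returncode


def load_in_parallel(paths, workers=2):
    """Run one curl upload per part, at most `workers` at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(post_csv, paths))


# Example (not executed here; big.csv is a placeholder file name):
#   parts = split_csv("big.csv", 2, "parts")
#   print(load_in_parallel(parts, workers=2))
```

Two workers match the two cores mentioned in the original post. Whether this helps at all depends on where the bottleneck is: if the disk or memory is already saturated, more concurrent uploads will not raise the overall rate.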
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/