Posted to solr-user@lucene.apache.org by Utkarsh Sengar <ut...@gmail.com> on 2013/11/13 00:58:56 UTC

High disk IO during UpdateCSV

Hello,

I load data from CSV into Solr via UpdateCSV. There are about 50M documents
with 10 columns in each document. The index size is about 15GB and I am
using a 3-node distributed Solr cluster.

While loading the data, disk IO goes to 100%. If the load balancer in
front of Solr hits the machine that is doing the processing, the
request times out. In general, though, requests to all the machines become
slow. I have attached a screenshot of the disk I/O and CPU usage.

Is there a fix in Solr that can throttle the load, or is it maybe
due to the MergePolicy? How can I debug Solr to find the exact cause?

-- 
Thanks,
-Utkarsh

Re: High disk IO during UpdateCSV

Posted by Utkarsh Sengar <ut...@gmail.com>.
Thanks guys!
I will start by splitting the file into chunks of 5M (10 chunks) and
reduce the size if needed.

Thanks,
-Utkarsh


On Wed, Nov 13, 2013 at 9:08 AM, Walter Underwood <wu...@wunderwood.org> wrote:

> Don't load 50M documents in one shot. Break it up into reasonable chunks
> (100K?) with commits at each point.
>
> You will have a bottleneck somewhere, usually disk or CPU. Yours appears
> to be disk. If you get faster disks, it might become the CPU.
>
> wunder
>
> On Nov 13, 2013, at 8:22 AM, Utkarsh Sengar <ut...@gmail.com> wrote:
>
> > Bumping this one again, any suggestions?
>
> --
> Walter Underwood
> wunder@wunderwood.org
>
>
>
>



Re: High disk IO during UpdateCSV

Posted by Walter Underwood <wu...@wunderwood.org>.
Don't load 50M documents in one shot. Break it up into reasonable chunks (100K?) with commits at each point.

You will have a bottleneck somewhere, usually disk or CPU. Yours appears to be disk. If you get faster disks, it might become the CPU.
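
A minimal sketch of the chunked load suggested above, not a definitive implementation. It assumes the CSV has a header line, that each record sits on a single line (no quoted fields with embedded newlines), and that the node accepts CSV at an update URL such as http://localhost:8983/solr/collection1/update/csv. The URL, the collection name, and the helper names iter_csv_chunks, post_chunk, and load_in_chunks are all hypothetical; the 100K chunk size is the figure floated above and would need tuning.

```python
import itertools
import urllib.request
from typing import Iterable, Iterator


def iter_csv_chunks(lines: Iterable[str], rows_per_chunk: int) -> Iterator[str]:
    """Yield CSV bodies holding at most rows_per_chunk data rows each.

    The first line is treated as the header and repeated at the top of
    every chunk. Assumes one record per line.
    """
    it = iter(lines)
    header = next(it, None)
    if header is None:
        return
    while True:
        rows = list(itertools.islice(it, rows_per_chunk))
        if not rows:
            return
        yield header + "".join(rows)


def post_chunk(update_url: str, body: str, timeout: float = 600.0) -> int:
    """POST one CSV chunk and ask Solr to commit when it has been indexed."""
    req = urllib.request.Request(
        update_url + "?commit=true",
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/csv; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def load_in_chunks(csv_path: str, update_url: str, rows_per_chunk: int = 100_000) -> None:
    """Stream the file and send it chunk by chunk, one commit per chunk."""
    with open(csv_path, encoding="utf-8", newline="") as f:
        for n, chunk in enumerate(iter_csv_chunks(f, rows_per_chunk), start=1):
            status = post_chunk(update_url, chunk)
            print(f"chunk {n}: HTTP {status}")
```

Calling load_in_chunks("docs.csv", "http://localhost:8983/solr/collection1/update/csv") would send the file as a series of smaller requests instead of one 50M-row upload. Sending the chunks sequentially, as here, also keeps only one update request in flight at a time.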

wunder

On Nov 13, 2013, at 8:22 AM, Utkarsh Sengar <ut...@gmail.com> wrote:

> Bumping this one again, any suggestions?

--
Walter Underwood
wunder@wunderwood.org




Re: High disk IO during UpdateCSV

Posted by Utkarsh Sengar <ut...@gmail.com>.
Hi Michael,

I am using SolrCloud 4.5,
and UpdateCSV loads the data into one of the nodes.
Attachment: http://i.imgur.com/1xmoNtt.png


Thanks,
-Utkarsh


On Wed, Nov 13, 2013 at 8:33 AM, Michael Della Bitta
<michael.della.bitta@appinions.com> wrote:

> Utkarsh,
>
> Your screenshot didn't come through. I don't think this list allows
> attachments. Maybe put it up on imgur or something?
>
> I'm a little unclear on whether you're using Solr in Cloud mode, or with a
> single master.
>
> On Wed, Nov 13, 2013 at 11:22 AM, Utkarsh Sengar <utkarsh2012@gmail.com>
> wrote:
>
> > Bumping this one again, any suggestions?
>




Re: High disk IO during UpdateCSV

Posted by Michael Della Bitta <mi...@appinions.com>.
Utkarsh,

Your screenshot didn't come through. I don't think this list allows
attachments. Maybe put it up on imgur or something?

I'm a little unclear on whether you're using Solr in Cloud mode, or with a
single master.

Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions <https://twitter.com/Appinions> | g+:
plus.google.com/appinions<https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts>
w: appinions.com <http://www.appinions.com/>


On Wed, Nov 13, 2013 at 11:22 AM, Utkarsh Sengar <ut...@gmail.com> wrote:

> Bumping this one again, any suggestions?

Re: High disk IO during UpdateCSV

Posted by Utkarsh Sengar <ut...@gmail.com>.
Bumping this one again, any suggestions?

