Posted to solr-user@lucene.apache.org by Robert Gründler <ro...@dubture.com> on 2010/12/02 12:28:31 UTC

Dataimport destroys our harddisks

Hi,

we have a serious hard disk problem, and it's definitely related to a full-import from a relational
database into a Solr index.

It first happened on our development server, where the RAID controller crashed during a full-import
of ~8 million documents. That was 2 weeks ago, and since then 2 of the hard disks where the Solr
index files are located have stopped working (we had to replace them).

After the crash of the RAID controller, we decided to move all Solr/index-related development to our
local development machines.

Yesterday I was running another full-import of ~10 million documents on my local development machine,
and during the import a hard disk failure occurred. Since this failure, my hard disk activity seems to
be around 100% all the time, even when no Solr server is running at all.

I've been googling for the last 2 days to find some info about Solr-related hard disk problems, but I didn't find anything
useful.

Are there any precautions we need to take with respect to hard disk failures when doing a full-import? Right now,
our steps look like this (a rough sketch of the commands follows the list):

1. Delete the current index
2. Restart Solr, to load the updated schemas
3. Start the full import
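
For reference, here is a minimal sketch of how steps 1 and 3 can be driven over
Solr's HTTP API (step 2 is a plain service restart). The host, port and the
/dataimport handler path are assumptions based on the default example setup,
not our real config:

    import urllib.request

    SOLR = "http://localhost:8983/solr"

    # 1. delete the current index; the encoded body is
    #    <delete><query>*:*</query></delete>
    #    (requires stream.body / remote streaming to be enabled in solrconfig.xml)
    urllib.request.urlopen(
        SOLR + "/update?stream.body="
        "%3Cdelete%3E%3Cquery%3E*:*%3C%2Fquery%3E%3C%2Fdelete%3E&commit=true"
    )

    # 2. restart Solr (done outside this script, via the servlet container)

    # 3. kick off the full import through the DataImportHandler
    urllib.request.urlopen(SOLR + "/dataimport?command=full-import")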

Initially, the Solr index and the relational database were located on the same hard disk. After the crash, we moved
the index to a separate hard disk, but that disk crashed as well.

I'd really appreciate any hints on what we might be doing wrong when importing data, as we can't release this
on our production servers while there's a risk of hard disk failures.


thanks.


-robert






Re: Dataimport destroys our harddisks

Posted by Sven Almgren <sv...@tras.se>.
That's the same series we use... we had problems when running other
disk-heavy operations like rsync and backups on them too...

But in our case we mostly had hangs or load > 180 :P... Can you
simulate very heavy random disk I/O? If so, you could check whether you
still have the same problems...
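
Something like this would do it, for instance (an untested sketch; the path and
sizes are made up, so point it at the suspect disk and adjust as needed):

    import os
    import random

    PATH = "/tmp/io_stress.bin"   # scratch file on the disk under test
    FILE_SIZE = 1 << 30           # 1 GiB scratch file
    BLOCK = 256 * 1024            # 256 KiB blocks

    # create a scratch file of the right size
    with open(PATH, "wb") as f:
        f.truncate(FILE_SIZE)

    # hammer the disk with random reads and synced writes
    with open(PATH, "r+b") as f:
        for _ in range(100000):
            f.seek(random.randrange(0, FILE_SIZE - BLOCK))
            if random.random() < 0.5:
                f.write(os.urandom(BLOCK))
                f.flush()
                os.fsync(f.fileno())   # force the write to hit the platter
            else:
                f.read(BLOCK)

    os.remove(PATH)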

That's about all I can help with, good luck :)

/Sven

2010/12/2 Robert Gründler <ro...@dubture.com>:
> On Dec 2, 2010, at 15:43, Sven Almgren wrote:
>
>> What RAID controller do you use, and what kernel version? (Assuming
>> Linux). We had problems during high load with a 3ware RAID controller
>> and the current kernel for Ubuntu 10.04; we had to downgrade the
>> kernel...
>>
>> The problem was a bug in the driver that only showed up under very high
>> disk load (as is the case when doing imports).
>>
>
> We're running FreeBSD:
>
> RAID controller: 3ware 9500S-8
> Corrupt unit: RAID-10, 3725.27 GB, 256K stripe size, without BBU
> FreeBSD 7.2, UFS filesystem.

Re: Dataimport destroys our harddisks

Posted by Robert Gründler <ro...@dubture.com>.
On Dec 2, 2010, at 15:43, Sven Almgren wrote:

> What RAID controller do you use, and what kernel version? (Assuming
> Linux). We had problems during high load with a 3ware RAID controller
> and the current kernel for Ubuntu 10.04; we had to downgrade the
> kernel...
> 
> The problem was a bug in the driver that only showed up under very high
> disk load (as is the case when doing imports).
> 

We're running FreeBSD:

RAID controller: 3ware 9500S-8
Corrupt unit: RAID-10, 3725.27 GB, 256K stripe size, without BBU
FreeBSD 7.2, UFS filesystem.





Re: Dataimport destroys our harddisks

Posted by Sven Almgren <sv...@tras.se>.
What RAID controller do you use, and what kernel version? (Assuming
Linux). We had problems during high load with a 3ware RAID controller
and the current kernel for Ubuntu 10.04; we had to downgrade the
kernel...

The problem was a bug in the driver that only showed up under very high
disk load (as is the case when doing imports).

/Sven


Re: Dataimport destroys our harddisks

Posted by Robert Gründler <ro...@dubture.com>.
> The very first thing I'd ask is "how much free space is on your disk
> when this occurs?" Is it possible that you're simply filling up your
> disk?

No, I've checked that already. All disks have plenty of space (they have
a capacity of 2 TB and are currently only about 20% full).

> 
> Do note that an optimize may require up to 2X the size of your index
> if/when it occurs. Are you sure you aren't optimizing as you add
> items to your index?
> 

Index size is not a problem in our case; our index is currently about 3 GB.

What do you mean by "optimizing as you add items to your index"?

> But I've never heard of Solr causing hard disk crashes,

Neither had we, and Google seems to be of the same opinion.

One thing that I've found is the mergeFactor setting:

http://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor

Our sysadmin speculates that maybe the chunk size of our RAID array / hard disks
and the segment size of the Lucene index don't play well together.

Does the Lucene segment size affect how the data is written to disk?
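
For reference, mergeFactor is configured in solrconfig.xml; the snippet below
only illustrates where the knob lives, with the documented default value:

    <indexDefaults>
      <!-- how many same-size segments may accumulate before they are
           merged into one larger segment; the default is 10 -->
      <mergeFactor>10</mergeFactor>
    </indexDefaults>

A lower value merges more often, in smaller bursts of disk I/O; a higher value
buffers more segments and merges less often, but in bigger bursts.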


thanks for your help.


-robert









Re: Dataimport destroys our harddisks

Posted by Erick Erickson <er...@gmail.com>.
The very first thing I'd ask is "how much free space is on your disk
when this occurs?" Is it possible that you're simply filling up your
disk?

Do note that an optimize may require up to 2X the size of your index
if/when it occurs. Are you sure you aren't optimizing as you add
items to your index?
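
A quick way to sanity-check the headroom (the paths below are hypothetical;
point index_dir at your actual index directory):

    import os
    import shutil

    index_dir = "/path/to/solr/data/index"

    # total size of the current index files
    index_size = sum(
        os.path.getsize(os.path.join(index_dir, f))
        for f in os.listdir(index_dir)
    )
    free = shutil.disk_usage(index_dir).free

    # an optimize can transiently need up to ~2x the index size
    print("index %.1f GB, free %.1f GB, headroom ok: %s"
          % (index_size / 1e9, free / 1e9, free > 2 * index_size))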

But I've never heard of Solr causing hard disk crashes; it doesn't do
anything special beyond reads and writes...

Best
Erick
