Posted to user@cassandra.apache.org by "Alaa Zubaidi (PDF)" <al...@pdf.com> on 2016/08/12 01:20:00 UTC

Corrupt SSTABLE over and over

Hi,

I have a 16-node cluster running Cassandra 2.2.1 on Windows, installed
locally (NOT in the cloud)

and I am getting the following error:
ERROR [CompactionExecutor:2] 2016-08-12 06:51:52,983 CassandraDaemon.java:183 -
Exception in thread Thread[CompactionExecutor:2,1,main]
org.apache.cassandra.io.FSReadError:
org.apache.cassandra.io.sstable.CorruptSSTableException:
org.apache.cassandra.io.compress.CorruptBlockException:
(E:\........\la-4886-big-Data.db): corruption detected, chunk at 4969092 of
length 10208.
    at
org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:357)
~[apache-cassandra-2.2.1.jar:2.2.1]
....
....
ERROR [CompactionExecutor:2] ....... FileUtils.java:463 - Exiting
forcefully due to file system exception on startup, disk failure policy
"stop"
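The useful details in the CorruptBlockException line are the SSTable file, the chunk offset, and the chunk length. When the corruption keeps coming back, it can help to pull these out of system.log for every occurrence and compare them. A minimal Python sketch, assuming the message keeps the "(<path>): corruption detected, chunk at <offset> of length <length>." wording quoted above; CorruptChunk and parse_corrupt_chunk are hypothetical helpers, not part of Cassandra.

```python
import re
from typing import NamedTuple, Optional

# Matches the CorruptBlockException text quoted above (assumed wording):
# "(<path>): corruption detected, chunk at <offset> of length <length>."
CORRUPT_CHUNK_RE = re.compile(
    r"\((?P<path>[^)]*)\): corruption detected, "
    r"chunk at (?P<offset>\d+) of length (?P<length>\d+)"
)


class CorruptChunk(NamedTuple):
    path: str    # SSTable Data.db file named in the message
    offset: int  # byte offset of the compressed chunk in that file
    length: int  # compressed length of the chunk, in bytes


def parse_corrupt_chunk(line: str) -> Optional[CorruptChunk]:
    """Return file/offset/length from a corruption message, or None if absent."""
    m = CORRUPT_CHUNK_RE.search(line)
    if m is None:
        return None
    return CorruptChunk(m.group("path"), int(m.group("offset")), int(m.group("length")))


if __name__ == "__main__":
    # The path is kept elided exactly as it appears in the post.
    sample = ("org.apache.cassandra.io.compress.CorruptBlockException: "
              "(E:\\........\\la-4886-big-Data.db): corruption detected, "
              "chunk at 4969092 of length 10208.")
    print(parse_corrupt_chunk(sample))
```

For the error above this gives offset 4969092 and length 10208, i.e. bytes 4969092 to 4979300 of la-4886-big-Data.db. Whether successive failures hit the same file and region or move around is one data point when weighing a single bad write against a disk or memory problem; it is not proof either way.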

I tried sstablescrub, but it crashed with hs-err-pid-...
I removed the corrupted file and started the node again; after one day the
corruption came back. I removed the files again and restarted Cassandra,
and it worked for a few days. Then I ran "nodetool repair"; after it
finished, Cassandra failed again, this time with commitlog corruption. After
I removed the commitlog files, it failed again with another SSTable
corruption.
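Removing "the corrupted file" by hand is easy to get subtly wrong: an SSTable is a set of component files that share one prefix (for the file in the log, la-4886-big-), and it is generally safer to stop the node and move all of them aside together rather than only Data.db. A minimal Python sketch that lists those siblings, assuming the <prefix>Data.db naming seen in the log; sstable_prefix and sibling_components are hypothetical helpers, not Cassandra tools.

```python
import os
import sys
from pathlib import PureWindowsPath
from typing import Iterable, List

DATA_SUFFIX = "Data.db"


def sstable_prefix(data_file: str) -> str:
    """Map a path ending in la-4886-big-Data.db to the shared prefix la-4886-big-.

    PureWindowsPath accepts both backslash and slash separators, so a Windows
    path like the one in the log can be parsed on any OS.
    """
    name = PureWindowsPath(data_file).name
    if not name.endswith(DATA_SUFFIX):
        raise ValueError(f"not a Data.db component: {name}")
    return name[: -len(DATA_SUFFIX)]


def sibling_components(data_file: str, names_in_dir: Iterable[str]) -> List[str]:
    """Return the file names in the table directory that belong to the same SSTable.

    The prefix keeps its trailing '-', so generation 4886 does not match 48860.
    """
    prefix = sstable_prefix(data_file)
    return sorted(n for n in names_in_dir if n.startswith(prefix))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python sstable_parts.py <path-to-...-Data.db>
    target = sys.argv[1]
    for part in sibling_components(target, os.listdir(os.path.dirname(target))):
        print(part)
```

Moving the listed files to a backup folder instead of deleting them keeps the option of trying sstablescrub on them later.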

I also checked the HW, file system, and memory: the VMware logs showed no
HW errors, and the HW management logs showed no problems or issues either.
I also checked the Windows logs (Application and System). The only thing I
found was in the System log: "Cassandra Service terminated with
service-specific error Cannot create another system semaphore."

I could not find anything about that error; everything I found points back
to the application log.

Any help is appreciated.

-- 

Alaa Zubaidi

-- 
This message may contain confidential and privileged information. If it 
has been sent to you in error, please reply to advise the sender of the 
error and then immediately permanently delete it and all attachments to it 
from your systems. If you are not the intended recipient, do not read, 
copy, disclose or otherwise use this message or any attachments to it. The 
sender disclaims any liability for such unauthorized use. PLEASE NOTE that 
all incoming e-mails sent to PDF e-mail accounts will be archived and may 
be scanned by us and/or by external service providers to detect and prevent 
threats to our systems, investigate illegal or inappropriate behavior, 
and/or eliminate unsolicited promotional e-mails (“spam”). If you have any 
concerns about this process, please contact us at
legal.department@pdf.com <le...@pdf.com>.

Re: Corrupt SSTABLE over and over

Posted by Kai Wang <de...@gmail.com>.
This might not be good news for you, but in my experience C* 2.x on
Windows is not ready for production yet. I've seen various file-system-related
errors, and in one of the JIRAs I was told that major work (or rework) was
done in 3.x to improve C* stability on Windows.

On Tue, Aug 16, 2016 at 3:44 AM, Bryan Cheng <br...@blockcypher.com> wrote:

> Hi Alaa,
>
> Sounds like you have problems that go beyond Cassandra, likely filesystem
> corruption or bad disks. I don't know enough about Windows to give you any
> specific advice, but I'd start with a chkdsk run.
>
> --Bryan
>
> On Fri, Aug 12, 2016 at 5:19 PM, Alaa Zubaidi (PDF) <al...@pdf.com>
> wrote:
>
>> Hi Bryan,
>>
>> Changing disk_failure_policy to best_effort and running nodetool scrub
>> did not work; it generated another error:
>> java.nio.file.AccessDeniedException
>>
>> I also tried removing all files (data, commitlog, saved caches) and
>> restarting the node fresh, and I am still getting corruption.
>>
>> And still nothing indicates a HW issue?
>> All other nodes are fine.
>>
>> Regards,
>> Alaa
>>
>>
>> On Fri, Aug 12, 2016 at 12:00 PM, Bryan Cheng <br...@blockcypher.com>
>> wrote:
>>
>>> Should also add that if the scope of corruption is _very_ large, and you
>>> have a good, aggressive repair policy (read: you are confident in the
>>> consistency of the data elsewhere in the cluster), you may just want to
>>> decommission and rebuild that node.
>>>
>>> On Fri, Aug 12, 2016 at 11:55 AM, Bryan Cheng <br...@blockcypher.com>
>>> wrote:
>>>
>>>> Looks like you're doing the offline scrub; have you tried the online one?
>>>>
>>>> Here's my typical process for corrupt SSTables.
>>>>
>>>> With disk_failure_policy set to stop, examine the failing SSTables. If
>>>> they are very small (in the KB range), it is unlikely that there is any
>>>> salvageable data there. Just delete them, start the machine, and schedule a
>>>> repair ASAP.
>>>>
>>>> If they are large, they may be worth salvaging. If the scope of
>>>> corruption is reasonable (limited to a few SSTables scattered among
>>>> different keyspaces), set disk_failure_policy to best_effort, start the
>>>> machine up, and run nodetool scrub. This is the online scrub, which is
>>>> faster than the offline scrub (at least as of 2.1.12, the last time I had
>>>> to do this).
>>>>
>>>> Only if all else fails, attempt the very painful offline sstablescrub.
>>>>
>>>> Is the VMware client Windows? (Trying to make sure it's not just the
>>>> host.) YMMV, but in the past Windows was somewhat of a neglected platform
>>>> wrt Cassandra. I think you'd have a much easier time getting help if
>>>> running Linux is an option here.
>>>>
>>>>
>>>>
>>>> On Fri, Aug 12, 2016 at 9:16 AM, Alaa Zubaidi (PDF)
>>>> <alaa.zubaidi@pdf.com> wrote:
>>>>
>>>>> Hi Jason,
>>>>>
>>>>> Thanks for your input.
>>>>> That's what I am afraid of.
>>>>> Did you find any HW errors in the VMware and HW logs, or any other
>>>>> indication that the HW was the cause? I need to be sure that this is the
>>>>> reason before asking the customer to spend more money.
>>>>>
>>>>> Thanks,
>>>>> Alaa
>>>>>
>>>>> On Thu, Aug 11, 2016 at 11:02 PM, Jason Wee <pe...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Is Cassandra running on a virtual server (VMware)?
>>>>>>
>>>>>> > I tried sstablescrub but it crashed with hs-err-pid-...
>>>>>> Maybe try allocating a larger heap to sstablescrub.
>>>>>>
>>>>>> I ran into this SSTable corruption as well (on Cassandra 1.2). First I
>>>>>> tried nodetool scrub, but it persisted; then offline sstablescrub, and it
>>>>>> still persisted. I wiped the node and it happened again. Then I changed
>>>>>> the hardware (disk and memory), and things went well.
>>>>>>
>>>>>> hth
>>>>>>
>>>>>> jason
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Alaa Zubaidi
>>>>> PDF Solutions, Inc.
>>>>> 333 West San Carlos Street, Suite 1000
>>>>> San Jose, CA 95110  USA
>>>>> Tel: 408-283-5639
>>>>> fax: 408-938-6479
>>>>> email: alaa.zubaidi@pdf.com
>>>>>
>>>>
>>>>
>>>
>>
>>
>
>

Re: Corrupt SSTABLE over and over

Posted by Bryan Cheng <br...@blockcypher.com>.
Hi Alaa,

Sounds like you have problems that go beyond Cassandra, likely filesystem
corruption or bad disks. I don't know enough about Windows to give you any
specific advice, but I'd start with a chkdsk run.

--Bryan

Re: Corrupt SSTABLE over and over

Posted by "Alaa Zubaidi (PDF)" <al...@pdf.com>.
Hi Bryan,

Changing disk_failure_policy to best_effort and running nodetool scrub
did not work; it generated another error:
java.nio.file.AccessDeniedException

I also tried removing all files (data, commitlog, saved caches) and
restarting the node fresh, and I am still getting corruption.

And still nothing indicates a HW issue?
All other nodes are fine.

Regards,
Alaa


Re: Corrupt SSTABLE over and over

Posted by Bryan Cheng <br...@blockcypher.com>.
Should also add that if the scope of corruption is _very_ large, and you
have a good, aggressive repair policy (read: you are confident in the
consistency of the data elsewhere in the cluster), you may just want to
decommission and rebuild that node.
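Bryan's process for corrupt SSTables (quoted earlier in the thread), together with the rebuild advice above, is essentially a decision procedure. A minimal Python sketch of that ordering; the thresholds are hypothetical placeholders, because the thread only says "in the range of kbs" and "_very_ large", and the function encodes the advice as written rather than any Cassandra tool.

```python
from enum import Enum


class Action(Enum):
    DELETE_AND_REPAIR = "delete the SSTable, start the node, schedule a repair ASAP"
    ONLINE_SCRUB = "disk_failure_policy: best_effort, start the node, nodetool scrub"
    OFFLINE_SCRUB = "node stopped, sstablescrub (last resort)"
    REBUILD_NODE = "decommission the node and rebuild it"


# Hypothetical thresholds: placeholders to tune, not values from the thread.
SMALL_SSTABLE_BYTES = 1024 * 1024
WIDESPREAD_SSTABLE_COUNT = 10


def triage(data_file_bytes: int,
           corrupt_sstable_count: int,
           online_scrub_failed: bool,
           repairs_trusted: bool) -> Action:
    """Pick the next step for one corrupt SSTable, following Bryan's ordering."""
    # Very widespread corruption plus replicas you trust: rebuild the node.
    if corrupt_sstable_count >= WIDESPREAD_SSTABLE_COUNT and repairs_trusted:
        return Action.REBUILD_NODE
    # Tiny files hold little salvageable data: delete them and rely on repair.
    if data_file_bytes < SMALL_SSTABLE_BYTES:
        return Action.DELETE_AND_REPAIR
    # Large files: try the faster online scrub first.
    if not online_scrub_failed:
        return Action.ONLINE_SCRUB
    # Only if all else fails: the offline sstablescrub.
    return Action.OFFLINE_SCRUB
```

None of these branches helps if corruption returns on a freshly wiped node, as Alaa reports; at that stage both Bryan and Jason point at the disks, memory, or file system instead.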

On Fri, Aug 12, 2016 at 11:55 AM, Bryan Cheng <br...@blockcypher.com> wrote:

> Looks like you're doing the offline scrub- have you tried online?
>
> Here's my typical process for corrupt SSTables.
>
> With disk_failure_policy set to stop, examine the failing sstables. If
> they are very small (in the range of kbs), it is unlikely that there is any
> salvageable data there. Just delete them, start the machine, and schedule a
> repair ASAP.
>
> If they are large, then it may be worth salvaging. If the scope of
> corruption is reasonable (limited to a few sstables scattered among
> different keyspaces), set disk_failure_policy to best_effort, start the
> machine up, and run the nodetool scrub. This is online scrub, faster than
> offline scrub (at least of 2.1.12, the last time I had to do this).
>
> Only if all else fails, attempt the very painful offline sstablescrub.
>
> Is the VMWare client Windows? (Trying to make sure its not just the host).
> YMMV but in the past Windows was somewhat of a neglected platform wrt
> Cassandra. I think you'd have a lot easier time getting help if running
> Linux is an option here.
>

Re: Corrupt SSTABLE over and over

Posted by Bryan Cheng <br...@blockcypher.com>.
Looks like you're doing the offline scrub; have you tried the online one?

Here's my typical process for corrupt SSTables.

With disk_failure_policy set to stop, examine the failing sstables. If they
are very small (in the range of KBs), it is unlikely that there is any
salvageable data there. Just delete them, start the machine, and schedule a
repair ASAP.
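A minimal sketch of that triage step, assuming 2.2-style file names such as la-4886-big-Data.db, where every component of one SSTable generation shares the same "la-4886-big-" prefix. The 64 KiB cutoff and the helper names are hypothetical. It lists the whole generation because removing only the Data.db file would leave the other component files (Index.db, Filter.db, Statistics.db and so on) behind.

```python
import re
from pathlib import Path

# Hypothetical cutoff for "very small (in the range of KBs)"; tune to taste.
SMALL_SSTABLE_BYTES = 64 * 1024

# Assumed 2.2-style names: <version>-<generation>-<format>-<Component>,
# e.g. la-4886-big-Data.db. The trailing "-" keeps la-4886 from matching la-48860.
DATA_FILE_RE = re.compile(r"^(?P<prefix>[a-z]{2}-\d+-[a-z]+-)Data\.db$")


def sibling_components(data_name: str, names_in_dir: list[str]) -> list[str]:
    """Names in the same directory that belong to the same SSTable generation."""
    m = DATA_FILE_RE.match(data_name)
    if m is None:
        raise ValueError(f"not a 2.2-style Data.db name: {data_name!r}")
    prefix = m.group("prefix")
    return sorted(n for n in names_in_dir if n.startswith(prefix))


def triage(data_size_bytes: int) -> str:
    """Size heuristic from the process above: tiny files are not worth salvaging."""
    if data_size_bytes < SMALL_SSTABLE_BYTES:
        return "delete-and-repair"
    return "try-online-scrub"


def plan_for(data_path: Path) -> tuple[str, list[Path]]:
    """Inspect one failing Data.db on disk (node stopped) and return (action, files)."""
    names = [p.name for p in data_path.parent.iterdir() if p.is_file()]
    parts = sibling_components(data_path.name, names)
    return triage(data_path.stat().st_size), [data_path.parent / n for n in parts]
```

Run it with the node stopped. For "delete-and-repair", moving the returned files to a backup directory (rather than deleting them outright) keeps a way back; then start the node and run nodetool repair on it.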

If they are large, they may be worth salvaging. If the scope of
corruption is reasonable (limited to a few sstables scattered among
different keyspaces), set disk_failure_policy to best_effort, start the
machine up, and run nodetool scrub. This is the online scrub, which is faster
than the offline one (at least as of 2.1.12, the last time I had to do this).
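The salvage path is a one-line change in cassandra.yaml followed by an online scrub. A sketch of both steps, assuming the policy values accepted in the 2.1/2.2 era are die, stop_paranoid, stop, best_effort and ignore; the keyspace and table names in any call are placeholders.

```python
import re

# Values believed to be accepted by disk_failure_policy in 2.1/2.2.
VALID_POLICIES = {"die", "stop_paranoid", "stop", "best_effort", "ignore"}

# Matches an uncommented, top-level "disk_failure_policy: <value>" line.
POLICY_LINE_RE = re.compile(r"^(disk_failure_policy:[ \t]*)\S+", re.MULTILINE)


def set_disk_failure_policy(yaml_text: str, policy: str) -> str:
    """Return yaml_text with the disk_failure_policy value replaced."""
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown disk_failure_policy: {policy!r}")
    new_text, count = POLICY_LINE_RE.subn(lambda m: m.group(1) + policy, yaml_text)
    if count != 1:
        raise ValueError(f"expected one disk_failure_policy line, found {count}")
    return new_text


def scrub_command(keyspace: str, *tables: str) -> list[str]:
    """argv for an online scrub of one keyspace, optionally limited to some tables."""
    return ["nodetool", "scrub", keyspace, *tables]
```

cassandra.yaml is read at startup, so make the edit before starting the node, and set the policy back to stop once the scrub has finished.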

Only if all else fails, attempt the very painful offline sstablescrub.

Is the VMware client OS Windows? (Trying to make sure it's not just the host.)
YMMV, but in the past Windows was a somewhat neglected platform for
Cassandra. I think you'd have a much easier time getting help if running
Linux is an option here.



On Fri, Aug 12, 2016 at 9:16 AM, Alaa Zubaidi (PDF) <al...@pdf.com>
wrote:

> Hi Jason,
>
> Thanks for your input.
> That's what I am afraid of.
> Did you find any HW errors in the VMware and HW logs, or any other indication
> that the HW was the cause? I need to be sure of this before asking the
> customer to spend more money.
>
> Thanks,
> Alaa

Re: Corrupt SSTABLE over and over

Posted by "Alaa Zubaidi (PDF)" <al...@pdf.com>.
Hi Jason,

Thanks for your input.
That's what I am afraid of.
Did you find any HW errors in the VMware and HW logs, or any other indication
that the HW was the cause? I need to be sure of this before asking the
customer to spend more money.

Thanks,
Alaa

On Thu, Aug 11, 2016 at 11:02 PM, Jason Wee <pe...@gmail.com> wrote:

> Is Cassandra running on a virtual server (VMware)?
>
> > I tried sstablescrub but it crashed with hs-err-pid-...
> Maybe try allocating a larger heap to sstablescrub.
>
> I ran into this sstable corruption as well (on Cassandra 1.2). First I
> tried nodetool scrub, but it persisted; then offline sstablescrub, and it
> still persisted; I wiped the node and it happened again. Then I changed the
> hardware (disk and memory), and things went fine.
>
> hth
>
> jason



-- 

Alaa Zubaidi
PDF Solutions, Inc.
333 West San Carlos Street, Suite 1000
San Jose, CA 95110  USA
Tel: 408-283-5639
fax: 408-938-6479
email: alaa.zubaidi@pdf.com


Re: Corrupt SSTABLE over and over

Posted by Jason Wee <pe...@gmail.com>.
Is Cassandra running on a virtual server (VMware)?

> I tried sstablescrub but it crashed with hs-err-pid-...
Maybe try allocating a larger heap to sstablescrub.
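A sketch of preparing such a run. It rests on an ASSUMPTION: that the sstablescrub launcher turns a MAX_HEAP_SIZE environment variable into the JVM's -Xmx (the Linux tools/bin script appears to; whether the Windows launcher does is unverified). The helper name and the 4G default are hypothetical.

```python
import os
import re

# ASSUMPTION: the sstablescrub launcher passes MAX_HEAP_SIZE to the JVM as -Xmx.
HEAP_RE = re.compile(r"^\d+[MG]$")


def sstablescrub_invocation(keyspace: str, table: str, heap: str = "4G") -> tuple[list[str], dict[str, str]]:
    """Return (argv, env) for an offline scrub of one table with a larger heap.

    Only run it while Cassandra is stopped on this node.
    """
    if not HEAP_RE.match(heap):
        raise ValueError(f"heap should look like 512M or 4G, got {heap!r}")
    env = dict(os.environ)       # inherit JAVA_HOME, CASSANDRA_HOME, etc.
    env["MAX_HEAP_SIZE"] = heap  # assumed to be read by the launcher script
    return ["sstablescrub", keyspace, table], env
```

Pass the result to subprocess.run(argv, env=env, check=True). On Windows the launcher name may differ (for example a .bat file), so check what your tools/bin directory actually contains before relying on this.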

I ran into this sstable corruption as well (on Cassandra 1.2). First I
tried nodetool scrub, but it persisted; then offline sstablescrub, and it
still persisted; I wiped the node and it happened again. Then I changed the
hardware (disk and memory), and things went fine.

hth

jason


On Fri, Aug 12, 2016 at 9:20 AM, Alaa Zubaidi (PDF)
<al...@pdf.com> wrote:
> Hi,
>
> I have a 16-node cluster, Cassandra 2.2.1 on Windows, local installation
> (NOT in the cloud),
>
> and I am getting:
> ERROR [CompactionExecutor:2] 2016-08-12 06:51:52,983 CassandraDaemon.java:183 -
> Exception in thread Thread[CompactionExecutor:2,1,main]
> org.apache.cassandra.io.FSReadError:
> org.apache.cassandra.io.sstable.CorruptSSTableException:
> org.apache.cassandra.io.compress.CorruptBlockException:
> (E:\........\la-4886-big-Data.db): corruption detected, chunk at 4969092 of
> length 10208.
>     at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:357) ~[apache-cassandra-2.2.1.jar:2.2.1]
> ....
> ....
> ERROR [CompactionExecutor:2] ....... FileUtils.java:463 - Exiting
> forcefully due to file system exception on startup, disk failure policy
> "stop"
>
> I tried sstablescrub, but it crashed with hs-err-pid-...
> I removed the corrupted file and started the node again; after one day the
> corruption came back. I removed the files and restarted Cassandra, and it
> worked for a few days. Then I ran "nodetool repair"; after it finished,
> Cassandra failed again, this time with commitlog corruption. After I removed
> the commitlog files, it failed again with another sstable corruption.
>
> I also checked the HW, file system, and memory: the VMware logs showed
> no HW errors, and the HW management logs showed NO problems or issues.
> I also checked the Windows logs (Application and System); the only thing I
> found, in the System log, is "Cassandra Service terminated with
> service-specific error Cannot create another system semaphore."
>
> I could not find anything regarding that error; all comments point to the
> application log.
>
> Any help is appreciated.
>
> --
>
> Alaa Zubaidi

Re: Corrupt SSTABLE over and over

Posted by "Alaa Zubaidi (PDF)" <al...@pdf.com>.
One more thing I noticed:
the corrupted SSTable is mentioned twice in the log file.
[CompactionExecutor:10253] 2016-08-11 08:59:01,952.... - Compacting (.....)
[...la-1104-big-Data.db, ....]
[CompactionExecutor:10253] 2016-08-11 09:32:04,814.... - Compacting (.....)
[...la-1104-big-Data.db....]

Is it possible that Cassandra is trying to compact the same file again while
it's still being compacted by another process?
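One way to check this across a whole system.log is to collect every "Compacting" line that lists a given Data.db file. A sketch, assuming each real log line carries the thread tag, the timestamp and the SSTable paths on one line (the excerpt above is abridged, so the exact layout is an assumption). Recording the thread helps: two entries logged by the same executor thread ran one after the other, not at the same time.

```python
import re
from collections import defaultdict

# Assumed layout: "... [CompactionExecutor:N] YYYY-MM-DD HH:MM:SS,mmm ... Compacting ... [paths]"
LINE_RE = re.compile(
    r"\[(?P<thread>CompactionExecutor:\d+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
)
# 2.2-style SSTable data file names, e.g. la-1104-big-Data.db
SSTABLE_RE = re.compile(r"[a-z]{2}-\d+-[a-z]+-Data\.db")


def repeated_compactions(lines):
    """Map each Data.db name listed by more than one 'Compacting' line to its (timestamp, thread) hits."""
    seen = defaultdict(list)
    for line in lines:
        if "Compacting" not in line:
            continue
        m = LINE_RE.search(line)
        if m is None:
            continue
        for name in set(SSTABLE_RE.findall(line)):
            seen[name].append((m.group("ts"), m.group("thread")))
    return {name: hits for name, hits in seen.items() if len(hits) > 1}
```

Feeding it open("system.log") (or any iterable of lines) returns only the SSTables that were picked up more than once, with when and by which thread.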

Regards,
Alaa

On Thu, Aug 11, 2016 at 6:20 PM, Alaa Zubaidi (PDF) <al...@pdf.com>
wrote:

> Hi,
>
> I have a 16-node cluster, Cassandra 2.2.1 on Windows, local installation
> (NOT in the cloud),
>
> and I am getting:
> ERROR [CompactionExecutor:2] 2016-08-12 06:51:52,983 CassandraDaemon.java:183 -
> Exception in thread Thread[CompactionExecutor:2,1,main]
> org.apache.cassandra.io.FSReadError:
> org.apache.cassandra.io.sstable.CorruptSSTableException:
> org.apache.cassandra.io.compress.CorruptBlockException:
> (E:\........\la-4886-big-Data.db): corruption detected, chunk at 4969092
> of length 10208.
>     at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:357) ~[apache-cassandra-2.2.1.jar:2.2.1]
> ....
> ....
> ERROR [CompactionExecutor:2] ....... FileUtils.java:463 - Exiting
> forcefully due to file system exception on startup, disk failure policy
> "stop"
>
> I tried sstablescrub, but it crashed with hs-err-pid-...
> I removed the corrupted file and started the node again; after one day the
> corruption came back. I removed the files and restarted Cassandra, and it
> worked for a few days. Then I ran "nodetool repair"; after it finished,
> Cassandra failed again, this time with commitlog corruption. After I removed
> the commitlog files, it failed again with another sstable corruption.
>
> I also checked the HW, file system, and memory: the VMware logs
> showed no HW errors, and the HW management logs showed NO problems or
> issues.
> I also checked the Windows logs (Application and System); the only thing I
> found, in the System log, is "Cassandra Service terminated with
> service-specific error Cannot create another system semaphore."
>
> I could not find anything regarding that error; all comments point to the
> application log.
>
> Any help is appreciated.
>
> --
>
> Alaa Zubaidi
>
>


-- 

Alaa Zubaidi
PDF Solutions, Inc.
333 West San Carlos Street, Suite 1000
San Jose, CA 95110  USA
Tel: 408-283-5639
fax: 408-938-6479
email: alaa.zubaidi@pdf.com
