Posted to commits@cassandra.apache.org by "Alan Boudreault (JIRA)" <ji...@apache.org> on 2015/01/15 04:42:34 UTC

[jira] [Updated] (CASSANDRA-8623) sstablesplit fails *randomly* with Data component is missing

     [ https://issues.apache.org/jira/browse/CASSANDRA-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Boudreault updated CASSANDRA-8623:
---------------------------------------
    Description: 
I'm experiencing an issue related to sstablesplit. I would like to understand if I am doing something wrong or there is an issue in the split process. The process fails randomly with the following exception:
{code}
ERROR 02:17:36 Error in ThreadPoolExecutor
java.lang.AssertionError: Data component is missing for sstable./tools/bin/../../data/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-16
{code}

See the attached output.log file. The process never terminates after this exception, and I have also seen the dataset (the number of sstables) grow indefinitely.

* I have not been able to reproduce the issue with a single sstablesplit command, i.e., one that specifies all files with glob matching.
* I can reproduce the bug if I call sstablesplit multiple times, each on a single file (the way ccm does).
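
To make the difference between the two invocation patterns above concrete, here is a minimal sketch that only builds the command lines and does not run anything. The `tools/bin/sstablesplit` path is an assumption based on the path in the stack trace, and both function names are hypothetical; this is not how ccm is actually implemented.

```python
from typing import List, Sequence

# Assumed location of the tool, relative to the Cassandra directory.
SSTABLESPLIT = "tools/bin/sstablesplit"


def single_invocation(data_files: Sequence[str]) -> List[List[str]]:
    """One sstablesplit call that receives every file (the glob-style case)."""
    return [[SSTABLESPLIT, *data_files]]


def per_file_invocations(data_files: Sequence[str]) -> List[List[str]]:
    """One sstablesplit call per file (the pattern that reproduces the bug)."""
    return [[SSTABLESPLIT, f] for f in data_files]
```

Each inner list could be handed to `subprocess.run` one after the other; the second pattern starts a fresh sstablesplit process for every sstable, which is the case where the failure shows up.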

Here is the test case file to reproduce the bug:

https://drive.google.com/file/d/0BwZ_GPM33j6KdVh0NTdkOWV2R1E/view?usp=sharing

1. Download the split_issue.tar.gz file. It includes the latest cassandra-2.1 branch binaries.
2. Extract it.
3. cd into the use case directory.
4. Download the dataset (2G), to be sure we are working with the same data, and place it in the working directory.
   https://docs.google.com/uc?id=0BwZ_GPM33j6KV3ViNnpPcVFndUU&export=download
5. The first time, run ./test.sh. This will set up and run a test.
6. On subsequent runs, you only need to run ./test.sh --no-setup . This only resets the dataset to its initial state and re-runs the test. You might have to run the test a few times before hitting the issue, but I can always reproduce it within 2-3 runs.
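
Since the failure only shows up after a few runs, a small helper for checking captured output can save some manual scanning. The sketch below assumes the script is named `test.sh` for every run and that the exception text ends up in whatever output you capture (for example output.log); both function names are hypothetical.

```python
from typing import List

# Message taken from the AssertionError in the report.
FAILURE_MARKER = "Data component is missing"


def find_failures(log_text: str) -> List[str]:
    """Return every line of the captured output that contains the failure message."""
    return [line for line in log_text.splitlines() if FAILURE_MARKER in line]


def command_for_run(attempt: int) -> List[str]:
    """Full setup on the first attempt, dataset reset only on later attempts."""
    if attempt == 1:
        return ["./test.sh"]
    return ["./test.sh", "--no-setup"]
```

Because the process reportedly never stops after the exception, anything that actually launches these commands would also need a timeout of its own.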


  was:
I'm experiencing an issue related to sstablesplit. I would like to understand if I am doing something wrong or there is an issue in the split process. The process fails randomly with the following exception:
{code}
ERROR 02:17:36 Error in ThreadPoolExecutor
java.lang.AssertionError: Data component is missing for sstable./tools/bin/../../data/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-16
{code}

See attached output.log file. The process never stops after this exception and I've also seen the dataset growing indefinitely (number of sstables).  

* I have not been able to reproduce the issue with a single sstablesplit command. ie, specifying all files with glob matching.
* I can reproduce the if I call multiple sstablesplit on a single file (the way ccm does)

Here is the test case file to reproduce the bug:

https://drive.google.com/file/d/0BwZ_GPM33j6KdVh0NTdkOWV2R1E/view?usp=sharing

1. Download the split_issue.tar.gz file. It includes latest cassandra-2.1 branch binaries.
2. Extract it
3. CD inside the use case directory
4. Download the dataset (2G) just to be sure we have the same thing, and place it in the working directory.
   https://docs.google.com/uc?id=0BwZ_GPM33j6KV3ViNnpPcVFndUU&export=download
5. The first time, run ./test.sh. This will setup and run a test.
6. The next times, you can only run ./test --no-setup . This will only reset the dataset as its initial state and re-run the test. You might have to run the tests some times before experiencing it... but I'm always able with only 2-3 runs.



> sstablesplit fails *randomly* with Data component is missing
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-8623
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8623
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Alan Boudreault
>            Assignee: Marcus Eriksson
>         Attachments: output.log
>
>
> I'm experiencing an issue related to sstablesplit. I would like to understand if I am doing something wrong or there is an issue in the split process. The process fails randomly with the following exception:
> {code}
> ERROR 02:17:36 Error in ThreadPoolExecutor
> java.lang.AssertionError: Data component is missing for sstable./tools/bin/../../data/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-16
> {code}
> See the attached output.log file. The process never terminates after this exception, and I have also seen the dataset (the number of sstables) grow indefinitely.
> * I have not been able to reproduce the issue with a single sstablesplit command, i.e., one that specifies all files with glob matching.
> * I can reproduce the bug if I call sstablesplit multiple times, each on a single file (the way ccm does).
> Here is the test case file to reproduce the bug:
> https://drive.google.com/file/d/0BwZ_GPM33j6KdVh0NTdkOWV2R1E/view?usp=sharing
> 1. Download the split_issue.tar.gz file. It includes the latest cassandra-2.1 branch binaries.
> 2. Extract it.
> 3. cd into the use case directory.
> 4. Download the dataset (2G), to be sure we are working with the same data, and place it in the working directory.
>    https://docs.google.com/uc?id=0BwZ_GPM33j6KV3ViNnpPcVFndUU&export=download
> 5. The first time, run ./test.sh. This will set up and run a test.
> 6. On subsequent runs, you only need to run ./test.sh --no-setup . This only resets the dataset to its initial state and re-runs the test. You might have to run the test a few times before hitting the issue, but I can always reproduce it within 2-3 runs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)