Posted to user@cassandra.apache.org by Ross Black <ro...@gmail.com> on 2013/11/27 10:12:12 UTC

data dropped when using sstableloader?

Hi,

Using Cassandra 1.2.10, I am trying to load sstable data into a cluster of
6 machines.
The machines are using vnodes, and are configured with
NetworkTopologyStrategy replication=3 and LeveledCompactionStrategy on the
tables being loaded.
The sstable data was generated using SSTableSimpleUnsortedWriter.
The small dataset for one table is ~100GB, the large dataset for another
table is ~500GB.

The data was loaded using:
    sstableloader --nodes ihz58,ihz59,ihz60,ihz61,ihz62,ihz63 --verbose ${sstable_dir}
and was run on a machine that was not part of the cluster.

After loading the data using sstableloader, I discovered that some rows
were missing from Cassandra.  I dumped the sstables using sstable2json and
could see the missing rows in the generated data.
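
For anyone wanting to repeat that check, here is a minimal Python sketch of the comparison. It assumes the sstable2json dump is a JSON list of objects that each carry a "key" field (the layout I would expect from 1.2, but verify against your own dump), and that the keys found in the cluster have already been collected as strings in the same encoding. The function names and the sample data are hypothetical.

```python
import json


def keys_in_dump(dump_text):
    """Return the set of row keys in one sstable2json dump.

    Assumes the dump is a JSON list of {"key": ..., "columns": [...]} objects.
    """
    rows = json.loads(dump_text)
    return {row["key"] for row in rows}


def missing_keys(dump_texts, keys_in_cluster):
    """Return, sorted, the keys present in the dumps but absent from the cluster."""
    expected = set()
    for text in dump_texts:
        expected |= keys_in_dump(text)
    return sorted(expected - set(keys_in_cluster))


# Hypothetical sample: two rows were generated, only one is readable from the cluster.
sample_dump = (
    '[{"key": "6b31", "columns": [["c", "76", 1]]},'
    ' {"key": "6b32", "columns": [["c", "76", 1]]}]'
)

if __name__ == "__main__":
    print(missing_keys([sample_dump], ["6b31"]))
```

Keys are compared exactly as printed in the dump, so the list read back from the cluster needs the same representation.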

Over time the list of missing rows reduced, but for several days now the
list of missing data has not changed.  It is now more than a week since I
first loaded the data.
I have tried flushing all the nodes, restarting all machines, and running a
repair, but nothing changes the set of missing rows.

Is there anything I have done wrong here that could result in lost data?

Thanks,
Ross

Re: data dropped when using sstableloader?

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Nov 27, 2013 at 2:47 AM, Turi, Ferenc (GE Power & Water, Non-GE) <
Ferenc.Turi@ge.com> wrote:

>  Did you try to use CQL2 tables?
>
>
>
> /create the CF / table using “cqlsh -2”.
>
>
>
> We experienced the same but using CQL2 helped us.
>

CQL2 is a historical footnote and is likely to be removed from the major
version of Cassandra after 2.1.

https://issues.apache.org/jira/browse/CASSANDRA-5918

I do not suggest solving any current problems with it.

=Rob

RE: data dropped when using sstableloader?

Posted by "Turi, Ferenc (GE Power & Water, Non-GE)" <Fe...@ge.com>.
Hi Ross,

Did you try to use CQL2 tables?

/create the CF / table using "cqlsh -2".

We experienced the same but using CQL2 helped us.

Ferenc


Re: data dropped when using sstableloader?

Posted by Francisco Nogueira Calmon Sobral <fs...@igcorp.com.br>.
Hi, Ross.

We had the same problem under the same version of Cassandra. We opted to copy ALL the sstables from the old cluster to each new node, then run nodetool refresh. The missing rows appeared after this procedure.
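
A rough Python sketch of that procedure, for illustration only. It just builds the command lines and does not run anything. The host names, keyspace, table, source directory and the data directory layout (/var/lib/cassandra/data/<keyspace>/<table>/) are assumptions, and the use of rsync for the copy step is my choice here, not necessarily what we ran.

```python
def refresh_plan(source_dir, hosts, keyspace, table,
                 data_root="/var/lib/cassandra/data"):
    """Build, per node, a copy command followed by a nodetool refresh command.

    data_root is an assumed default; check data_file_directories in cassandra.yaml.
    """
    plan = []
    src = source_dir.rstrip("/") + "/"
    for host in hosts:
        target = f"{host}:{data_root}/{keyspace}/{table}/"
        # Copy every sstable file into the table's data directory on this node.
        plan.append(["rsync", "-a", src, target])
        # Ask the node to pick up the newly copied sstables without a restart.
        plan.append(["nodetool", "-h", host, "refresh", keyspace, table])
    return plan


if __name__ == "__main__":
    # Hypothetical values, shown only to illustrate the generated commands.
    for cmd in refresh_plan("/backup/ks/tbl", ["node1", "node2"], "ks", "tbl"):
        print(" ".join(cmd))
```

Each node only keeps serving the rows it owns, so unneeded data can be dropped afterwards with nodetool cleanup.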

Best regards,
Francisco.



On Nov 27, 2013, at 7:49 PM, Ross Black <ro...@gmail.com> wrote:

> Hi Tyler,
> 
> Thanks (somehow I missed that ticket when I searched for sstableloader bugs).
> 
> I will retry with 1.2.12 when we get a chance to upgrade.  In the meantime I have switched to loading data via the normal client API (slower but reliable).
> 
> Ross
> 
> 
> 
> On 28 November 2013 03:45, Tyler Hobbs <ty...@datastax.com> wrote:
> 
> On Wed, Nov 27, 2013 at 3:12 AM, Ross Black <ro...@gmail.com> wrote:
> Using Cassandra 1.2.10, I am trying to load sstable data into a cluster of 6 machines.
> 
> This may be affecting you: https://issues.apache.org/jira/browse/CASSANDRA-6272
> 
> Using 1.2.12 for the sstableloader process should work.
> 
> 
> -- 
> Tyler Hobbs
> DataStax
> 


Re: data dropped when using sstableloader?

Posted by Ross Black <ro...@gmail.com>.
Hi Tyler,

Thanks (somehow I missed that ticket when I searched for sstableloader
bugs).

I will retry with 1.2.12 when we get a chance to upgrade.  In the meantime
I have switched to loading data via the normal client API (slower but
reliable).

Ross



On 28 November 2013 03:45, Tyler Hobbs <ty...@datastax.com> wrote:

>
> On Wed, Nov 27, 2013 at 3:12 AM, Ross Black <ro...@gmail.com> wrote:
>
>> Using Cassandra 1.2.10, I am trying to load sstable data into a cluster
>> of 6 machines.
>
>
> This may be affecting you:
> https://issues.apache.org/jira/browse/CASSANDRA-6272
>
> Using 1.2.12 for the sstableloader process should work.
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>

Re: data dropped when using sstableloader?

Posted by Tyler Hobbs <ty...@datastax.com>.
On Wed, Nov 27, 2013 at 3:12 AM, Ross Black <ro...@gmail.com> wrote:

> Using Cassandra 1.2.10, I am trying to load sstable data into a cluster of
> 6 machines.


This may be affecting you:
https://issues.apache.org/jira/browse/CASSANDRA-6272

Using 1.2.12 for the sstableloader process should work.


-- 
Tyler Hobbs
DataStax <http://datastax.com/>