Posted to user@cassandra.apache.org by Arindam Choudhury <ar...@ackstorm.com> on 2016/01/29 13:33:09 UTC

missing rows while importing data using sstable loader

Hi,

I am importing data into a new Cassandra cluster using sstableloader.
sstableloader runs without any warning or error, but around 1000 rows are
missing.

Any feedback will be highly appreciated.

Kind Regards,
Arindam Choudhury

Re: missing rows while importing data using sstable loader

Posted by Victor Chen <vi...@gmail.com>.
Arindam,

What can you share about the source you are importing data from? Is it a
separate Cassandra cluster? If so, how many nodes and datacenters does it
have, and what is its replication factor (RF)? How certain are you that the
rows actually exist in the set of sstables you are feeding to sstableloader?
I ask because, hypothetically, if you load the sstables of only one node of a
3-node, single-DC source cluster with RF=2, you won't import the full data
set. In that case you would need to load the sstables of at least two nodes
to cover all the data (with RF=3 a single node would be enough; with RF=1
you would need the sstables of all three nodes).
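The rule of thumb above generalizes. If each token range is stored on RF consecutive nodes of an N-node ring (a SimpleStrategy-like layout, assumed here with one token per node and no vnodes), then loading from any N - RF + 1 nodes is guaranteed to cover every range, because the RF - 1 nodes left out can never hold all RF replicas of a range. The toy model below, with hypothetical helper names, brute-forces that claim for the 3-node cases mentioned.

```python
from itertools import combinations


def replicas(range_id: int, n_nodes: int, rf: int) -> set[int]:
    """Nodes holding a range in a toy SimpleStrategy-like ring (assumption).

    One token per node: range i is owned by node i and replicated to the
    next rf - 1 nodes clockwise.
    """
    return {(range_id + k) % n_nodes for k in range(rf)}


def covers_all(nodes: set[int], n_nodes: int, rf: int) -> bool:
    """True if sstables from `nodes` contain at least one replica of every range."""
    return all(replicas(r, n_nodes, rf) & nodes for r in range(n_nodes))


def min_nodes_to_load(n_nodes: int, rf: int) -> int:
    """Smallest k such that ANY k nodes are guaranteed to cover all ranges."""
    for k in range(1, n_nodes + 1):
        if all(covers_all(set(c), n_nodes, rf)
               for c in combinations(range(n_nodes), k)):
            return k
    return n_nodes


# The three cases from the message: a 3-node, single-DC cluster.
for rf in (1, 2, 3):
    print(f"RF={rf}: load from {min_nodes_to_load(3, rf)} node(s)")
# RF=1 -> 3, RF=2 -> 2, RF=3 -> 1
```

The brute force is only practical for small rings; its purpose is to confirm the N - RF + 1 formula rather than to plan a real migration.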

On Fri, Jan 29, 2016 at 7:33 AM, Arindam Choudhury <
arindam.choudhury@ackstorm.com> wrote:

> Hi,
>
> I am importing data into a new Cassandra cluster using sstableloader.
> sstableloader runs without any warning or error, but around 1000 rows are
> missing.
>
> Any feedback will be highly appreciated.
>
> Kind Regards,
> Arindam Choudhury
>

Re: missing rows while importing data using sstable loader

Posted by Jack Krupansky <ja...@gmail.com>.
I sent a message to DataStax Docs asking them to add this nodetool flush
suggestion to the sstableloader documentation.

-- Jack Krupansky

On Fri, Feb 5, 2016 at 3:35 AM, Romain Hardouin <ro...@yahoo.fr> wrote:

> > What is the best practice for creating sstables?
>
> When you run "nodetool flush", Cassandra persists all memtables to disk,
> i.e. it produces sstables.
> (You can also create sstables yourself with CQLSSTableWriter, but I don't
> think that was the point of your question.)
>

Re: missing rows while importing data using sstable loader

Posted by Romain Hardouin <ro...@yahoo.fr>.
> What is the best practice for creating sstables?

When you run "nodetool flush", Cassandra persists all memtables to disk, i.e. it produces sstables.
(You can also create sstables yourself with CQLSSTableWriter, but I don't think that was the point of your question.)
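The flush-then-load sequence can be scripted. The sketch below only builds the argument lists for `nodetool flush <keyspace> <table>` and `sstableloader -d <hosts> <directory>`; the helper name, host address and staging path are hypothetical, and it assumes the loader is pointed at a directory whose last two components are the keyspace and table names. Copying the flushed sstable files from the source node into that directory is left out, and the two commands may need to run on different machines.

```python
import shlex
from pathlib import Path


def flush_and_load_commands(keyspace: str, table: str, staging_root: Path,
                            target_hosts: list[str]) -> list[list[str]]:
    """Build the commands for: flush source memtables, then stream sstables.

    staging_root/<keyspace>/<table> is a hypothetical staging directory
    assumed to hold a copy of the source node's sstables for that table.
    """
    load_dir = staging_root / keyspace / table
    return [
        # 1. On the source node: write memtable contents out as sstables,
        #    so rows still held only in memory are not left behind.
        ["nodetool", "flush", keyspace, table],
        # 2. After copying the flushed sstables into load_dir: stream them
        #    to the target cluster through the listed contact points.
        ["sstableloader", "-d", ",".join(target_hosts), str(load_dir)],
    ]


cmds = flush_and_load_commands("mordor", "things_values_meta",
                               Path("/tmp/load"), ["10.0.0.1"])
for cmd in cmds:
    print(shlex.join(cmd))  # dry run: print the commands instead of running them
```

Printing rather than executing keeps the sketch safe to try; the option names should be checked against the nodetool and sstableloader help of the Cassandra version in use.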

Re: missing rows while importing data using sstable loader

Posted by Arindam Choudhury <ar...@ackstorm.com>.
What is the best practice for creating sstables?

On 1 February 2016 at 15:21, Romain Hardouin <ro...@yahoo.fr> wrote:

> Did you run "nodetool flush" on the source node? If not, the missing rows
> could be in memtables.
>

Re: missing rows while importing data using sstable loader

Posted by Romain Hardouin <ro...@yahoo.fr>.
Did you run "nodetool flush" on the source node? If not, the missing rows could be in memtables.

Re: missing rows while importing data using sstable loader

Posted by Jack Krupansky <ja...@gmail.com>.
I agree that there should be clearer documentation on exactly how the
estimate is calculated. When I asked about this recently, the answer was that
it should be within about 2% of the actual key count. I started looking at
the code, but ran out of time before I had chased down all the subsidiary
factors in the calculation.

It would be nice to have an explicit nodetool option to count the actual keys.
Presumably that would be more efficient than a select count(*).
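For the numbers in the quoted message (4692 from count(*) versus an estimate of 4720), the gap is well inside that figure. The check below treats the count(*) result as if it were the exact key count and uses the ~2% tolerance quoted above, which is a rule of thumb rather than a documented guarantee.

```python
def relative_gap(estimate: int, actual: int) -> float:
    """Relative difference between an estimated count and an actual count."""
    return abs(estimate - actual) / actual


gap = relative_gap(4720, 4692)   # cfstats estimate vs. count(*) result
print(f"{gap:.2%}")              # 0.60%
print(gap <= 0.02)               # True
```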


-- Jack Krupansky

On Fri, Jan 29, 2016 at 11:27 AM, Arindam Choudhury <
arindam.choudhury@ackstorm.com> wrote:

> Why does cqlsh return 4692 for "select count(*) from
> mordor.things_values_meta;" when nodetool cfstats reports
> Number of keys (estimate): 4720?

Re: missing rows while importing data using sstable loader

Posted by Arindam Choudhury <ar...@ackstorm.com>.
Why does cqlsh return 4692 for "select count(*) from
mordor.things_values_meta;" when nodetool cfstats reports
Number of keys (estimate): 4720?
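Note that the two numbers do not measure the same thing. With PRIMARY KEY ((thing_id, key), bucket_timestamp), select count(*) counts CQL rows, while the cfstats "Number of keys" line estimates partitions, i.e. distinct (thing_id, key) pairs. A partition holding several bucket_timestamp rows contributes several rows but only one key, so the two are directly comparable only when every partition holds exactly one row. The pure-Python illustration below uses made-up sample rows; in CQL, SELECT DISTINCT thing_id, key FROM mordor.things_values_meta lists the partitions themselves.

```python
from datetime import datetime

# Made-up rows shaped like mordor.things_values_meta:
# (thing_id, key, bucket_timestamp); (thing_id, key) is the partition key.
rows = [
    ("t1", "temp", datetime(2016, 1, 1)),
    ("t1", "temp", datetime(2016, 1, 2)),  # same partition, second row
    ("t1", "hum", datetime(2016, 1, 1)),
    ("t2", "temp", datetime(2016, 1, 1)),
]

row_count = len(rows)                                # what count(*) counts
partition_count = len({(t, k) for t, k, _ in rows})  # what "Number of keys" estimates

print(row_count, partition_count)  # 4 3
```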

On 29 January 2016 at 16:25, Arindam Choudhury <
arindam.choudhury@ackstorm.com> wrote:

> I am counting the rows with "select count(*) from mordor.things_values_meta;".
>
> For testing, I am loading from a one-node cluster into another one-node
> cluster.

Re: missing rows while importing data using sstable loader

Posted by Arindam Choudhury <ar...@ackstorm.com>.
I am counting the rows with "select count(*) from mordor.things_values_meta;".

For testing, I am loading from a one-node cluster into another one-node
cluster.

On 29 January 2016 at 16:20, Jack Krupansky <ja...@gmail.com>
wrote:

> And how are you counting the rows? With a query? If so, what is the
> query? Using the nodetool cfstats (estimated) key count? Or... what?
>
> Do the tokens of the missing rows fall in a single range, distinct from the
> range holding the rest of the data in the original cluster?
>
> How many nodes are in the original cluster?
>
> -- Jack Krupansky

Re: missing rows while importing data using sstable loader

Posted by Jack Krupansky <ja...@gmail.com>.
And how are you counting the rows? With a query? If so, what is the query?
Using the nodetool cfstats (estimated) key count? Or... what?

Do the tokens of the missing rows fall in a single range, distinct from the
range holding the rest of the data in the original cluster?
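One way to answer the token question is to fetch each missing row's token on the source (CQL's token() function, e.g. SELECT token(thing_id, key) FROM mordor.things_values_meta) and see which node's range it falls in. Under the simplifying assumption of one token per node, a node with token T owns the range (previous token, T], so a token belongs to the first node whose token is greater than or equal to it, wrapping around the ring. The ring and the "missing" tokens below are made up, and `owning_node` is a hypothetical helper.

```python
from bisect import bisect_left
from collections import Counter


def owning_node(token: int, ring: list[tuple[int, str]]) -> str:
    """Primary owner of `token`: the first node whose token is >= it, wrapping.

    `ring` holds (node_token, node_name) pairs sorted by token; each node owns
    the range (previous_node_token, node_token].
    """
    tokens = [t for t, _ in ring]
    i = bisect_left(tokens, token)
    return ring[i % len(ring)][1]  # past the last token: wrap to the first node


# Hypothetical 3-node ring and hypothetical tokens of the missing rows.
ring = [(-3074457345618258603, "node1"),
        (3074457345618258602, "node2"),
        (9223372036854775807, "node3")]
missing_tokens = [-8000000000000000000, -5000000000000000000, 100]

print(Counter(owning_node(t, ring) for t in missing_tokens))
# Counter({'node1': 2, 'node2': 1})
```

If the missing tokens all map to one node, that node's sstables are the first place to look.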

How many nodes are in the original cluster?

-- Jack Krupansky

On Fri, Jan 29, 2016 at 10:12 AM, Arindam Choudhury <
arindam.choudhury@ackstorm.com> wrote:

> I will check the output of nodetool cfstats.
>
> It's from version 2.1.2 to version 2.1.9.

Re: missing rows while importing data using sstable loader

Posted by Arindam Choudhury <ar...@ackstorm.com>.
I will check the output of nodetool cfstats.

The sstables come from version 2.1.2, and I am importing them into version
2.1.9.

On 29 January 2016 at 16:02, Jack Krupansky <ja...@gmail.com>
wrote:

> Are these sstables from an existing Cassandra cluster or generated by a
> program?
>
> If the former, do a nodetool tablestats or cfstats to get the sstable
> count and compare it to both the number of sstables that the loader is
> reading from and the number that end up in the target cluster.
>
> What Cassandra version did the sstables come from and what version are you
> importing into?
>
>
> -- Jack Krupansky

Re: missing rows while importing data using sstable loader

Posted by Jack Krupansky <ja...@gmail.com>.
Are these sstables from an existing Cassandra cluster or generated by a
program?

If the former, run nodetool tablestats (or cfstats) to get the sstable count,
and compare it with both the number of sstables the loader is reading and the
number that end up in the target cluster.
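For the loader side of that comparison, one option is to count the sstables in the directory handed to sstableloader. Each sstable has exactly one *-Data.db component, so counting those files gives the sstable count, which can then be set against the "SSTable count" line reported by cfstats. The path below is a placeholder, and `count_sstables` is a hypothetical helper.

```python
from pathlib import Path


def count_sstables(table_dir: Path) -> int:
    """Count sstables in a directory via their Data.db component (one per sstable)."""
    return sum(1 for _ in table_dir.glob("*-Data.db"))


if __name__ == "__main__":
    # Placeholder: the directory passed to sstableloader.
    print(count_sstables(Path("/tmp/load/mordor/things_values_meta")))
```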

What Cassandra version did the sstables come from and what version are you
importing into?


-- Jack Krupansky

On Fri, Jan 29, 2016 at 9:34 AM, Arindam Choudhury <
arindam.choudhury@ackstorm.com> wrote:

> Hi Romain,
>
> The RF was set to 2.
>
> I changed it to one.
>
>  CREATE KEYSPACE mordor WITH replication = {'class' : 'SimpleStrategy',
> 'replication_factor' : 1}  AND durable_writes = true;
>
> re-inserted the columns, still missing rows.
>
> Regards,
> Arindam

Re: missing rows while importing data using sstable loader

Posted by Arindam Choudhury <ar...@ackstorm.com>.
Hi Romain,

The RF was set to 2.

I changed it to 1:

CREATE KEYSPACE mordor WITH replication = {'class' : 'SimpleStrategy', 'replication_factor' : 1} AND durable_writes = true;

and re-inserted the columns, but rows are still missing.

Regards,
Arindam

On 29 January 2016 at 15:14, Romain Hardouin <ro...@yahoo.fr> wrote:

> Hi,
>
> I assume a RF > 1. Right?
> What consistency level did you use? cqlsh uses ONE by default.
> Try:
> cqlsh> CONSISTENCY ALL
> And run your query again.
>
> Best,
> Romain

Re: missing rows while importing data using sstable loader

Posted by Romain Hardouin <ro...@yahoo.fr>.
Hi,
I assume RF > 1, right? What consistency level did you use? cqlsh uses ONE by default. Try:
cqlsh> CONSISTENCY ALL
and run your query again.
Best,
Romain
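The reasoning behind trying CONSISTENCY ALL is the usual overlap rule: a read is guaranteed to reach at least one replica that acknowledged a write only when R + W > RF, where R and W are the number of replicas the read and the write wait for. The sketch below does that arithmetic for a single datacenter; the write level is an assumption, since the thread does not say how the source data was written.

```python
def replicas_required(level: str, rf: int) -> int:
    """Replicas a single-DC read or write waits for at a given consistency level."""
    return {"ONE": 1, "TWO": 2, "THREE": 3,
            "QUORUM": rf // 2 + 1, "ALL": rf}[level]


def read_sees_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when every possible read replica set overlaps every write replica set."""
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf


# RF=2 with writes at ONE (assumed): cqlsh's default read level ONE can miss
# data, while CONSISTENCY ALL makes the read consult both replicas.
print(read_sees_write("ONE", "ONE", 2))  # False
print(read_sees_write("ALL", "ONE", 2))  # True
```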

On Friday, 29 January 2016 at 13:45, Arindam Choudhury <ar...@ackstorm.com> wrote:

> Hi Kai,
>
> The table schema is:
>
> CREATE TABLE mordor.things_values_meta (
>     thing_id text,
>     key text,
>     bucket_timestamp timestamp,
>     total_rows counter,
>     PRIMARY KEY ((thing_id, key), bucket_timestamp)
> ) WITH CLUSTERING ORDER BY (bucket_timestamp ASC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99.0PERCENTILE';
>
> I am just running "select count(*) from things_values_meta ;" to get the count.
>
> Regards,
> Arindam

Re: missing rows while importing data using sstable loader

Posted by Arindam Choudhury <ar...@ackstorm.com>.
Hi Kai,

The table schema is:

CREATE TABLE mordor.things_values_meta (
    thing_id text,
    key text,
    bucket_timestamp timestamp,
    total_rows counter,
    PRIMARY KEY ((thing_id, key), bucket_timestamp)
) WITH CLUSTERING ORDER BY (bucket_timestamp ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';


I am just running "select count(*) from things_values_meta ;" to get the
count.

Regards,
Arindam

On 29 January 2016 at 13:39, Kai Wang <de...@gmail.com> wrote:

> Arindam,
>
> What's the table schema, and what does your query to retrieve the rows
> look like?

Re: missing rows while importing data using sstable loader

Posted by Kai Wang <de...@gmail.com>.
Arindam,

What's the table schema, and what does your query to retrieve the rows look
like?

On Fri, Jan 29, 2016 at 7:33 AM, Arindam Choudhury <
arindam.choudhury@ackstorm.com> wrote:

> Hi,
>
> I am importing data into a new Cassandra cluster using sstableloader.
> sstableloader runs without any warning or error, but around 1000 rows are
> missing.
>
> Any feedback will be highly appreciated.
>
> Kind Regards,
> Arindam Choudhury
>