Posted to user@cassandra.apache.org by Rahul Reddy <ra...@gmail.com> on 2019/08/21 10:13:48 UTC

Cassandra copy command

Hello,

I have 3 datacenters and want to make sure the record count is the same in
all DCs. If I run the COPY command on node1 in dc1, does it get the data only
from dc1? In nodetool cfstats I'm seeing discrepancies in the partition
counts. Is that because we didn't run cleanup after adding a few nodes and
removing them? To rule out any discrepancies, I want to run the COPY command
from all 3 DCs and compare. Please let me know whether the COPY command
extracts data only from the DC I run it from.

Re: Cassandra copy command

Posted by Ahmed Eljami <ah...@gmail.com>.
Hello,

As Jean said, it would be preferable to use http://cassandra-reaper.io
so that you don't have to manually manage the consistency of your Cassandra
ring or the list of nodes to repair.



-- 
Regards,

Ahmed ELJAMI

Re: Cassandra copy command

Posted by Rahul Reddy <ra...@gmail.com>.
Thanks Jean,

I already have dc1 and dc2. I added dc3 from dc1 and dc4 from dc2. If I want
to run repair on one node in dc3 from dc1 only, is that possible?
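
For the manual nodetool repair route suggested in this thread, limiting a repair to particular datacenters might be sketched as below. This is only an illustration under assumptions: the keyspace and table names (my_ks, my_table) are hypothetical, and the -full and -dc options should be checked against `nodetool help repair` for the Cassandra version in use, since repair options have changed between releases. The function only builds the command line; it does not run it.

```python
from typing import List, Optional, Sequence


def repair_command(
    keyspace: str,
    table: Optional[str] = None,
    datacenters: Sequence[str] = (),
    full: bool = True,
) -> List[str]:
    """Build an argv list for `nodetool repair` limited to some datacenters.

    Assumption: the installed nodetool accepts `-full` and a repeatable
    `-dc <name>` option. Verify with `nodetool help repair` first.
    """
    cmd = ["nodetool", "repair"]
    if full:
        cmd.append("-full")
    for dc in datacenters:
        # One -dc flag per datacenter that should take part in the repair.
        cmd += ["-dc", dc]
    cmd.append(keyspace)
    if table:
        cmd.append(table)
    return cmd


if __name__ == "__main__":
    # Hypothetical names; prints the command instead of executing it.
    print(" ".join(repair_command("my_ks", "my_table", ["dc1", "dc3"])))
```

The resulting list could be passed to subprocess.run on the node where the repair should start. A tool such as Reaper, recommended elsewhere in this thread, schedules and tracks this kind of work for you.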


Re: Cassandra copy command

Posted by Jean Carlo <je...@gmail.com>.
Hello Rahul,

To ensure consistency among the DCs, it is enough to run a repair.

You can do it using http://cassandra-reaper.io/
or by running the command *nodetool repair* with the appropriate options on
every node.

You do not need to count the rows in every DC to ensure Cassandra is in sync
among the DCs after you have run the repair. But if you still want to do it,
use Spark for it.

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay



Re: Cassandra copy command

Posted by Rahul Reddy <ra...@gmail.com>.
Yep, I did run rebuild on each new node.


Re: Cassandra copy command

Posted by Stefan Miklosovic <st...@instaclustr.com>.
Hi Rahul,

How did you add dc3 to the cluster? The rule of thumb here is to run
rebuild on each node of the new DC, for example as described here:
https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsAddDCToCluster.html


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org


Re: Cassandra copy command

Posted by Rahul Reddy <ra...@gmail.com>.
Hi Stefan,

I'm adding a new DC3 to an existing cluster and see discrepancies of a
couple of million in nodetool cfstats in the new DC.

My table size is 50 GB.
I'm trying to copy the entire table:

COPY table TO 'full_tablr.csv' WITH DELIMITER = ',';

If I run the above command from dc3, does it get the data only from dc3?
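
This thread does not settle which replicas serve a COPY export, so the following is only a sketch of the comparison step, assuming one CSV export per DC has already been produced somehow. The file names dc1.csv and dc3.csv and the choice of column 0 as the primary key are hypothetical. It compares primary keys rather than raw row counts, which shows which rows differ and not just how many.

```python
import csv
from typing import Iterable, Set, Tuple

Key = Tuple[str, ...]


def load_keys(path: str, key_columns: Iterable[int], delimiter: str = ",") -> Set[Key]:
    """Return the set of primary-key tuples found in one CSV export."""
    cols = list(key_columns)
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        # Skip empty lines; keep only the columns that form the primary key.
        return {tuple(row[i] for i in cols) for row in reader if row}


def diff_exports(
    left: str, right: str, key_columns: Iterable[int], delimiter: str = ","
) -> Tuple[Set[Key], Set[Key]]:
    """Return (keys only in `left`, keys only in `right`)."""
    cols = list(key_columns)
    left_keys = load_keys(left, cols, delimiter)
    right_keys = load_keys(right, cols, delimiter)
    return left_keys - right_keys, right_keys - left_keys


if __name__ == "__main__":
    # Hypothetical file names: one export taken per datacenter.
    only_dc1, only_dc3 = diff_exports("dc1.csv", "dc3.csv", key_columns=[0])
    print(f"{len(only_dc1)} keys only in dc1, {len(only_dc3)} keys only in dc3")
```

Holding every key in memory may not be practical for a 50 GB table. Sorting both exports by key and streaming through them side by side would be an alternative.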




Re: Cassandra copy command

Posted by Stefan Miklosovic <st...@instaclustr.com>.
Hi Rahul,

What is your motivation behind this? Why do you want to make sure the
count is the same? What is the purpose of that? All you should care about
is that Cassandra returns the right results. It was designed from the
ground up to do that for you, so you should not be bothered too much
about such discrepancies; in general, they will always be there. The
important fact is that once data is queried, you can rest assured it is
returned as it should be (and consequently repaired if the data does not
match).

Which copy command are you talking about precisely, and why can't you just use repair?
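
One reason the cfstats figures are a weak basis for comparison is that nodetool labels the partition count as an estimate and reports it per node. A node that still holds data for ranges it no longer owns (for example, before cleanup has run) can therefore report more than expected. A small sketch for pulling those estimates out of saved cfstats output follows. The line labels it matches ("Table:" and "Number of partitions (estimate):" or "Number of keys (estimate):") are assumptions about the output format and may differ between Cassandra versions.

```python
import re
from typing import Dict

# Assumed line labels; adjust to what your nodetool version prints.
_TABLE = re.compile(r"^\s*Table:\s*(\S+)")
_ESTIMATE = re.compile(r"^\s*Number of (?:partitions|keys) \(estimate\):\s*(\d+)")


def partition_estimates(cfstats_output: str) -> Dict[str, int]:
    """Map table name -> estimated partition count from cfstats text."""
    estimates: Dict[str, int] = {}
    current = None
    for line in cfstats_output.splitlines():
        table_match = _TABLE.match(line)
        if table_match:
            # Remember which table the following statistics belong to.
            current = table_match.group(1)
            continue
        estimate_match = _ESTIMATE.match(line)
        if estimate_match and current is not None:
            estimates[current] = int(estimate_match.group(1))
    return estimates
```

Because these are per-node estimates, summing them across the nodes of a DC gives at best a rough indication, not an exact record count.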

