Posted to user@cassandra.apache.org by Subrahmanya Harve <su...@gmail.com> on 2011/11/10 21:27:38 UTC

Data retrieval inconsistent

I am facing an issue in a 0.8.7 cluster -

- I have one cross-DC cluster spanning two DCs, with two keyspaces. I have
configured only one keyspace to replicate data to the other DC; the other
keyspace should not replicate across DCs. This is how I created the
keyspaces -
    create keyspace K1 with
      placement_strategy='org.apache.cassandra.locator.SimpleStrategy' and
      strategy_options = [{replication_factor:1}];
    create keyspace K2 with
      placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy'
      and strategy_options = [{DC1:2, DC2:2}];

I had to do this because I expect K1 to receive a large volume of data
and I do not want it sent over the wire to the other DC.

I am writing the data at CL=ONE and reading the data at CL=ONE. Sometimes I
get the data back and other times I do not. Does anyone know what could be
going on here?

A second, larger question: I am migrating from 0.7.4 to 0.8.7. I can see
that there are large changes in the yaml file, but my specific question is:
how do I configure disk_access_mode like it used to be in 0.7.4?

One observation I have made is that some nodes of the cross-DC cluster are
set to different system times. This is something to fix, but could it be why
data is sometimes retrieved and other times not? Or is something else going
on?

Would appreciate a quick response.

Re: Data retrieval inconsistent

Posted by Edward Capriolo <ed...@gmail.com>.
On Thu, Nov 10, 2011 at 3:27 PM, Subrahmanya Harve <
subrahmanyaharve@gmail.com> wrote:

> I am facing an issue in 0.8.7 cluster -
>
> - I have two clusters in two DCs (rather one cross dc cluster) and two
> keyspaces. But i have only configured one keyspace to replicate data to the
> other DC and the other keyspace to not replicate over to the other DC.
> Basically this is the way i ran the keyspace creation  -
>    create keyspace K1 with
> placement_strategy='org.apache.cassandra.locator.SimpleStrategy' and
> strategy_options = [{replication_factor:1}];
>    create keyspace K2 with
> placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy'
> and strategy_options = [{DC1:2, DC2:2}];
>
> I had to do this because i expect that K1 will get a large volume of data
> and i do not want this wired over to the other DC.
>
> I am writing the data at CL=ONE and reading the data at CL=ONE. I am seeing
> an issue where sometimes i get the data and other times i do not see the
> data. Does anyone know what could be going on here?
>
> A second larger question is  - i am migrating from 0.7.4 to 0.8.7 , i can
> see that there are large changes in the yaml file, but a specific question
> i had was - how do i configure disk_access_mode like it used to be in
> 0.7.4?
>
> One observation i have made is that some nodes of the cross dc cluster are
> at different system times. This is something to fix but could this be why
> data is sometimes retrieved and other times not? Or is there some other
> thing to it?
>
> Would appreciate a quick response.
>

disk_access_mode is no longer required. Cassandra attempts to detect a
64-bit platform and set this correctly.

If your systems are not running NTP and their clocks are out of sync as a
result, this can cause issues (with TTL columns). Your client machines must
have the correct time as well.
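
To illustrate why client clocks matter: the sketch below is a simplified Python model, not Cassandra code, of last-write-wins reconciliation on client-supplied timestamps. The `Column` class, the `reconcile` helper, and the timestamp values are hypothetical, and tie-breaking is simplified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    value: str
    timestamp: int  # client-supplied, microseconds by convention (assumed)


def reconcile(a: Column, b: Column) -> Column:
    """Keep the version with the higher timestamp (ties go to `a` here)."""
    return a if a.timestamp >= b.timestamp else b


# Client A has a correct clock; client B's clock runs 5 seconds behind.
first_write = Column("old", timestamp=1_000_000_000)
# Issued 1 second later in real time, but stamped by the slow clock.
second_write = Column("new", timestamp=1_000_000_000 + 1_000_000 - 5_000_000)

winner = reconcile(first_write, second_write)
print(winner.value)  # prints "old": the later write loses
```

In this model the update from the client with the slow clock is silently shadowed by the older value, which from the application's point of view looks like data that was written but never shows up.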

Re: Data retrieval inconsistent

Posted by Subrahmanya Harve <su...@gmail.com>.
Thanks.
I'm going to try using QUORUM for reads and/or writes and see if data is
returned consistently.


On Thu, Nov 10, 2011 at 3:00 PM, Jeremiah Jordan <
jeremiah.jordan@morningstar.com> wrote:

>  No, that is what I thought you wanted.  I was thinking your machines in
> DC1 had extra disk space or something...
>
> (I stopped replying to the dev list)
>
>
> On 11/10/2011 04:09 PM, Subrahmanya Harve wrote:
>
> Thanks Ed and Jeremiah for that useful info.
> "I am pretty sure the way you have K1 configured it will be placed across
> both DC's as if you had large ring.  If you want it only in DC1 you need to
> say DC1:1, DC2:0."
> Infact i do want K1 to be available across both DCs as if i had a large
> ring. I just do not want them to replicate over across DCs. Also i did try
> doing it like you said DC1:1, DC2:0 but wont that mean that, all my data
> goes into DC1 irrespective of whether the data is getting into the nodes of
> DC1 or DC2, thereby creating a "hot DC"? Since the volume of data for this
> case is huge, that might create a load imbalance on DC1? (Am i missing
> something?)
>
>
> On Thu, Nov 10, 2011 at 1:30 PM, Jeremiah Jordan <
> jeremiah.jordan@morningstar.com> wrote:
>
> > I am pretty sure the way you have K1 configured it will be placed across
> > both DC's as if you had large ring.  If you want it only in DC1 you need
> > to say DC1:1, DC2:0.
> > If you are writing and reading at ONE you are not guaranteed to get the
> > data if RF > 1.  If RF = 2, and you write with ONE, you data could be
> > written to server 1, and then read from server 2 before it gets over
> > there.
> >
> > The differing on server times will only really matter for TTL's.  Most
> > everything else works off comparing user supplied times.
> >
> > -Jeremiah
> >
> >
> > On 11/10/2011 02:27 PM, Subrahmanya Harve wrote:
> >
> >>
> >> I am facing an issue in 0.8.7 cluster -
> >>
> >> - I have two clusters in two DCs (rather one cross dc cluster) and two
> >> keyspaces. But i have only configured one keyspace to replicate data to
> >> the other DC and the other keyspace to not replicate over to the other DC.
> >> Basically this is the way i ran the keyspace creation  -
> >>    create keyspace K1 with
> >> placement_strategy='org.apache.cassandra.locator.SimpleStrategy' and
> >> strategy_options = [{replication_factor:1}];
> >>    create keyspace K2 with
> >> placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy'
> >> and strategy_options = [{DC1:2, DC2:2}];
> >>
> >> I had to do this because i expect that K1 will get a large volume of
> >> data and i do not want this wired over to the other DC.
> >>
> >> I am writing the data at CL=ONE and reading the data at CL=ONE. I am
> >> seeing an issue where sometimes i get the data and other times i do not
> >> see the data. Does anyone know what could be going on here?
> >>
> >> A second larger question is  - i am migrating from 0.7.4 to 0.8.7 , i
> >> can see that there are large changes in the yaml file, but a specific
> >> question i had was - how do i configure disk_access_mode like it used to
> >> be in 0.7.4?
> >>
> >> One observation i have made is that some nodes of the cross dc cluster
> >> are at different system times. This is something to fix but could this
> >> be why data is sometimes retrieved and other times not? Or is there some
> >> other thing to it?
> >>
> >> Would appreciate a quick response.
> >>
> >
>
>

Re: Data retrieval inconsistent

Posted by Jeremiah Jordan <je...@morningstar.com>.
No, that is what I thought you wanted.  I was thinking your machines in 
DC1 had extra disk space or something...

(I stopped replying to the dev list)

On 11/10/2011 04:09 PM, Subrahmanya Harve wrote:
>
> Thanks Ed and Jeremiah for that useful info.
> "I am pretty sure the way you have K1 configured it will be placed across
> both DC's as if you had large ring.  If you want it only in DC1 you need to
> say DC1:1, DC2:0."
> Infact i do want K1 to be available across both DCs as if i had a large
> ring. I just do not want them to replicate over across DCs. Also i did try
> doing it like you said DC1:1, DC2:0 but wont that mean that, all my data
> goes into DC1 irrespective of whether the data is getting into the nodes of
> DC1 or DC2, thereby creating a "hot DC"? Since the volume of data for this
> case is huge, that might create a load imbalance on DC1? (Am i missing
> something?)
>
>
> On Thu, Nov 10, 2011 at 1:30 PM, Jeremiah Jordan <
> jeremiah.jordan@morningstar.com> wrote:
>
> > I am pretty sure the way you have K1 configured it will be placed across
> > both DC's as if you had large ring.  If you want it only in DC1 you need
> > to say DC1:1, DC2:0.
> > If you are writing and reading at ONE you are not guaranteed to get the
> > data if RF > 1.  If RF = 2, and you write with ONE, you data could be
> > written to server 1, and then read from server 2 before it gets over
> > there.
> >
> > The differing on server times will only really matter for TTL's.  Most
> > everything else works off comparing user supplied times.
> >
> > -Jeremiah
> >
> >
> > On 11/10/2011 02:27 PM, Subrahmanya Harve wrote:
> >
> >>
> >> I am facing an issue in 0.8.7 cluster -
> >>
> >> - I have two clusters in two DCs (rather one cross dc cluster) and two
> >> keyspaces. But i have only configured one keyspace to replicate data to
> >> the other DC and the other keyspace to not replicate over to the other DC.
> >> Basically this is the way i ran the keyspace creation  -
> >>    create keyspace K1 with
> >> placement_strategy='org.apache.cassandra.locator.SimpleStrategy' and
> >> strategy_options = [{replication_factor:1}];
> >>    create keyspace K2 with
> >> placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy'
> >> and strategy_options = [{DC1:2, DC2:2}];
> >>
> >> I had to do this because i expect that K1 will get a large volume of
> >> data and i do not want this wired over to the other DC.
> >>
> >> I am writing the data at CL=ONE and reading the data at CL=ONE. I am
> >> seeing an issue where sometimes i get the data and other times i do not
> >> see the data. Does anyone know what could be going on here?
> >>
> >> A second larger question is  - i am migrating from 0.7.4 to 0.8.7 , i
> >> can see that there are large changes in the yaml file, but a specific
> >> question i had was - how do i configure disk_access_mode like it used to
> >> be in 0.7.4?
> >>
> >> One observation i have made is that some nodes of the cross dc cluster
> >> are at different system times. This is something to fix but could this
> >> be why data is sometimes retrieved and other times not? Or is there some
> >> other thing to it?
> >>
> >> Would appreciate a quick response.
> >>
> >
>

Re: Data retrieval inconsistent

Posted by Subrahmanya Harve <su...@gmail.com>.
Thanks Ed and Jeremiah for that useful info.
"I am pretty sure the way you have K1 configured it will be placed across
both DC's as if you had large ring.  If you want it only in DC1 you need to
say DC1:1, DC2:0."
In fact, I do want K1 to be available across both DCs as if I had one large
ring. I just do not want its data to replicate across DCs. Also, I did try
doing it like you said (DC1:1, DC2:0), but won't that mean that all my data
goes into DC1 regardless of whether it is written through the nodes of
DC1 or DC2, thereby creating a "hot DC"? Since the volume of data for this
case is huge, that might create a load imbalance on DC1. (Am I missing
something?)
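
For readers following the placement question, here is a minimal Python sketch of how SimpleStrategy picks replicas by walking the ring without regard to datacenter. It is an illustration, not Cassandra code: the four-node ring, the node names, the 0-99 token space, and the `simple_strategy_replicas` helper are all hypothetical.

```python
import bisect

# Hypothetical four-node ring: (token, node name, datacenter).
RING = sorted([
    (0, "node-a", "DC1"),
    (25, "node-c", "DC2"),
    (50, "node-b", "DC1"),
    (75, "node-d", "DC2"),
])
TOKENS = [token for token, _, _ in RING]


def simple_strategy_replicas(key_token, rf):
    """Return the rf nodes holding a key: the first node whose token is >=
    the key's token, then the next nodes clockwise, ignoring datacenters."""
    start = bisect.bisect_left(TOKENS, key_token) % len(RING)
    return [RING[(start + i) % len(RING)] for i in range(rf)]


# With replication_factor 1, each key lives on exactly one node, and the
# owning node may be in either DC depending only on the key's token.
for key_token in (10, 30, 60, 80):
    _, node, dc = simple_strategy_replicas(key_token, rf=1)[0]
    print(key_token, node, dc)
```

Under these assumptions the keys spread over all four nodes in both DCs with a single copy each, which matches the "one large ring" behaviour described in the quoted reply. A request coordinated in one DC may still have to reach a node in the other DC to find that single copy.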


On Thu, Nov 10, 2011 at 1:30 PM, Jeremiah Jordan <
jeremiah.jordan@morningstar.com> wrote:

> I am pretty sure the way you have K1 configured it will be placed across
> both DC's as if you had large ring.  If you want it only in DC1 you need to
> say DC1:1, DC2:0.
> If you are writing and reading at ONE you are not guaranteed to get the
> data if RF > 1.  If RF = 2, and you write with ONE, you data could be
> written to server 1, and then read from server 2 before it gets over there.
>
> The differing on server times will only really matter for TTL's.  Most
> everything else works off comparing user supplied times.
>
> -Jeremiah
>
>
> On 11/10/2011 02:27 PM, Subrahmanya Harve wrote:
>
>>
>> I am facing an issue in 0.8.7 cluster -
>>
>> - I have two clusters in two DCs (rather one cross dc cluster) and two
>> keyspaces. But i have only configured one keyspace to replicate data to the
>> other DC and the other keyspace to not replicate over to the other DC.
>> Basically this is the way i ran the keyspace creation  -
>>    create keyspace K1 with
>> placement_strategy='org.apache.cassandra.locator.SimpleStrategy' and
>> strategy_options = [{replication_factor:1}];
>>    create keyspace K2 with
>> placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy'
>> and strategy_options = [{DC1:2, DC2:2}];
>>
>> I had to do this because i expect that K1 will get a large volume of data
>> and i do not want this wired over to the other DC.
>>
>> I am writing the data at CL=ONE and reading the data at CL=ONE. I am
>> seeing an issue where sometimes i get the data and other times i do not see
>> the data. Does anyone know what could be going on here?
>>
>> A second larger question is  - i am migrating from 0.7.4 to 0.8.7 , i can
>> see that there are large changes in the yaml file, but a specific question
>> i had was - how do i configure disk_access_mode like it used to be in 0.7.4?
>>
>> One observation i have made is that some nodes of the cross dc cluster
>> are at different system times. This is something to fix but could this be
>> why data is sometimes retrieved and other times not? Or is there some other
>> thing to it?
>>
>> Would appreciate a quick response.
>>
>

Re: Data retrieval inconsistent

Posted by Jeremiah Jordan <je...@morningstar.com>.
I am pretty sure the way you have K1 configured it will be placed across
both DCs as if you had one large ring.  If you want it only in DC1 you need
to say DC1:1, DC2:0.
If you are writing and reading at ONE you are not guaranteed to get the
data if RF > 1.  If RF = 2 and you write with ONE, your data could be
written to server 1 and then read from server 2 before it gets replicated
there.

Differing server times will only really matter for TTLs.  Most
everything else works off comparing user-supplied timestamps.

-Jeremiah
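
The point about ONE/ONE with RF > 1 follows from the usual overlap rule: a read is only guaranteed to touch a replica that acknowledged the write when read replicas + write replicas > RF. A minimal Python sketch of that arithmetic follows. The helper names are hypothetical, and it ignores node failures, hinted handoff, and read repair.

```python
def quorum(rf: int) -> int:
    """Number of replicas that make up a quorum for a replication factor."""
    return rf // 2 + 1


def reads_see_latest_write(rf: int, write_replicas: int, read_replicas: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > rf


RF = 2
print(reads_see_latest_write(RF, 1, 1))                    # ONE/ONE -> False
print(reads_see_latest_write(RF, quorum(RF), quorum(RF)))  # QUORUM/QUORUM -> True
```

For K2 as defined in the thread (DC1:2, DC2:2, so four replicas in total), the same check gives 1 + 1 > 4, which is false for ONE/ONE, and 3 + 3 > 4, which is true for QUORUM reads and writes.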

On 11/10/2011 02:27 PM, Subrahmanya Harve wrote:
>
> I am facing an issue in 0.8.7 cluster -
>
> - I have two clusters in two DCs (rather one cross dc cluster) and two 
> keyspaces. But i have only configured one keyspace to replicate data 
> to the other DC and the other keyspace to not replicate over to the 
> other DC. Basically this is the way i ran the keyspace creation  -
>     create keyspace K1 with 
> placement_strategy='org.apache.cassandra.locator.SimpleStrategy' and 
> strategy_options = [{replication_factor:1}];
>     create keyspace K2 with 
> placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy' 
> and strategy_options = [{DC1:2, DC2:2}];
>
> I had to do this because i expect that K1 will get a large volume of 
> data and i do not want this wired over to the other DC.
>
> I am writing the data at CL=ONE and reading the data at CL=ONE. I am 
> seeing an issue where sometimes i get the data and other times i do 
> not see the data. Does anyone know what could be going on here?
>
> A second larger question is  - i am migrating from 0.7.4 to 0.8.7 , i 
> can see that there are large changes in the yaml file, but a specific 
> question i had was - how do i configure disk_access_mode like it used 
> to be in 0.7.4?
>
> One observation i have made is that some nodes of the cross dc cluster 
> are at different system times. This is something to fix but could this 
> be why data is sometimes retrieved and other times not? Or is there 
> some other thing to it?
>
> Would appreciate a quick response.
