Posted to user@cassandra.apache.org by pankaj soni <pa...@gmail.com> on 2011/04/25 11:26:52 UTC

IP address resolution in MultiDC setup

Hi,

We are considering Apache Cassandra for our data storage needs. The setup
is to be spread across multiple data centers in different regions (physical
locations), with each data center having multiple nodes. However, we can
afford at most 1 public IP address per data center, with the nodes inside a
data center communicating over private IPs. We plan to use RF=3 and
OldNetworkTopologyStrategy for replica placement.


1. How will node discovery take place, and how will the Cassandra ring be
formed across multiple data centers?

2. How is data partitioning carried out in this scenario?

3. Say the data resides on data center 1, node 2, and a read query is sent to
data center 2, node 1. Assuming DC2 has no local replica, how is the read
query serviced? This is our biggest concern, as we could not find any
articles on public/private IPs for Cassandra.


Since any node in Cassandra can be queried for data, and the same goes for
write requests, Cassandra is our first choice for the environments we have
to deploy in.

Any suggestion is welcome.

pankaj
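
Question 3 above can be pictured with a toy model. The following is only a
minimal sketch, not Cassandra's code: it assumes one token per node, uses MD5
as a stand-in partitioner, and places replicas by walking the ring clockwise
(a simplification that ignores OldNetworkTopologyStrategy and consistency
levels). The node names and the helpers token_for, replicas_for and
coordinate_read are hypothetical. The point it illustrates is that the node
receiving the query acts as coordinator and must be able to reach the replica
owner's address directly, which is why the public/private IP question matters.

```python
import bisect
import hashlib
from typing import Dict, List

RING_SIZE = 2 ** 128  # MD5 output space; a stand-in, not Cassandra's exact token range


def token_for(key: str) -> int:
    """Hash a row key to a position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def replicas_for(key: str, ring: Dict[int, str], rf: int) -> List[str]:
    """Walk clockwise from the key's token and collect up to rf nodes."""
    tokens = sorted(ring)
    start = bisect.bisect_left(tokens, token_for(key)) % len(tokens)
    count = min(rf, len(tokens))
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(count)]


def coordinate_read(coordinator: str, key: str, ring: Dict[int, str], rf: int) -> str:
    """Describe what the node that received the query would do with it."""
    owners = replicas_for(key, ring, rf)
    if coordinator in owners:
        return f"{coordinator} reads {key!r} locally"
    # The coordinator needs a routable address for owners[0] at this point.
    return f"{coordinator} forwards read of {key!r} to {owners[0]}"


# Hypothetical four-node ring spread over two data centers.
NODES = ["dc1-node1", "dc2-node1", "dc1-node2", "dc2-node2"]
RING = {i * RING_SIZE // len(NODES): name for i, name in enumerate(NODES)}

if __name__ == "__main__":
    for node in NODES:
        print(coordinate_read(node, "user:42", RING, rf=1))
```

With rf=1, exactly one node reports a local read and the other three report a
forward, whichever data center they sit in.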

Re: IP address resolution in MultiDC setup

Posted by pankaj soni <pa...@gmail.com>.
Could you give the exact name of your paper? It will be easier to search for.

thanks

On Mon, Apr 25, 2011 at 5:13 PM, Milind Parikh <mi...@gmail.com> wrote:

> I have authored exactly this paper; please search this ML. Please be
> aware of EC2's internal network as you design your deployment. EC2 also
> does not support multicast, which is a pain but not insurmountable.

Re: IP address resolution in MultiDC setup

Posted by Milind Parikh <mi...@gmail.com>.
You can't route traffic over private IPs across data centers. That is the
point of the patch.
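
The routing constraint can be made concrete with a small sketch. This is not
the patch itself, only an illustration of the general idea: it assumes every
node has both a private and a public address (which this thread does not
confirm), and the Node type, the addresses and the helper address_for_peer
are hypothetical. Peers in the same data center are contacted on the private
address; peers elsewhere must be contacted on a public one, because RFC 1918
addresses are not routable between sites.

```python
import ipaddress
from dataclasses import dataclass

# The three private ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address falls in one of the RFC 1918 private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


@dataclass(frozen=True)
class Node:
    name: str
    datacenter: str
    private_ip: str
    public_ip: str


def address_for_peer(local: Node, peer: Node) -> str:
    """Pick the address `local` should use to reach `peer`."""
    if local.datacenter == peer.datacenter:
        return peer.private_ip  # same site: the private network is reachable
    return peer.public_ip  # different site: only the public address routes


# Hypothetical nodes; the "public" addresses are documentation placeholders.
dc1_a = Node("dc1-node1", "dc1", "10.0.1.5", "203.0.113.10")
dc1_b = Node("dc1-node2", "dc1", "10.0.1.6", "203.0.113.11")
dc2_a = Node("dc2-node1", "dc2", "10.1.1.5", "198.51.100.20")

if __name__ == "__main__":
    print(address_for_peer(dc1_a, dc1_b))  # private address of dc1-node2
    print(address_for_peer(dc1_a, dc2_a))  # public address of dc2-node1
```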

/***********************
sent from my android...please pardon occasional typos as I respond @ the
speed of thought
************************/

On Apr 26, 2011 6:59 AM, "pankaj soni" <pa...@gmail.com> wrote:

> One last doubt remains after reading your document:
>
> 1. When deploying Cassandra across multiple DCs using your patch, is it
> possible to have the internal network of nodes in each data center talk over
> private IPs? I assume the node with the public IP will then act as coordinator.
> But if it goes down, will the link between data centers go down too?
>
> Could you clear this one up?
>
> Thanks
> pankaj

Re: IP address resolution in MultiDC setup

Posted by pankaj soni <pa...@gmail.com>.
One last doubt remains after reading your document:

1. When deploying Cassandra across multiple DCs using your patch, is it
possible to have the internal network of nodes in each data center talk over
private IPs? I assume the node with the public IP will then act as coordinator.
But if it goes down, will the link between data centers go down too?

Could you clear this one up?

Thanks
pankaj

On Mon, Apr 25, 2011 at 7:00 PM, pankaj soni <pa...@gmail.com>wrote:

> Scrap the last mail; I just finished reading the Amazon EC2 resource policy.
>
> @milind When deploying Cassandra across multiple DCs using your patch, is
> it possible to have the internal network of nodes in each data center talk
> over private IPs?
> I assume the node with the public IP will then act as coordinator. If it
> goes down, will the link between data centers go down?
>
> Thanks
> pankaj

Re: IP address resolution in MultiDC setup

Posted by pankaj soni <pa...@gmail.com>.
Hi,

I have a question regarding Vyatta, or any product providing a VIP in general.
When routing through the gateway, do we bind it to the EC2 node's private IP
or to its public IP?

Also, could you explain in general how a VIP might help? I am new to this
side of the field.


thanks

On Mon, Apr 25, 2011 at 9:47 PM, Sasha Dolgy <sd...@gmail.com> wrote:

> honest opinion?  smoke and mirrors.  i really have no idea.  i was
> surprised to see the latency drop when we started using the VIP's we
> assigned routing through our ec2 vyatta gateways.  it makes it nice
> because it unties you from being 100% stuck on amazon.  you can design
> your environment for cassandra with local nodes in an office if you
> wanted ... it also solved the security problems i was coming across, in
> that before cassandra 0.8, inter-node communication IS NOT encrypted
> or secured....
>
> anyway .. the biggest thing for me was to ensure we are not tied to
> one provider.  this was the best for my business case....also allowed
> us to not be harmed by the
> https://twitter.com/#!/search/amazonpocalypse ...
>
> -sd

Re: IP address resolution in MultiDC setup

Posted by Sasha Dolgy <sd...@gmail.com>.
honest opinion?  smoke and mirrors.  i really have no idea.  i was
surprised to see the latency drop when we started using the VIP's we
assigned routing through our ec2 vyatta gateways.  it makes it nice
because it unties you from being 100% stuck on amazon.  you can design
your environment for cassandra with local nodes in an office if you
wanted ... it also solved the security problems i was coming across, in
that before cassandra 0.8, inter-node communication IS NOT encrypted
or secured....

anyway .. the biggest thing for me was to ensure we are not tied to
one provider.  this was the best for my business case....also allowed
us to not be harmed by the
https://twitter.com/#!/search/amazonpocalypse ...

-sd


On Mon, Apr 25, 2011 at 6:11 PM, Milind Parikh <mi...@gmail.com> wrote:
> @Sasha
> Very interesting that you find a big difference in latency between nodes.
> Any hypothesis on what is going on in internal aws routing that makes it
> inefficient?
> Milind

Re: IP address resolution in MultiDC setup

Posted by Milind Parikh <mi...@gmail.com>.
@Sasha
Very interesting that you find a big difference in latency between nodes.
Any hypothesis on what is going on in internal aws routing that makes it
inefficient?
Milind





On Mon, Apr 25, 2011 at 9:48 AM, Sasha Dolgy <sd...@gmail.com> wrote:

> We use vyatta to create a vip on each instance and act as the gateway in
> each zone & region.  this allows us to bridge into our own facilities
> outside of aws.  we can still leverage ec2snitch, and we find a big speed
> difference wrt latency between nodes when bypassing internal aws routing...

Re: IP address resolution in MultiDC setup

Posted by Sasha Dolgy <sd...@gmail.com>.
We use vyatta to create a vip on each instance and act as the gateway in
each zone & region.  this allows us to bridge into our own facilities
outside of aws.  we can still leverage ec2snitch, and we find a big speed
difference wrt latency between nodes when bypassing internal aws routing...

On Apr 25, 2011 3:30 PM, "pankaj soni" <pa...@gmail.com> wrote:
> Scrap the last mail; I just finished reading the Amazon EC2 resource policy.
>
> @milind When deploying Cassandra across multiple DCs using your patch, is it
> possible to have the internal network of nodes in each data center talk over
> private IPs?
> I assume the node with the public IP will then act as coordinator. If it
> goes down, will the link between data centers go down?
>
> Thanks
> pankaj

Re: IP address resolution in MultiDC setup

Posted by pankaj soni <pa...@gmail.com>.
Scrap the last mail; I just finished reading the Amazon EC2 resource policy.

@milind When deploying Cassandra across multiple DCs using your patch, is it
possible to have the internal network of nodes in each data center talk over
private IPs?
I assume the node with the public IP will then act as coordinator. If it goes
down, will the link between data centers go down?

Thanks
pankaj

On Mon, Apr 25, 2011 at 6:09 PM, pankaj soni <pa...@gmail.com>wrote:

> Just read your paper on this. Must say helped a great deal.
>
> One more query: does Amazon by default assign both an external and an
> internal IP address to each node, or do we have to explicitly buy the
> external IPs?
>
> I am looking into overlay networks.

Re: IP address resolution in MultiDC setup

Posted by pankaj soni <pa...@gmail.com>.
Just read your paper on this. I must say it helped a great deal.

One more query: does Amazon by default assign both an external and an
internal IP address to each node, or do we have to explicitly buy the
external IPs?

I am looking into overlay networks.

On Mon, Apr 25, 2011 at 5:20 PM, Milind Parikh <mi...@gmail.com>wrote:

> I stand corrected. I show how Cassandra can be deployed in multiple DCs
> through a simple patch, using public IPs. In your scenario with an overlay
> n/w, you will not require this patch.

Re: IP address resolution in MultiDC setup

Posted by Milind Parikh <mi...@gmail.com>.
I stand corrected. I show how Cassandra can be deployed in multiple DCs
through a simple patch, using public IPs. In your scenario with an overlay
n/w, you will not require this patch.

/***********************
sent from my android...please pardon occasional typos as I respond @ the
speed of thought
************************/

On Apr 25, 2011 7:43 AM, "Milind Parikh" <mi...@gmail.com> wrote:

> I have authored exactly this paper; please search this ML. Please be aware
> of EC2's internal network as you design your deployment. EC2 also does
> not support multicast, which is a pain but not insurmountable.

Re: IP address resolution in MultiDC setup

Posted by Milind Parikh <mi...@gmail.com>.
I have authored exactly this paper; please search this ML. Please be aware
of EC2's internal network as you design your deployment. EC2 also does
not support multicast, which is a pain but not insurmountable.

/***********************
sent from my android...please pardon occasional typos as I respond @ the
speed of thought
************************/

On Apr 25, 2011 7:31 AM, "pankaj soni" <pa...@gmail.com> wrote:

> We expect to deploy it on Amazon EC2, if that helps. I am sure people have
> deployed Cassandra data centers in different regions on the cloud before,
> but I am unable to find documentation of any such deployment online.
>
> Because of the multiple regions, the public/private IP address issue is
> important.
>
> pankaj

Re: IP address resolution in MultiDC setup

Posted by pankaj soni <pa...@gmail.com>.
We expect to deploy it on Amazon EC2, if that helps. I am sure people have
deployed Cassandra data centers in different regions on the cloud before,
but I am unable to find documentation of any such deployment online.

Because of the multiple regions, the public/private IP address issue is
important.

pankaj

On Mon, Apr 25, 2011 at 4:55 PM, Milind Parikh <mi...@gmail.com>wrote:

> It will be through an overlay n/w. Unfortunately, setting up such a n/w is
> complex. Look at something like OpenVPN.
>
> If multicast is supported, it will be easier. With complex software such as
> Cassandra, it is much better to go with the expected flow rather than
> devising your own flows. My 2c.

Re: IP address resolution in MultiDC setup

Posted by Milind Parikh <mi...@gmail.com>.
It will be through an overlay n/w. Unfortunately, setting up such a n/w is
complex. Look at something like OpenVPN.

If multicast is supported, it will be easier. With complex software such as
Cassandra, it is much better to go with the expected flow rather than
devising your own flows. My 2c.
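
Whichever route is taken, every node ends up needing a TCP path to every
other node's inter-node port. A rough way to check that before trying to
form the ring is sketched below. It assumes the default Cassandra
storage_port of 7000 is in use; the helpers can_connect and
unreachable_peers, and the peer addresses in the usage example, are
hypothetical.

```python
import socket
from typing import Iterable, List

STORAGE_PORT = 7000  # Cassandra's default storage_port; adjust if yours differs


def can_connect(host: str, port: int = STORAGE_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unroutable addresses.
        return False


def unreachable_peers(peers: Iterable[str], port: int = STORAGE_PORT,
                      timeout: float = 3.0) -> List[str]:
    """List the peers this machine cannot reach on the given port."""
    return [host for host in peers if not can_connect(host, port, timeout)]


if __name__ == "__main__":
    # Hypothetical overlay addresses of the other nodes.
    print(unreachable_peers(["10.8.0.2", "10.8.0.3"]))
```

Running this from each node in turn gives a quick picture of which links the
overlay (or the public addressing) still fails to provide.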

/***********************
sent from my android...please pardon occasional typos as I respond @ the
speed of thought
************************/
