Posted to hdfs-user@hadoop.apache.org by "Leonard, Michael" <Mi...@opco.com> on 2015/10/26 16:01:40 UTC

Hadoop on premise versus cloud

Hi,

I work at a large financial institution. I'm exploring deploying Hadoop and I'm trying to understand why I would deploy on premise when the cloud is faster and easier. What are the pros/cons of each? How does pricing compare between on premise and cloud deployments?

Any color would be very helpful. Thank you in advance.

Sincerely,
Michael



This communication and any attached files may contain information that is confidential or privileged. If this communication has been received in error, please delete or destroy it immediately.  Please go to http://www.opco.com/EmailDisclosures for important information and further disclosures pertaining to this transmission.

Re: Hadoop on premise versus cloud

Posted by daemeon reiydelle <da...@gmail.com>.
In addition to the data privacy concern described well below, there are a
couple of other areas you might consider (you can also respond privately,
where I can be a bit more candid). My experience with most banks (I work
with most of the players in the EU and US) is that (1) below drives
development heavily into the cloud, in spite of (2).

   1. Physical plant processes
      1. Your culture and processes may add months to a hardware delivery
         cycle for production systems.
      2. Does the data center under your employer's control even have the
         available racks, power, network ports and router software versions
         to support bonding/stacking/teaming multiple 10 Gbit ports?
      3. Big data gets much less interesting and viable when layers like
         SAN and heavy hypervisors (VMware, Citrix) get into the mix. Ditto
         "full nightly backups" and other interesting confusions about the
         tech, not to mention organizations that are heavily committed to
         this and require full DR and zero-data-loss backups.
   2. Data issues
      1. Are the data sources "inside" your employer's network, which would
         require extra authorizations to allow them to connect to a cloud
         provider?
      2. Are the consumers of the data going to be able to access the
         cluster? (Similar questions apply if an intermediating data
         manipulation tool is accessed by your employees/consumers.)
   3. As to data privacy
      1. There are several data center providers who are legally and
         entirely based within continental Europe (Netherlands, Germany),
         or the UK if you think that is an alternative. To my knowledge
         Amazon is not yet there but will be. I do not know if it is
         generally available, but Google Compute does have such ring-fenced
         facilities in the EU, etc.
   4. Now the real motivations:
      1. Startup costs: as you figure out your data ingest complexity and
         as your user expectations get clarified, you seldom know what you
         will need with the precision that management needs for its
         planning cycle.
      2. Capital (hardware) costs are zero and all costs can be written off
         in the current period. (Management has no hit to its capital
         expenses budget.)
      3. Changing (increasing ;{) costs can be directly tied to specific
         activities as they occur (customer wants more X, additional data
         sources Y, and data Z% dirtier than expected ... you know the
         drill, yes?)
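
To make the capital-versus-operating cost point in (4) concrete, here is a minimal Python sketch of a break-even comparison. It is only an illustration under stated assumptions: the class names, the functions, and every figure in the usage note are hypothetical placeholders, not real pricing from any vendor or bank. It also ignores depreciation, discounts, and staffing differences.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OnPremPlan:
    hardware_capex: float           # upfront hardware purchase (hypothetical figure)
    monthly_opex: float             # power, space, support per month (hypothetical)
    procurement_delay_months: int   # months before the cluster is usable


@dataclass
class CloudPlan:
    monthly_cost: float             # pay-as-you-go bill per month (hypothetical)


def cumulative_on_prem(plan: OnPremPlan, months: int) -> float:
    """Total spend after `months`; the capital cost is paid up front."""
    return plan.hardware_capex + plan.monthly_opex * months


def cumulative_cloud(plan: CloudPlan, months: int) -> float:
    """Total spend after `months`; nothing is paid up front."""
    return plan.monthly_cost * months


def usable_months(plan: OnPremPlan, months: int) -> int:
    """Months of actual cluster availability once procurement delay is subtracted."""
    return max(0, months - plan.procurement_delay_months)


def break_even_month(on_prem: OnPremPlan, cloud: CloudPlan,
                     horizon: int = 120) -> Optional[int]:
    """First month in which cumulative on-premise spend is no higher than cloud.

    Returns None if that never happens within `horizon` months.
    """
    for month in range(1, horizon + 1):
        if cumulative_on_prem(on_prem, month) <= cumulative_cloud(cloud, month):
            return month
    return None
```

With placeholder inputs such as OnPremPlan(600_000, 10_000, 6) and CloudPlan(35_000), break_even_month returns 24, and usable_months shows that 6 of those 24 months would have been spent waiting for hardware. The point is the shape of the comparison, not the numbers: plug in your own quotes.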




"Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming 'Wow! What a Ride!'" - Hunter Thompson

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872


Re: Hadoop on premise versus cloud

Posted by Daniel Schulz <da...@hotmail.com>.
Hi Michael,

Thank you for your message.

To European customers, data privacy is a major concern, so they are quite reluctant to use public clouds, even if their data will be encrypted. This is one reason many organisations strongly prefer on-premise clouds to public ones. Some companies use public clouds, others plan to migrate to a private one, but most major companies do not want to rely on a third party when it comes to company data.

On the other hand, latency is lower on premise because all processes run in your local network. But throughput may mitigate that for production data loads.
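
As a rough illustration of why throughput can outweigh latency for bulk loads, here is a small Python sketch. It assumes a simple model in which transfer time is the round-trip latency plus payload size divided by throughput. The function names and the example figures are hypothetical, not measurements.

```python
def transfer_seconds(payload_bytes: float, round_trip_s: float,
                     throughput_bytes_per_s: float, round_trips: int = 1) -> float:
    """Simple model: time = latency for each round trip + time to move the payload."""
    return round_trips * round_trip_s + payload_bytes / throughput_bytes_per_s


def latency_share(payload_bytes: float, round_trip_s: float,
                  throughput_bytes_per_s: float, round_trips: int = 1) -> float:
    """Fraction of the total transfer time that is spent waiting on latency."""
    total = transfer_seconds(payload_bytes, round_trip_s,
                             throughput_bytes_per_s, round_trips)
    return (round_trips * round_trip_s) / total
```

Under this model, a 1 KB lookup over a link with 50 ms round-trip time is almost entirely latency, while a 100 GB batch load over the same link is almost entirely throughput. So the extra latency to a remote cloud matters most for small, chatty requests.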

However, the major advantage of public clouds is the relatively small cost to get started. AWS and others are ready to go, whereas a private cloud needs to be installed and later maintained.

Hope this helps a little.

Kind regards, Daniel.


