Posted to mapreduce-user@hadoop.apache.org by "ados1984@gmail.com" <ad...@gmail.com> on 2014/03/13 14:58:23 UTC

Hortonworks HDP 2 sandbox or Cloudera CDH Distribution

Hello Team,

I am initiating a POC to see the value of having Hadoop in our architecture,
and after discussing my current scenario with experts here, I think it
would be better for me to start with a sandbox version rather than an
actual distribution for the POC.

My question is how to decide which sandbox to use, Hortonworks or
Cloudera. My goal is to get started as soon as possible and not spend most
of my time on the configuration part of the equation.

Also, from the online research I have done, it appears that Cloudera
Impala is more efficient and provides near-real-time ad hoc query
capabilities. Based on that, I am thinking of going with the Cloudera
sandbox, and I wanted to hear expert opinions before moving
in that direction.

Also, if I go with the sandbox approach, what kind of cluster
configuration can I have? That is, how many slave and master nodes will
the sandbox support?

Pardon my questions if they sound too basic.

Thanks again, Andy.

Re: Hortonworks HDP 2 sandbox or Cloudera CDH Distribution

Posted by "ados1984@gmail.com" <ad...@gmail.com>.
Thanks, Nick. I appreciate your input on this.


On Thu, Mar 13, 2014 at 12:51 PM, Martin, Nick <Ni...@pssd.com> wrote:

>  Start here
> http://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support
>
>
>
> The list of things you might consider before picking a distribution is
> quite likely limited only by one's imagination. So, start with the basics
> like hosted vs. in-house, what your use case(s) cover, etc. Basically,
> anything you'd consider when looking at a new technology solution to
> address a need at your organization. If that doesn't get you to a list of
> things you need to consider then do a search for something akin to
> "choosing a Hadoop distribution" and maybe that'll spark some thoughts.
>
>
>
> Best of luck, happy researching!
>
>
>
> *From:* ados1984@gmail.com [mailto:ados1984@gmail.com]
> *Sent:* Thursday, March 13, 2014 10:22 AM
> *To:* user
> *Subject:* Re: Hortonworks HDP 2 sandbox or Cloudera CDH Distribution
>
>
>
> Thank you, Martin. I will make sure that I do not ask vendor-specific
> questions on this forum.
>
>
>
> But since I am starting out with Hadoop, I wanted to learn what
> the key things are that we have to keep in mind while deciding which
> distribution to take: open source Hadoop, MapR M7, Hortonworks HDP, or
> Cloudera CDH.
>
>
>
> If I can get a very brief idea of the factors that one should consider, it
> would certainly be very helpful to me.
>
>
>
> Thanks Again, Andy.
>
>
>
> On Thu, Mar 13, 2014 at 10:17 AM, Martin, Nick <Ni...@pssd.com> wrote:
>
> Hi Andy,
>
>
>
> Generally speaking, the folks participating on this list avoid questions
> of distribution preference. There are, perhaps obviously, both minor and
> significant differences in distributions that you should research and
> evaluate to find the best fit for your organization's strategy. Asking the
> members of this list to publicly advocate one distribution over another
> is outside the scope of our collective purpose here, in my opinion. Upon
> thorough review of the topic history of this list you'll doubtless find the
> questions and responses are almost always distribution agnostic, which is
> how things should be with a community like this.
>
>
>
> No matter which distribution you choose, said distribution will assuredly
> have ample documentation regarding cluster configuration readily available
> via a quick search from your web browser. Further, the two distributions
> you mention below also have several methods by which you can ask their
> experts specific questions related to configuring their solutions in your
> environment (forums, separate lists, Google groups, etc.).
>
>
>
> *From:* ados1984@gmail.com [mailto:ados1984@gmail.com]
> *Sent:* Thursday, March 13, 2014 9:58 AM
> *To:* user
> *Subject:* Hortonworks HDP 2 sandbox or Cloudera CDH Distribution
>
>
>
> Hello Team,
>
>
>
> I am initiating a POC to see the value of having Hadoop in our architecture,
> and after discussing my current scenario with experts here, I think it
> would be better for me to start with a sandbox version rather than an
> actual distribution for the POC.
>
>
>
> My question is how to decide which sandbox to use, Hortonworks or
> Cloudera. My goal is to get started as soon as possible and not spend most
> of my time on the configuration part of the equation.
>
>
>
> Also, from the online research I have done, it appears that Cloudera
> Impala is more efficient and provides near-real-time ad hoc query
> capabilities. Based on that, I am thinking of going with the Cloudera
> sandbox, and I wanted to hear expert opinions before moving
> in that direction.
>
>
>
> Also, if I go with the sandbox approach, what kind of cluster
> configuration can I have? That is, how many slave and master nodes will
> the sandbox support?
>
>
>
> Pardon my questions if they sound too basic.
>
>
>
> Thanks again, Andy.
>
>
>

RE: Hortonworks HDP 2 sandbox or Cloudera CDH Distribution

Posted by "Martin, Nick" <Ni...@pssd.com>.
Start here http://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support

The list of things you might consider before picking a distribution is quite likely limited only by one's imagination. So, start with the basics like hosted vs. in-house, what your use case(s) cover, etc. Basically, anything you'd consider when looking at a new technology solution to address a need at your organization. If that doesn't get you to a list of things you need to consider then do a search for something akin to "choosing a Hadoop distribution" and maybe that'll spark some thoughts.

Best of luck, happy researching!

From: ados1984@gmail.com [mailto:ados1984@gmail.com]
Sent: Thursday, March 13, 2014 10:22 AM
To: user
Subject: Re: Hortonworks HDP 2 sandbox or Cloudera CDH Distribution

Thank you, Martin. I will make sure that I do not ask vendor-specific questions on this forum.

But since I am starting out with Hadoop, I wanted to learn what the key things are that we have to keep in mind while deciding which distribution to take: open source Hadoop, MapR M7, Hortonworks HDP, or Cloudera CDH.

If I can get a very brief idea of the factors that one should consider, it would certainly be very helpful to me.

Thanks Again, Andy.

On Thu, Mar 13, 2014 at 10:17 AM, Martin, Nick <Ni...@pssd.com> wrote:
Hi Andy,

Generally speaking, the folks participating on this list avoid questions of distribution preference. There are, perhaps obviously, both minor and significant differences in distributions that you should research and evaluate to find the best fit for your organization's strategy. Asking the members of this list to publicly advocate one distribution over another is outside the scope of our collective purpose here, in my opinion. Upon thorough review of the topic history of this list you'll doubtless find the questions and responses are almost always distribution agnostic, which is how things should be with a community like this.

No matter which distribution you choose, said distribution will assuredly have ample documentation regarding cluster configuration readily available via a quick search from your web browser. Further, the two distributions you mention below also have several methods by which you can ask their experts specific questions related to configuring their solutions in your environment (forums, separate lists, Google groups, etc.).

From: ados1984@gmail.com [mailto:ados1984@gmail.com]
Sent: Thursday, March 13, 2014 9:58 AM
To: user
Subject: Hortonworks HDP 2 sandbox or Cloudera CDH Distribution

Hello Team,

I am initiating a POC to see the value of having Hadoop in our architecture, and after discussing my current scenario with experts here, I think it would be better for me to start with a sandbox version rather than an actual distribution for the POC.

My question is how to decide which sandbox to use, Hortonworks or Cloudera. My goal is to get started as soon as possible and not spend most of my time on the configuration part of the equation.

Also, from the online research I have done, it appears that Cloudera Impala is more efficient and provides near-real-time ad hoc query capabilities. Based on that, I am thinking of going with the Cloudera sandbox, and I wanted to hear expert opinions before moving in that direction.

Also, if I go with the sandbox approach, what kind of cluster configuration can I have? That is, how many slave and master nodes will the sandbox support?

Pardon my questions if they sound too basic.

Thanks again, Andy.


Re: Hortonworks HDP 2 sandbox or Cloudera CDH Distribution

Posted by "ados1984@gmail.com" <ad...@gmail.com>.
Thank you Martin. I will make sure that I do not post vendor-specific
questions on this forum.

But since I am starting out with Hadoop, I wanted to learn what the
key things are that we have to keep in mind while deciding on which
distribution to take: open source Hadoop, MapR M7, Hortonworks HDP or
Cloudera CDH.

If I can get a very brief idea of the factors that one should consider, it
would certainly be very helpful to me.

Thanks Again, Andy.


On Thu, Mar 13, 2014 at 10:17 AM, Martin, Nick <Ni...@pssd.com> wrote:

>  Hi Andy,
>
>
>
> Generally speaking, the folks participating on this list avoid questions
> of distribution preference. There are, perhaps obviously, both minor and
> significant differences in distributions that you should research and
> evaluate to find the best fit for your organization's strategy. Asking the
> members of this list to publicly advocate one distribution over another
> is outside the scope of our collective purpose here, in my opinion. Upon
> thorough review of the topic history of this list you'll doubtless find the
> questions and responses are almost always distribution agnostic, which is
> how things should be with a community like this.
>
>
>
> No matter which distribution you choose, said distribution will assuredly
> have ample documentation regarding cluster configuration readily available
> via a quick search from your web browser. Further, the two distributions
> you mention below also have several methods by which you can ask their
> experts specific questions related to configuring their solutions in your
> environment (forums, separate lists, Google groups, etc.).
>
>
>
> *From:* ados1984@gmail.com [mailto:ados1984@gmail.com]
> *Sent:* Thursday, March 13, 2014 9:58 AM
> *To:* user
> *Subject:* Hortonworks HDP 2 sandbox or Cloudera CDH Distribution
>
>
>
> Hello Team,
>
>
>
> I am initiating a POC to see the value of having Hadoop in our architecture,
> and after discussing my current scenario with experts here, I think it
> would be better for me to start with a sandbox version rather than an
> actual distribution from a POC point of view.
>
>
>
> My query here is how to decide which sandbox version to use, Hortonworks or
> Cloudera. My goal is to get started as soon as possible and not spend most
> of my time on the configuration part of the equation.
>
>
>
> Also, from online research that I have done, it appears that Cloudera
> Impala is more efficient and provides near-real-time ad-hoc query
> capabilities. Based on that, I am thinking of going with the Cloudera
> sandbox distribution and wanted to hear experts' opinions before moving
> in that direction.
>
>
>
> Also, if I go with the sandbox approach, what kind of cluster
> configuration can I have, meaning how many slave and master nodes will
> the sandbox support?
>
>
>
> Pardon my questions if they sound too basic.
>
>
>
> Thanks again, Andy.
>

RE: Hortonworks HDP 2 sandbox or Cloudera CDH Distribution

Posted by "Martin, Nick" <Ni...@pssd.com>.
Hi Andy,

Generally speaking, the folks participating on this list avoid questions of distribution preference. There are, perhaps obviously, both minor and significant differences in distributions that you should research and evaluate to find the best fit for your organization's strategy. Asking the members of this list to publicly advocate one distribution over another is outside the scope of our collective purpose here, in my opinion. Upon thorough review of the topic history of this list you'll doubtless find the questions and responses are almost always distribution agnostic, which is how things should be with a community like this.

No matter which distribution you choose, said distribution will assuredly have ample documentation regarding cluster configuration readily available via a quick search from your web browser. Further, the two distributions you mention below also have several methods by which you can ask their experts specific questions related to configuring their solutions in your environment (forums, separate lists, Google groups, etc.).

From: ados1984@gmail.com [mailto:ados1984@gmail.com]
Sent: Thursday, March 13, 2014 9:58 AM
To: user
Subject: Hortonworks HDP 2 sandbox or Cloudera CDH Distribution

Hello Team,

I am initiating a POC to see the value of having Hadoop in our architecture, and after discussing my current scenario with experts here, I think it would be better for me to start with a sandbox version rather than an actual distribution from a POC point of view.

My query here is how to decide which sandbox version to use, Hortonworks or Cloudera. My goal is to get started as soon as possible and not spend most of my time on the configuration part of the equation.

Also, from online research that I have done, it appears that Cloudera Impala is more efficient and provides near-real-time ad-hoc query capabilities. Based on that, I am thinking of going with the Cloudera sandbox distribution and wanted to hear experts' opinions before moving in that direction.

Also, if I go with the sandbox approach, what kind of cluster configuration can I have, meaning how many slave and master nodes will the sandbox support?

Pardon my questions if they sound too basic.

Thanks again, Andy.
