Posted to dev@samza.apache.org by so...@accenture.com on 2014/02/03 19:42:29 UTC

RE: Cluster Installation

Thanks Chris/Garry.

I have an existing Zookeeper and YARN Cluster. However, the YARN version that I have (that came preinstalled with Pivotal HD) is 2.0.5. So from what you're saying I cannot reuse it for my Samza deployment.

So then my options are:
1. Reuse ZooKeeper, so I'll have to configure Samza to point to the right cluster.
2. Run Samza with its own YARN grid and Kafka installation (I can do this on multiple servers, right? A 1 RM, 2 NM kind of situation).

Thanks,
Sonali


-----Original Message-----
From: Chris Riccomini [mailto:criccomini@linkedin.com]
Sent: Friday, January 31, 2014 11:24 AM
To: dev@samza.incubator.apache.org
Subject: Re: Cluster Installation

Hey Sonali,

Everything Garry said is correct.

One other item of note is that if you're interested in running stuff locally in a dev-mode fashion, you don't need YARN. You can use the LocalJobFactory instead of the YarnJobFactory when configuring your job's "job.factory.class" setting.

For "real" deployments, yes you'll need YARN, ZooKeeper, and Kafka. They can be deployed using any standard way of shipping software around to a cluster of machines.

Cheers,
Chris

On 1/31/14 12:58 AM, "Garry Turkington" <g....@improvedigital.com>
wrote:

>Hi Sonali,
>
>This was something that I had some questions about originally as well.
>In terms of required components, then yes: for any size of Samza
>deployment you will need all those pieces.
>
>In terms of actual deployment, from what I understand from the LinkedIn
>guys they do run Samza on a dedicated YARN grid that also has a Kafka
>broker collocated on each node. These decisions though appear to be
>more down to convenience than a hard requirement.
>
>In my own setup I have existing ZooKeeper and Kafka clusters that I'm
>pointing Samza at but do need to run a dedicated YARN grid because my
>Hadoop cluster has a pre-2.2 version of YARN running on it.
>
>So if you have existing components you can reuse them; if not, then
>repurposing the Hello Samza package is a good starting point to get all
>the things you want on the required hosts. The only caveat would be to
>not drop a ZK node on each host: the ZK quorum should follow the usual
>advice of an odd number of servers, typically 3, 5 or 7 depending on
>your deployment size.
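
The odd-number advice follows from ZooKeeper needing a strict majority of servers to stay up. A minimal sketch of that arithmetic (the function name zk_fault_tolerance is made up for illustration):

```python
def zk_fault_tolerance(ensemble_size: int) -> dict:
    """Majority-quorum arithmetic for a ZooKeeper ensemble of the given size."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    quorum = ensemble_size // 2 + 1     # strict majority of the servers
    tolerated = ensemble_size - quorum  # servers that can fail while a majority survives
    return {"servers": ensemble_size, "quorum": quorum, "tolerated_failures": tolerated}


# Compare odd and even ensemble sizes side by side.
for n in (3, 4, 5, 6, 7):
    print(zk_fault_tolerance(n))
```

A fourth server tolerates no more failures than three do, and a sixth no more than five, which is why even sizes buy nothing.
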
>
>Garry
>
>-----Original Message-----
>From: sonali.parthasarathy@accenture.com
>[mailto:sonali.parthasarathy@accenture.com]
>Sent: 30 January 2014 23:38
>To: dev@samza.incubator.apache.org
>Subject: Cluster Installation
>
>Hi All,
>
>I'm new to working with Samza and have been trying to figure out the
>best cluster configuration. I understand that Samza comes with
>YARN, Kafka and ZooKeeper out of the box. Is that the model just for a
>standalone/local configuration? What if I want a bigger cluster? Do I
>have to install YARN, Kafka and ZooKeeper separately? Any suggestions
>would be great!
>
>Thanks,
>Sonali
>
>Sonali Parthasarathy
>R&D Developer, Data Insights
>Accenture Technology Labs
>703-341-7432
>
>
>________________________________
>
>This message is for the designated recipient only and may contain
>privileged, proprietary, or otherwise confidential information. If you
>have received it in error, please notify the sender immediately and
>delete the original. Any other use of the e-mail by you is prohibited.
>Where allowed by local law, electronic communications with Accenture
>and its affiliates, including e-mail and instant messaging (including
>content), may be scanned by our systems for the purposes of information
>security and assessment of internal compliance with Accenture policy.
>________________________________
>
>www.accenture.com

Re: Cluster Installation

Posted by Chris Riccomini <cr...@linkedin.com>.
Hey TJ,

Also, for reference, here's an example yarn-site.xml:

  http://pastebin.com/6B90YbQh

Cheers,
Chris



Re: Cluster Installation

Posted by Chris Riccomini <cr...@linkedin.com>.
Hey TJ,

The yarn-site.xml file is found via the YARN_HOME environment variable.
This variable must be set (export YARN_HOME=...) when you start your NM.
From there on out, everything gets access to it. When the AM creates a
YarnConfiguration, the object will load its values from the yarn-site.xml
(and use the YARN_HOME environment variable to find its location).
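
To see what a given YARN_HOME would resolve to, here is a rough sketch that reads <YARN_HOME>/conf/yarn-site.xml and guesses the scheduler address the AM would end up with. It is only an approximation of what YarnConfiguration does. The function names are made up, the property names (yarn.resourcemanager.scheduler.address, yarn.resourcemanager.hostname) are as I recall them from yarn-default.xml, and ${...} variable substitution is not handled.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

YARN_DEFAULT_HOST = "0.0.0.0"  # the fallback seen in the AM log when nothing overrides it


def load_yarn_site(yarn_home: str) -> dict:
    """Return the name -> value pairs found in <yarn_home>/conf/yarn-site.xml."""
    path = Path(yarn_home).expanduser() / "conf" / "yarn-site.xml"
    if not path.is_file():
        raise FileNotFoundError(f"no yarn-site.xml at {path}")
    props = {}
    for prop in ET.parse(str(path)).getroot().iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def scheduler_address(props: dict) -> str:
    """Best-effort guess at the RM scheduler address (assumed property names)."""
    explicit = props.get("yarn.resourcemanager.scheduler.address")
    if explicit:
        return explicit
    host = props.get("yarn.resourcemanager.hostname", YARN_DEFAULT_HOST)
    return f"{host}:8030"


# Usage on a node, assuming YARN_HOME is exported in that shell:
#   import os
#   print(scheduler_address(load_yarn_site(os.environ["YARN_HOME"])))
```

If this prints 0.0.0.0:8030 on an NM host, the file is either missing the RM settings or is not where YARN_HOME says it is.
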

You should also verify that your yarn-site.xml for the NMs is
appropriately configured to point at the RM's host/port.

Also, when you go to the RM's webpage, do you see all of your Active Nodes
listed? (http://your-rm-host:port/cluster/nodes)
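
If you would rather script that check than eyeball the web page, the RM also exposes the node list over its REST API at /ws/v1/cluster/nodes. A minimal sketch, assuming the JSON layout I remember from the Hadoop 2.x docs (a "nodes" object holding a "node" list with "state" and "nodeHostName" fields); the helper names are made up:

```python
import json
from urllib.request import Request, urlopen


def running_nodes(nodes_json: dict) -> list:
    """Return the sorted host names of NodeManagers the RM reports as RUNNING."""
    nodes = (nodes_json.get("nodes") or {}).get("node") or []
    return sorted(n["nodeHostName"] for n in nodes if n.get("state") == "RUNNING")


def fetch_nodes(rm_http_address: str, timeout: float = 5.0) -> dict:
    """Fetch the cluster node list from the RM web address, given as 'host:port'."""
    request = Request(
        f"http://{rm_http_address}/ws/v1/cluster/nodes",
        headers={"Accept": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage, with your own RM web host and port filled in:
#   print(running_nodes(fetch_nodes("your-rm-host:port")))
```

Every NM host should show up in the result; a missing one usually means that NM never registered with the RM.
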

Cheers,
Chris

On 2/20/14 1:06 AM, "TJ Giuli" <tg...@skyportsystems.com> wrote:

>Hi, to follow up on this thread of discussion, I've got a three-node
>Cloudera CDH5 YARN cluster running and I'm having some problems deploying
>Samza jobs on the grid.  All of the nodes are running a NodeManager and
>just one is running a ResourceManager.  If the ApplicationMaster is
>deployed to the node with the RM, everything is fine.  However, if the
>job is deployed to one of the other two hosts, the job fails.  Looking at
>the AM log (http://pastebin.com/VxbLiWST), the AM is trying to contact
>the cluster ResourceManager at 0.0.0.0:8030, which is a YARN default.
>Nothing is at 0.0.0.0, so the job eventually dies.
>
>It looks like yarn-site.xml is not being read by any component of the
>system and so it's falling back to the default value for the
>ResourceManager's address.  Looking at the code, it seems that
>org.apache.samza.job.yarn.SamzaAppMaster creates a new YarnConfiguration
>object and passes it to ClientHelper.  Is yarn-site.xml being read in
>somewhere?  Am I missing some key configuration?  Thanks!
>-T
>
>On Feb 5, 2014, at 5:59 PM, Chris Riccomini <cr...@linkedin.com>
>wrote:
>
>> Hey Sonali,
>> 
>> The next step you need to take is to build your Samza job package (the
>> .tgz file that contains bin and lib directories). Take a look at
>> hello-samza, which shows how to build a .tar.gz file with the
>> appropriate files in it.
>> 
>> Once you have the .tar.gz file built, you need to publish it somewhere.
>> This can be HDFS or an HTTP server.
>> 
>> == IF YOU USE HDFS, SKIP THIS STEP ==
>> 
>> At LinkedIn, we use an HTTP server. The easiest way to hack this up for
>> testing is to start a local HTTP server on your developer box with
>> Python:
>> 
>>  python -m SimpleHTTPServer
>> 
>> This command will start a simple HTTP server serving files from the
>> current working directory. So, running that command from the directory
>> with your .tar.gz job package should work.
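
A note for anyone on Python 3: SimpleHTTPServer is the Python 2 module name, and the equivalent one-liner there is "python3 -m http.server". If you want to serve a specific directory from a script instead, something like this sketch should do (Python 3.7+; serve_directory is a made-up helper name, and this is for testing only, not a production file server):

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_directory(directory: str, host: str = "0.0.0.0", port: int = 8000) -> ThreadingHTTPServer:
    """Serve `directory` over HTTP from a background thread and return the server."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer((host, port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Usage: server = serve_directory("/path/to/dir/with/job-package")
# The package is then reachable at http://<this-host>:8000/<file name>.
# Call server.shutdown() and server.server_close() when finished.
```
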
>> 
>> You then need to set up your NMs to be able to read HTTP files, since
>> Hadoop doesn't support an HTTP-based file system implementation out of
>> the box. Fortunately, Samza ships with one. To use it, you need to do
>> two things:
>> 
>> First, add this to your NM's core-site.xml:
>> 
>> <configuration>
>>  <property>
>>    <name>fs.http.impl</name>
>>    <value>org.apache.samza.util.hadoop.HttpFileSystem</value>
>>  </property>
>> </configuration>
>> 
>> Second, make sure that you put the following jars into your NM's class
>> path:
>> 
>> 
>> * grizzled-slf4j
>> * samza-yarn
>> * scala-compiler
>> * scala-library
>> 
>> Make sure that all of these libraries match the same version of Scala
>> that samza-yarn was built with.
>> 
>> The easiest way to add everything to your NM's class path is to put the
>> files in the lib directory:
>> 
>>  hadoop-2.2.0/share/hadoop/hdfs/lib
>> 
>> == END OF "IF YOU USE HDFS, SKIP THIS STEP" SECTION ==
>> 
>> 
>> Now, you should have a .tar.gz file with a URI that's either:
>> 
>>  hdfs://foo/bar/your-job-package.tar.gz
>> 
>> Or:
>> 
>>  http://192.168.0.1/your-job-package.tar.gz
>> 
>> This path (either the HDFS or HTTP one, depending on which you chose to
>> use) is what you should set your yarn.package.path configuration
>> parameter to in your job's configuration file.
>> 
>>  yarn.package.path=http://192.168.0.1/your-job-package.tar.gz
>> 
>> This tells YARN's NMs where to download your job package from when YARN
>> begins running it in the grid.
>> 
>> Finally, you'll want to start your job!
>> 
>> 1. Make sure that you're using the YarnJobFactory for your
>> job.factory.class configuration setting (see hello-samza for an
>> example).
>> 2. Get a copy of one of your NM's yarn-site.xml and put it somewhere on
>> your desktop (I usually use ~/.yarn/conf/yarn-site.xml). Note that
>> there's a "conf" directory there. This is mandatory.
>> 3. Set up an environment variable called YARN_HOME that points to the
>> directory that has the "conf" directory in it:
>> 
>>  export YARN_HOME=~/.yarn
>> 
>> 4. Execute your job with run-job.sh (see
>> http://samza.incubator.apache.org/startup/hello-samza/0.7.0/ for an
>> example).
>> 
>> This should start the job on your YARN grid.
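
Before running run-job.sh, the two job settings these steps depend on can be sanity-checked with a small script. This is only a sketch: parse_properties and preflight are made-up names, the parser handles plain key=value lines only, and the fully qualified class name in the usage comment is what I recall from hello-samza, so check it against your own config.

```python
from urllib.parse import urlparse


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def preflight(props: dict) -> list:
    """Return a list of problems found; an empty list means none were spotted."""
    problems = []
    if not props.get("job.factory.class", "").endswith("YarnJobFactory"):
        problems.append("job.factory.class does not name a YarnJobFactory")
    scheme = urlparse(props.get("yarn.package.path", "")).scheme
    if scheme not in ("hdfs", "http"):
        problems.append("yarn.package.path should be an hdfs:// or http:// URI")
    return problems


# Usage with an illustrative config (class name assumed from hello-samza):
#   text = ("job.factory.class=org.apache.samza.job.yarn.YarnJobFactory\n"
#           "yarn.package.path=http://192.168.0.1/your-job-package.tar.gz\n")
#   print(preflight(parse_properties(text)))
```
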
>> 
>> Cheers,
>> Chris
>> 
>> On 2/5/14 5:40 PM, "sonali.parthasarathy@accenture.com"
>> <so...@accenture.com> wrote:
>> 
>>> Hi Chris,
>>> 
>>> So this is what I have now:
>>> 1. YARN cluster with 1 RM and 2 NMs
>>> 2. Kafka broker running on each NM
>>> 3. ZooKeeper running on the RM
>>> 4. I downloaded and published (gradlew) the incubator-samza project.
>>> It's in my /root/m2 repository, ready to be used by my project (when I
>>> create one).
>>> 
>>> Where do I go from here? How do I get Samza to point to this setup
>>> exactly?
>>> 
>>> Thanks,
>>> Sonali
>>> 
>>> -----Original Message-----
>>> From: Chris Riccomini [mailto:criccomini@linkedin.com]
>>> Sent: Monday, February 03, 2014 12:10 PM
>>> To: dev@samza.incubator.apache.org
>>> Subject: Re: Cluster Installation
>>> 
>>> Hey Sonali,
>>> 
>>> You will need to set things up separately in order to configure your
>>> yarn-site.xml files for the NMs to point to the RM's host/port. They
>>> default to localhost, which is what hello-samza is using.
>>> 
>>> On the Kafka side, the same thing applies: you'll need to configure
>>> each broker with a unique broker id, etc.
>>> 
>>> Cheers,
>>> Chris
>>> 
>>> On 2/3/14 11:25 AM, "sonali.parthasarathy@accenture.com"
>>> <so...@accenture.com> wrote:
>>> 
>>>> Ah, makes sense.
>>>> 
>>>> So to have a cluster setup with RM and NMs running on different nodes,
>>>> can I reuse the "grid" script from "hello-samza"? Or will I have to do
>>>> the setup separately and then change the config files on Samza?
>>>> 
>>>> Thanks,
>>>> Sonali
>>>> 
>>>> -----Original Message-----
>>>> From: Chris Riccomini [mailto:criccomini@linkedin.com]
>>>> Sent: Monday, February 03, 2014 11:02 AM
>>>> To: dev@samza.incubator.apache.org
>>>> Subject: Re: Cluster Installation
>>>> 
>>>> Hey Sonali,
>>>> 
>>>> I believe the point at which YARN became version compatible for 2.* was
>>>> at 2.1.0-beta. I believe 2.0.5 is not API compatible with later
>>>> versions of YARN (e.g. 2.2). For this reason, you'll need to upgrade
>>>> your YARN grid, or use a different one with a higher version.
>>>> 
>>>> For its part, Samza should work with YARN grids 2.1.0-beta and beyond,
>>>> though I haven't tested this. The YARN community has given a
>>>> commitment to maintaining API compatibility going forward for YARN
>>>> 2.*, which means that future upgrades should not be required until
>>>> YARN 3 comes out.
>>>> 
>>>> The rest of your understanding is correct. You can run a 1 RM, 2 NM
>>>> kind of cluster, throw some Kafka brokers on there, and you should be
>>>> good to go. You can also re-use your existing ZK, if you wish.
>>>> 
>>>> Cheers,
>>>> Chris
>>>> 
>>>> On 2/3/14 10:42 AM, "sonali.parthasarathy@accenture.com"
>>>> <so...@accenture.com> wrote:
>>>> 
>>>>> Thanks Chris/Gary.
>>>>> 
>>>>> I have an existing Zookeeper and YARN Cluster. However, the YARN
>>>>> version that I have (that came preinstalled with Pivotal HD) is
>>>>>2.0.5.
>>>>> So from what you're saying I cannot reuse it for my Samza deployment.
>>>>> 
>>>>> So then my option is:
>>>>> 1. Reuse zookeeper. So I'll have to configure Samza to point to the
>>>>> right cluster 2. Run Samza with its YARN grid and Kafka Installation
>>>>> (I can do this on multiple servers right? 1 RM, 2 NM kind of
>>>>> situation)
>>>>> 
>>>>> Thanks,
>>>>> Sonali
>>>>> 
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Chris Riccomini [mailto:criccomini@linkedin.com]
>>>>> Sent: Friday, January 31, 2014 11:24 AM
>>>>> To: dev@samza.incubator.apache.org
>>>>> Subject: Re: Cluster Installation
>>>>> 
>>>>> Hey Sonali,
>>>>> 
>>>>> Everything Gary said is correct.
>>>>> 
>>>>> One other item of note is that if you're interested in running stuff
>>>>> locally in a dev-mode fashion, you don't need YARN. You can use the
>>>>> LocalJobFactory instead of the YarnJobFactory factory when
>>>>>configuring
>>>>> your job's "job.factory.class" setting.
>>>>> 
>>>>> For "real" deployments, yes you'll need YARN, ZooKeeper, and Kafka.
>>>>> They can be deployed using any standard way of shipping software
>>>>> around to a cluster of machines.
>>>>> 
>>>>> Cheers,
>>>>> Chris
>>>>> 
>>>>> On 1/31/14 12:58 AM, "Garry Turkington"
>>>>> <g....@improvedigital.com>
>>>>> wrote:
>>>>> 
>>>>>> Hi Sonali,
>>>>>> 
>>>>>> This was something that I had some questions about originally as
>>>>>>well.
>>>>>> In terms of required components then yes, for any size of Samza
>>>>>> deployment you will  need all those pieces.
>>>>>> 
>>>>>> In terms of actual deployment, from what I understand from the
>>>>>> LinkedIn guys they do run Samza on a dedicated YARN grid that also
>>>>>> has a Kafka broker collocated on each node. These decisions though
>>>>>> appear to be more down to convenience than a hard requirement.
>>>>>> 
>>>>>> In my own setup I have existing ZooKeeper and Kafka clusters that
>>>>>>I'm
>>>>>> pointing Samza at but do need to run a dedicated YARN grid because
>>>>>>my
>>>>>> Hadoop cluster has a pre-2.2 version of YARN running on it.
>>>>>> 
>>>>>> So if you have existing components you can reuse them, if not then
>>>>>> repurposing the Hello Samza package is a good starting point to get
>>>>>> all the things you want on the required hosts. Only caveat would be
>>>>>> to not drop a ZK node on each host, the ZK quorum should follow the
>>>>>> usual advice of an odd number of servers and likely no more than 3,
>>>>>>5
>>>>>> or 7 depending on your deployment size.
>>>>>> 
>>>>>> Garry
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: sonali.parthasarathy@accenture.com
>>>>>> [mailto:sonali.parthasarathy@accenture.com]
>>>>>> Sent: 30 January 2014 23:38
>>>>>> To: dev@samza.incubator.apache.org
>>>>>> Subject: Cluster Installation
>>>>>> 
>>>>>> Hi All,
>>>>>> 
>>>>>> I'm new to working with Samza and have been trying to figure out the
>>>>>> best cluster configuration. I understand that Samza comes with
>>>>>> yarn,kafka and zookeeper out of the box. Is that the model just for
>>>>>>a
>>>>>> standalone/local configuration. What if I want a bigger cluster? Do
>>>>>>I
>>>>>> have to install yarn, kafka and zookeeper separately? Any
>>>>>>suggestions
>>>>>> would be great!
>>>>>> 
>>>>>> Thanks,
>>>>>> Sonali
>>>>>> 
>>>>>> Sonali Parthasarathy
>>>>>> R&D Developer, Data Insights
>>>>>> Accenture Technology Labs
>>>>>> 703-341-7432


Re: Cluster Installation

Posted by TJ Giuli <tg...@skyportsystems.com>.
Hi, to follow up on this thread of discussion, I’ve got a three-node Cloudera CDH5 YARN cluster running and I’m having some problems deploying Samza jobs on the grid.  All of the nodes are running a NodeManager and just one is running a ResourceManager.  If the ApplicationMaster is deployed to the node with the RM, everything is fine.  However, if the job is deployed to one of the other two hosts, the job fails.  Looking at the AM log (http://pastebin.com/VxbLiWST), the AM is trying to contact the cluster ResourceManager at 0.0.0.0:8030, which is a YARN default.  Nothing is at 0.0.0.0, so the job eventually dies.  

It looks like yarn-site.xml is not being read by any component of the system, and so it’s falling back to the default value for the ResourceManager’s address.  Looking at the code, it seems that org.apache.samza.job.yarn.SamzaAppMaster creates a new YarnConfiguration object and passes it to ClientHelper.  Is yarn-site.xml being read in somewhere?  Am I missing some key configuration?  Thanks!
—T
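
For reference, the fallback described above can be reproduced outside of YARN. The following is a minimal sketch, assuming the Hadoop 2.x defaults from yarn-default.xml (yarn.resourcemanager.hostname defaults to 0.0.0.0 and the scheduler listens on port 8030). The function names are hypothetical, and the sketch does not expand ${...} references inside an explicitly configured address.

```python
import xml.etree.ElementTree as ET

# Assumed defaults, taken from the Hadoop 2.x yarn-default.xml.
DEFAULT_RM_HOSTNAME = "0.0.0.0"
DEFAULT_SCHEDULER_PORT = 8030


def read_properties(yarn_site_xml):
    """Return the name -> value pairs found in a yarn-site.xml string."""
    root = ET.fromstring(yarn_site_xml)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def scheduler_address(yarn_site_xml):
    """Return the RM scheduler address that this configuration resolves to."""
    props = read_properties(yarn_site_xml)
    explicit = props.get("yarn.resourcemanager.scheduler.address")
    if explicit:
        return explicit
    # No explicit scheduler address: fall back to hostname plus default port.
    host = props.get("yarn.resourcemanager.hostname", DEFAULT_RM_HOSTNAME)
    return "%s:%d" % (host, DEFAULT_SCHEDULER_PORT)
```

An empty configuration resolves to 0.0.0.0:8030, which matches the address seen in the AM log when yarn-site.xml is not picked up.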



Re: Cluster Installation

Posted by Chris Riccomini <cr...@linkedin.com>.
Hey Sonali,

The next step you need to take is to build your Samza job package (the
.tar.gz file that contains bin and lib directories). Take a look at
hello-samza, which shows how to build a .tar.gz file with the appropriate
files in it.

Once you have the .tar.gz file built, you need to publish it somewhere.
This can be HDFS or an HTTP server.

== IF YOU USE HDFS, SKIP THIS STEP ==

At LinkedIn, we use an HTTP server. The easiest way to hack this up for
testing is to start a local HTTP server on your developer box with Python:

  python -m SimpleHTTPServer

This command will start a simple HTTP server serving files from the
current working directory. So, running that command from the directory
with your .tar.gz job package should work.
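
Note that SimpleHTTPServer is the Python 2 module name; on Python 3 the equivalent command is "python3 -m http.server". The same thing can be done programmatically. This is a minimal sketch for testing only, assuming Python 3.7 or later; the function name, default host and default port are hypothetical.

```python
import functools
import http.server
import threading


def start_package_server(directory, port=8000, host="0.0.0.0"):
    """Serve `directory` over HTTP in a background thread; return the server."""
    # Bind the request handler to the directory holding the job package.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer((host, port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Call server.shutdown() followed by server.server_close() when the NMs no longer need to download the package.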

You then need to set up your NMs to be able to read HTTP files, since
Hadoop doesn't support an HTTP-based file system implementation out of the
box. Fortunately, Samza ships with one. To use it, you need to do two
things:

First, add this to your NM's core-site.xml:

<configuration>
  <property>
    <name>fs.http.impl</name>
    <value>org.apache.samza.util.hadoop.HttpFileSystem</value>
  </property>
</configuration>
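
To confirm that an NM's core-site.xml actually carries this property, something like the following can be run against the file contents. It is a minimal sketch; the helper names are hypothetical, and it only inspects the XML text rather than the configuration Hadoop has loaded.

```python
import xml.etree.ElementTree as ET

EXPECTED_HTTP_FS = "org.apache.samza.util.hadoop.HttpFileSystem"


def hadoop_property(site_xml, name):
    """Return the value of property `name` in a *-site.xml string, or None."""
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def http_filesystem_configured(core_site_xml):
    """True if fs.http.impl points at Samza's HttpFileSystem."""
    return hadoop_property(core_site_xml, "fs.http.impl") == EXPECTED_HTTP_FS
```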

Second, make sure that you put the following jars into your NM's class
path:


* grizzled-slf4j
* samza-yarn
* scala-compiler
* scala-library

Make sure that all of these libraries match the same version of Scala that
samza-yarn was built with.

The easiest way to add everything to your NM's class path is to put the
files in the lib directory:

  hadoop-2.2.0/share/hadoop/hdfs/lib

== END OF "IF YOU USE HDFS, SKIP THIS STEP" SECTION ==


Now, you should have a .tar.gz file with a URI that's either:

  hdfs://foo/bar/your-job-package.tar.gz

Or:

  http://192.168.0.1/your-job-package.tar.gz

This path (either the HDFS or HTTP one, depending on which you chose to
use) is what you should set your yarn.package.path configuration parameter
to in your job's configuration file.

  yarn.package.path=http://192.168.0.1/your-job-package.tar.gz

This tells YARN's NMs where to download your job package from when YARN
begins running it in the grid.
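
A quick sanity check of the job configuration before submitting can catch a missing or mistyped package path. This is a minimal sketch, assuming the job config is a plain key=value properties file; the function names are hypothetical, and the two accepted schemes are simply the two options described above.

```python
from urllib.parse import urlparse

SUPPORTED_SCHEMES = ("hdfs", "http")


def load_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def package_path_scheme(config_text):
    """Return the URI scheme of yarn.package.path, or raise ValueError."""
    path = load_properties(config_text).get("yarn.package.path")
    if path is None:
        raise ValueError("yarn.package.path is not set")
    scheme = urlparse(path).scheme
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError("unsupported scheme %r in %s" % (scheme, path))
    return scheme
```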

Finally, you'll want to start your job!

1. Make sure that you're using the YarnJobFactory for your
job.factory.class configuration setting (see hello-samza for an example).
2. Get a copy of one of your NM's yarn-site.xml and put it somewhere on
your desktop (I usually use ~/.yarn/conf/yarn-site.xml). Note that there's
a "conf" directory there. This is mandatory.
3. Set up an environment variable called YARN_HOME that points to the
directory that has the "conf" directory in it:

  export YARN_HOME=~/.yarn

4. Execute your job with run-job.sh (see
http://samza.incubator.apache.org/startup/hello-samza/0.7.0/ for an
example).

This should start the job on your YARN grid.
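
If the job cannot find the RM, the first thing to check is the directory layout from steps 2 and 3. This is a minimal sketch of that check; the function name is hypothetical, and it only verifies that the file exists where YARN_HOME says it should.

```python
import os


def yarn_site_path(yarn_home):
    """Return the path to conf/yarn-site.xml under yarn_home, or raise."""
    path = os.path.join(os.path.expanduser(yarn_home), "conf", "yarn-site.xml")
    if not os.path.isfile(path):
        raise FileNotFoundError("expected yarn-site.xml at %s" % path)
    return path
```

For example, yarn_site_path(os.environ["YARN_HOME"]) raises an error if the "conf" directory or the file itself is missing.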

Cheers,
Chris


RE: Cluster Installation

Posted by so...@accenture.com.
Hi Chris,

So this is what I have now:
1. YARN cluster with 1 RM and 2 NMs
2. Kafka broker running on each NM
3. ZooKeeper running on the RM
4. I downloaded and published (gradlew) the incubator-samza project. It's in my /root/m2 repository, ready to be used by my project (when I create one)

Where do I go from here? How do I get Samza to point to this setup exactly?

Thanks,
Sonali

-----Original Message-----
From: Chris Riccomini [mailto:criccomini@linkedin.com]
Sent: Monday, February 03, 2014 12:10 PM
To: dev@samza.incubator.apache.org
Subject: Re: Cluster Installation

Hey Sonali,

You will need to set things up separately in order to configure your yarn-site.xml files for the NMs to point to the RM's host/port. They default to localhost, which is what hello-samza is using.

On the Kafka side, the same thing applies: you'll need to configure each broker with a unique broker id, etc.
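
The unique-id requirement is easy to verify across hosts. This is a minimal sketch, assuming each broker's server.properties sets broker.id as an integer; the function names and the host-to-text mapping are hypothetical.

```python
from collections import Counter


def broker_id(server_properties_text):
    """Return the broker.id value from a Kafka server.properties string."""
    for line in server_properties_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "broker.id":
            return int(value.strip())
    raise ValueError("broker.id is not set")


def duplicate_broker_ids(configs):
    """Given {host: server.properties text}, return ids used more than once."""
    counts = Counter(broker_id(text) for text in configs.values())
    return sorted(i for i, n in counts.items() if n > 1)
```

An empty result means every broker in the mapping has its own id.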

Cheers,
Chris

On 2/3/14 11:25 AM, "sonali.parthasarathy@accenture.com"
<so...@accenture.com> wrote:

>Ah, makes sense
>
>So to have a cluster setup with RM and NMs running on different nodes,
>can I reuse the "grid" script from "hello-samza"? Or will I have to do
>the setup separately and then change the config files on Samza?
>
>Thanks,
>Sonali
>
>-----Original Message-----
>From: Chris Riccomini [mailto:criccomini@linkedin.com]
>Sent: Monday, February 03, 2014 11:02 AM
>To: dev@samza.incubator.apache.org
>Subject: Re: Cluster Installation
>
>Hey Sonali,
>
>I believe the point at which YARN became version compatible for 2.* was
>at 2.1.0-beta. I believe 2.0.5 is not API compatible with later
>versions of YARN (e.g. 2.2). For this reason, you'll need to upgrade
>your YARN grid, or use a different one with a higher version.
>
>For its part, Samza should work with YARN grids 2.1.0-beta and beyond,
>though I haven't tested this. The YARN community has given a commitment
>to maintaining API compatibility going forward for YARN 2.*, which
>means that future upgrades should not be required, until YARN 3 comes out.
>
>The rest of your understanding is correct. You can run a 1 RM, 2 NM
>kind of cluster, throw some Kafka brokers on there, and you should be
>good to go. You can also re-use your existing ZK, if you wish.
>
>Cheers,
>Chris
>
>On 2/3/14 10:42 AM, "sonali.parthasarathy@accenture.com"
><so...@accenture.com> wrote:
>
>>Thanks Chris/Garry.
>>
>>I have an existing Zookeeper and YARN Cluster. However, the YARN
>>version that I have (that came preinstalled with Pivotal HD) is 2.0.5.
>>So from what you're saying I cannot reuse it for my Samza deployment.
>>
>>So then my option is:
>>1. Reuse zookeeper. So I'll have to configure Samza to point to the
>>right cluster
>>2. Run Samza with its YARN grid and Kafka Installation
>>(I can do this on multiple servers right? 1 RM, 2 NM kind of
>>situation)
>>
>>Thanks,
>>Sonali
>>
>>
>>-----Original Message-----
>>From: Chris Riccomini [mailto:criccomini@linkedin.com]
>>Sent: Friday, January 31, 2014 11:24 AM
>>To: dev@samza.incubator.apache.org
>>Subject: Re: Cluster Installation
>>
>>Hey Sonali,
>>
>>Everything Garry said is correct.
>>
>>One other item of note is that if you're interested in running stuff
>>locally in a dev-mode fashion, you don't need YARN. You can use the
>>LocalJobFactory instead of the YarnJobFactory when configuring
>>your job's "job.factory.class" setting.
>>
>>For "real" deployments, yes you'll need YARN, ZooKeeper, and Kafka.
>>They can be deployed using any standard way of shipping software
>>around to a cluster of machines.
>>
>>Cheers,
>>Chris
>>
>>On 1/31/14 12:58 AM, "Garry Turkington"
>><g....@improvedigital.com>
>>wrote:
>>
>>>Hi Sonali,
>>>
>>>This was something that I had some questions about originally as well.
>>>In terms of required components then yes, for any size of Samza
>>>deployment you will  need all those pieces.
>>>
>>>In terms of actual deployment, from what I understand from the
>>>LinkedIn guys they do run Samza on a dedicated YARN grid that also
>>>has a Kafka broker collocated on each node. These decisions though
>>>appear to be more down to convenience than a hard requirement.
>>>
>>>In my own setup I have existing ZooKeeper and Kafka clusters that I'm
>>>pointing Samza at but do need to run a dedicated YARN grid because my
>>>Hadoop cluster has a pre-2.2 version of YARN running on it.
>>>
>>>So if you have existing components you can reuse them, if not then
>>>repurposing the Hello Samza package is a good starting point to get
>>>all the things you want on the required hosts. Only caveat would be
>>>to not drop a ZK node on each host, the ZK quorum should follow the
>>>usual advice of an odd number of servers and likely no more than 3, 5
>>>or 7 depending on your deployment size.
>>>
>>>Garry
>>>
>>>-----Original Message-----
>>>From: sonali.parthasarathy@accenture.com
>>>[mailto:sonali.parthasarathy@accenture.com]
>>>Sent: 30 January 2014 23:38
>>>To: dev@samza.incubator.apache.org
>>>Subject: Cluster Installation
>>>
>>>Hi All,
>>>
>>>I'm new to working with Samza and have been trying to figure out the
>>>best cluster configuration. I understand that Samza comes with
>>>yarn,kafka and zookeeper out of the box. Is that the model just for a
>>>standalone/local configuration. What if I want a bigger cluster? Do I
>>>have to install yarn, kafka and zookeeper separately? Any suggestions
>>>would be great!
>>>
>>>Thanks,
>>>Sonali
>>>
>>>Sonali Parthasarathy
>>>R&D Developer, Data Insights
>>>Accenture Technology Labs
>>>703-341-7432
>>>
>>>
>>>________________________________
>>>
>>>This message is for the designated recipient only and may contain
>>>privileged, proprietary, or otherwise confidential information. If
>>>you have received it in error, please notify the sender immediately
>>>and delete the original. Any other use of the e-mail by you is prohibited.
>>>Where allowed by local law, electronic communications with Accenture
>>>and its affiliates, including e-mail and instant messaging (including
>>>content), may be scanned by our systems for the purposes of
>>>information security and assessment of internal compliance with
>>>Accenture policy. .
>>>_____________________________________________________________________
>>>_
>>>_
>>>___
>>>____________
>>>
>>>www.accenture.com
>>>
>>>-----
>>>No virus found in this message.
>>>Checked by AVG - www.avg.com
>>>Version: 2014.0.4259 / Virus Database: 3684/7046 - Release Date:
>>>01/30/14
>>
>>
>>
>>________________________________
>>
>>This message is for the designated recipient only and may contain
>>privileged, proprietary, or otherwise confidential information. If you
>>have received it in error, please notify the sender immediately and
>>delete the original. Any other use of the e-mail by you is prohibited.
>>Where allowed by local law, electronic communications with Accenture
>>and its affiliates, including e-mail and instant messaging (including
>>content), may be scanned by our systems for the purposes of
>>information security and assessment of internal compliance with Accenture policy. .
>>______________________________________________________________________
>>_
>>___
>>____________
>>
>>www.accenture.com
>>
>
>
>
>________________________________
>
>This message is for the designated recipient only and may contain
>privileged, proprietary, or otherwise confidential information. If you
>have received it in error, please notify the sender immediately and
>delete the original. Any other use of the e-mail by you is prohibited.
>Where allowed by local law, electronic communications with Accenture
>and its affiliates, including e-mail and instant messaging (including
>content), may be scanned by our systems for the purposes of information
>security and assessment of internal compliance with Accenture policy. .
>_______________________________________________________________________
>___
>____________
>
>www.accenture.com
>





RE: Cluster Installation

Posted by so...@accenture.com.
Cool! I'll let you know how it goes!

Thanks,
S






Re: Cluster Installation

Posted by Chris Riccomini <cr...@linkedin.com>.
Hey Sonali,

You will need to do the setup separately so that you can configure the
yarn-site.xml files on the NMs to point to the RM's host/port. They default
to localhost, which is what hello-samza is using.

On the Kafka side, the same thing applies: you'll need to configure each
broker with a unique broker id, etc.

Cheers,
Chris
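
As a rough illustration of the per-host settings described above (not
something shipped with hello-samza), the sketch below renders a minimal
yarn-site.xml that points a NodeManager at the ResourceManager, plus the
per-broker lines of a Kafka server.properties. The host names, the ZooKeeper
address, and both helper functions are hypothetical.
yarn.resourcemanager.hostname, broker.id, and zookeeper.connect are the usual
YARN and Kafka keys, but check them against the versions you deploy.

```python
# Sketch only: host names and the ZooKeeper address are hypothetical.
import xml.etree.ElementTree as ET


def yarn_site_xml(rm_host):
    """Render a minimal yarn-site.xml that points a NodeManager at the RM."""
    conf = ET.Element("configuration")
    prop = ET.SubElement(conf, "property")
    ET.SubElement(prop, "name").text = "yarn.resourcemanager.hostname"
    ET.SubElement(prop, "value").text = rm_host
    return ET.tostring(conf, encoding="unicode")


def kafka_server_properties(broker_id, zk_connect):
    """Render the per-broker lines of Kafka's server.properties."""
    return "broker.id={}\nzookeeper.connect={}\n".format(broker_id, zk_connect)


if __name__ == "__main__":
    print(yarn_site_xml("rm-host.example.com"))
    # Numbering the hosts gives every broker its own id.
    for broker_id, host in enumerate(["kafka1.example.com", "kafka2.example.com"]):
        print("# " + host)
        print(kafka_server_properties(broker_id, "zk1.example.com:2181"))
```

Generating the files from one list of hosts is only one way to keep broker
ids unique; editing each file by hand works just as well for a small grid.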



RE: Cluster Installation

Posted by so...@accenture.com.
Ah, makes sense

So, to have a cluster setup with the RM and NMs running on different nodes, can I reuse the "grid" script from "hello-samza"? Or will I have to do the setup separately and then change the config files on Samza?

Thanks,
Sonali






Re: Cluster Installation

Posted by Chris Riccomini <cr...@linkedin.com>.
Hey Sonali,

I believe the point at which YARN became version compatible for 2.* was at
2.1.0-beta. I believe 2.0.5 is not API compatible with later versions of
YARN (e.g. 2.2). For this reason, you'll need to upgrade your YARN grid,
or use a different one with a higher version.

For its part, Samza should work with YARN grids 2.1.0-beta and beyond,
though I haven't tested this. The YARN community has given a commitment to
maintaining API compatibility going forward for YARN 2.*, which means that
future upgrades should not be required, until YARN 3 comes out.
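
The version threshold can be written down as a small check. This is only a
sketch: it takes 2.1.0 as the minimum, following the belief stated above
rather than any tested compatibility matrix, and the function names are made
up for illustration.

```python
import re

# Assumed minimum, taken from the "2.1.0-beta and beyond" statement above.
MIN_YARN_VERSION = (2, 1, 0)


def parse_yarn_version(text):
    """Extract (major, minor, patch) from strings like '2.1.0-beta'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError("unrecognized version string: " + repr(text))
    return tuple(int(part) for part in match.groups())


def meets_minimum(text):
    """True if the version is at or above the assumed minimum."""
    return parse_yarn_version(text) >= MIN_YARN_VERSION


if __name__ == "__main__":
    for version in ["2.0.5", "2.1.0-beta", "2.2.0"]:
        print(version, meets_minimum(version))
```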

The rest of your understanding is correct. You can run a 1 RM, 2 NM kind
of cluster, throw some Kafka brokers on there, and you should be good to
go. You can also re-use your existing ZK, if you wish.

Cheers,
Chris
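
To reuse an existing ZooKeeper ensemble and Kafka brokers, the Samza job
config needs to point at them. A minimal sketch follows, assuming the
property names used in the hello-samza job configs of this era (verify them
against your Samza version). The host names and the helper function are
hypothetical.

```python
def samza_job_properties(zk_connect, kafka_brokers):
    """Render job config lines that point Samza at existing ZK and Kafka."""
    lines = [
        # Submit the job to YARN rather than running it locally.
        "job.factory.class=org.apache.samza.job.yarn.YarnJobFactory",
        "systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory",
        # Existing ZooKeeper ensemble used by the Kafka consumer.
        "systems.kafka.consumer.zookeeper.connect=" + zk_connect,
        # Existing Kafka brokers used by the producer.
        "systems.kafka.producer.metadata.broker.list=" + ",".join(kafka_brokers),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(samza_job_properties(
        "zk1.example.com:2181",
        ["kafka1.example.com:9092", "kafka2.example.com:9092"],
    ))
```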
