Posted to user@hadoop.apache.org by Matteo Luzzi <ma...@gmail.com> on 2015/09/02 00:25:52 UTC

Problems in running spark on Yarn

Hi all!
I'm developing a system where I need to run Spark jobs on YARN. I'm using
a two-node cluster (one master and one slave) for testing and I'm
submitting the application through Oozie, but after the first application
starts running (the Oozie container) the other one remains in the ACCEPTED
state. I am new to YARN, so I'm probably missing some concepts about how
containers are requested and assigned to applications. It seems that I can
execute only one container at a time, even though there are still free
resources. When I kill the first running application, the other one moves
to the RUNNING state. I'm also using the Fair Scheduler because, according
to the documentation, it should avoid starvation problems.
I don't know whether the problem is in Spark or in YARN. Any suggestions
are welcome.

[image: embedded image 1]

Re: Problems in running spark on Yarn

Posted by Tsuyoshi Ozawa <oz...@apache.org>.
Hi Matteo,

It depends on your configuration: yarn-site.xml (the NodeManager's memory
capacity) and the container requests made by Spark (spark.yarn.am.memory
and the executor memory), assuming you don't use Dominant Resource Fairness.
Could you share them?
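
For reference, these are the kinds of settings I mean; the values below are
only placeholders, not a recommendation:

<!-- yarn-site.xml: what each NodeManager offers to the scheduler -->
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
</property>
<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
</property>

and, on the Spark side, the requests that have to fit into that capacity,
e.g. --conf spark.yarn.am.memory=1g together with --executor-memory. You can
also check what the ResourceManager actually registered for each NodeManager
with "yarn node -list" and "yarn node -status <node-id>".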

Thanks,
- Tsuyoshi

On Wed, Sep 2, 2015 at 7:25 AM, Matteo Luzzi <ma...@gmail.com> wrote:
> Hi all!
> I'm developing a system where I need to run spark jobs over yarn. I'm using
> a two node cluster (one master and one slave) for testing and I'm submitting
> the application through oozie, but after the first application starts
> running (the oozie container) the other one remains in accepted stated. I am
> new to yarn so probably I am missing some concepts about how containers are
> requested and assigned to the applications. It seems that I can execute only
> one container at the time, even though there are still free resources. When
> I kill the first running application, the other one passes to running state.
> I'm also using the Fair Scheduler as according the documentation, it should
> avoid any starvation problems.
> I don't know if it is a problem of spark or yarn. Please come with
> suggestion if you have any.
>
>

Re: Problems in running spark on Yarn

Posted by Matteo Luzzi <ma...@gmail.com>.
Hi again, I actually managed to get the Spark application working by
setting the property yarn.nodemanager.vmem-check-enabled to false, so that
YARN no longer checks the virtual memory used by Spark. I found that this is
a common problem with Spark applications, and the usual suggestion is to keep
increasing the property spark.yarn.executor.memoryOverhead until it works.
In my case, changing that property had no effect. As I said, I am not an
expert in Hadoop/YARN and I don't know whether this is a good solution, so
suggestions or comments are very welcome.
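
For completeness, the change amounts to the following property in
yarn-site.xml:

<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>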

Matteo

2015-09-02 11:06 GMT+02:00 Matteo Luzzi <ma...@gmail.com>:

> Hi Tsuyoshi,
> I made some changes to the configuration files once I read your answers
> and the documentation of yarn. Now I'm using this settings:
>
> yarn-site.xml
>
> <property>
>         <name>yarn.resourcemanager.scheduler.class</name>
>
> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
> </property>
>
> <property>
> <name>yarn.nodemanager.resource.memory-mb</name>
> <value>12000</value>
>     </property>
>
>     <property>
> <name>yarn.nodemanager.resource.cpu-vcores</name>
> <value>4</value>
>     </property>
>
>     <property>
> <name>yarn.scheduler.minimum-allocation-mb</name>
> <value>1024</value>
>     </property>
>
>     <property>
> <name>yarn.scheduler.maximum-allocation-mb</name>
> <value>12000</value>
>     </property>
>
> mapred-site.xml
>
> <property>
> <name>mapreduce.map.memory.mb</name>
> <value>1024</value>
> </property>
>
> <property>
> <name>mapreduce.reduce.memory.mb</name>
> <value>1024</value>
> </property>
>
> <property>
>   <name>mapreduce.jobhistory.address</name>
>   <value>172.31.25.237:10020</value>
> </property>
>
> and concerning the spark property, i'm setting them through the
> workflow.xml file of oozie
>
>  <spark-opts>--executor-memory 2G --num-executors 1 --executor-cores 1
> --conf spark.yarn.executor.memoryOverhead=500 --conf
> spark.scheduler.mode=FAIR --conf
> spark.yarn.driver.memoryOverhead=500</spark-opt>
>
> With this configuration I am able to launch and execute different
> containers. The spark container now fails for this reason:
>
> Container [pid=18245,containerID=container_1441180487101_0017_02_000001]
> is running beyond virtual memory limits. Current usage: 244.6 MB of 1 GB
> physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing
> container.
>
> A second container is launched by the spark app, it is a copy of some
> binary files into a separate folder in order to be analyzed by the spark
> app. The logic of the code implies the spark jobs to wait for the
> completion of this action before their executions. Unfortunately also that
> app fails :
>
> INFO [main] com.backtype.hadoop.AbstractFileCopyMapper: Copying hdfs://
> 172.31.25.237:9000/dataset/2015-08-31/17/00/0f15dec6-6085-4df7-87db-0083ba42ee0d.pailfile
> to
> file:/home/hduser/hdfs/hdfs-tmp/filecopy/a3d7dc66-11d1-488b-abf1-8f4dfa0aae7f
> 2015-09-02 08:52:27,746 WARN [main] org.apache.hadoop.mapred.YarnChild:
> Exception running child : java.lang.IllegalArgumentException: Wrong FS:
> file:/home/hduser/hdfs/hdfs-tmp/filecopy, expected: hdfs://
> 172.31.25.237:9000
>
> here is the hdfs-site.xml
>
> <configuration>
> <property>
>         <name>dfs.replication</name>
>         <value>3</value>
>     </property>
>     <property>
>         <name>dfs.permissions.enabled</name>
>         <value>false</value>
>     </property>
>     <property>
>         <name>dfs.namenode.name.dir</name>
>         <value>file:///home/hduser/hdfs/namenode</value>
>     </property>
>
>     <property>
>         <name>dfs.datanode.data.dir</name>
>         <value>file:///home/hduser/hdfs/datanode</value>
>     </property>
> </configuration>
>
> Thanks for the support
>
>
>
> [image: embedded image 1]
>
>
>
> 2015-09-02 5:33 GMT+02:00 Kiran Kumar Reddy Govind <kg...@ega.ae>:
>
>> Guys,
>>
>>
>>
>> How does Hive work with MAPREDUCE , can someone clarify unable to
>> understand this concept.
>>
>>
>>
>> Regards,
>>
>>
>>
>> *From:* Matteo Luzzi [mailto:matteo.luzzi@gmail.com]
>> *Sent:* Wednesday, September 02, 2015 2:26 AM
>> *To:* user@hadoop.apache.org
>> *Subject:* Problems in running spark on Yarn
>>
>>
>>
>> Hi all!
>>
>> I'm developing a system where I need to run spark jobs over yarn. I'm
>> using a two node cluster (one master and one slave) for testing and I'm
>> submitting the application through oozie, but after the first application
>> starts running (the oozie container) the other one remains in accepted
>> stated. I am new to yarn so probably I am missing some concepts about how
>> containers are requested and assigned to the applications. It seems that I
>> can execute only one container at the time, even though there are still
>> free resources. When I kill the first running application, the other one
>> passes to running state. I'm also using the Fair Scheduler as according the
>> documentation, it should avoid any starvation problems.
>>
>> I don't know if it is a problem of spark or yarn. Please come with
>> suggestion if you have any.
>>
>>
>>
>> [image: embedded image 1]
>>
>>
>> Kiran Govind
>> Senior Superintendent - Maint. Planning, Power & Desalination Maintenance
>> UAE Operations
>> E  kgovind@ega.ae
>> D +97148021153
>> M +971555430816
>>
>>
>> Emirates Global Aluminium
>> P.O. Box 3627, Dubai
>> United Arab Emirates
>>
>> www.ega.ae
>>
>>
>> ------------------------------
>>
>> This is an e-mail from Emirates Global Aluminium PJSC. Its contents are
>> confidential to the intended recipient. If you are not the intended
>> recipient be advised that you have received this email in error and that
>> any use, dissemination, forwarding, printing or copying of this e-mail is
>> strictly prohibited. It may not be disclosed to or used by anyone other
>> than its intended recipient, nor may it be copied in any way. If received
>> in error please e-mail a reply to the sender and delete it from your
>> system. Although this e-mail has been scanned for viruses, Emirates Global
>> Aluminium cannot ultimately accept any responsibility for viruses and it is
>> your responsibility to scan attachments (if any).
>>
>
>
>
> --
> Matteo Remo Luzzi
>



-- 
Matteo Remo Luzzi

Re: Problems in running spark on Yarn

Posted by Matteo Luzzi <ma...@gmail.com>.
Hi Tsuyoshi,
I made some changes to the configuration files after reading your answer and
the YARN documentation. These are the settings I'm using now:

yarn-site.xml

<property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>

<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>12000</value>
</property>

<property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
</property>

<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
</property>

<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>12000</value>
</property>

mapred-site.xml

<property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
</property>

<property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>1024</value>
</property>

<property>
    <name>mapreduce.jobhistory.address</name>
    <value>172.31.25.237:10020</value>
</property>

As for the Spark properties, I'm setting them through the Oozie
workflow.xml file:

<spark-opts>--executor-memory 2G --num-executors 1 --executor-cores 1
--conf spark.yarn.executor.memoryOverhead=500 --conf spark.scheduler.mode=FAIR
--conf spark.yarn.driver.memoryOverhead=500</spark-opts>

With this configuration I am able to launch and execute multiple
containers. The Spark container, however, now fails with this error:

Container [pid=18245,containerID=container_1441180487101_0017_02_000001] is
running beyond virtual memory limits. Current usage: 244.6 MB of 1 GB
physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing
container.
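
If I understand the NodeManager defaults correctly, the 2.1 GB limit above is
the 1 GB container size multiplied by yarn.nodemanager.vmem-pmem-ratio (which
defaults to 2.1), so another option besides giving the container more memory
might be to raise that ratio in yarn-site.xml, for example (the value here is
just an example):

<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4</value>
</property>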

A second container is launched by the Spark app; it copies some binary files
into a separate folder so that they can be analyzed by the Spark app. The
logic of the code requires the Spark jobs to wait for this copy to complete
before they execute. Unfortunately that app also fails:

INFO [main] com.backtype.hadoop.AbstractFileCopyMapper: Copying
hdfs://172.31.25.237:9000/dataset/2015-08-31/17/00/0f15dec6-6085-4df7-87db-0083ba42ee0d.pailfile
to file:/home/hduser/hdfs/hdfs-tmp/filecopy/a3d7dc66-11d1-488b-abf1-8f4dfa0aae7f
2015-09-02 08:52:27,746 WARN [main] org.apache.hadoop.mapred.YarnChild:
Exception running child : java.lang.IllegalArgumentException: Wrong FS:
file:/home/hduser/hdfs/hdfs-tmp/filecopy, expected: hdfs://172.31.25.237:9000
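
From what I can tell, "Wrong FS" usually means the code asked the default
(hdfs://) FileSystem to handle a file:/ path instead of resolving the
filesystem from the path itself. I don't know what AbstractFileCopyMapper
does internally, but a minimal, purely illustrative sketch of the pattern is:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WrongFsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // On the cluster, fs.defaultFS points at hdfs://172.31.25.237:9000
        Path localDir = new Path("file:/home/hduser/hdfs/hdfs-tmp/filecopy");

        FileSystem defaultFs = FileSystem.get(conf);
        // defaultFs.mkdirs(localDir);  // would throw IllegalArgumentException: Wrong FS

        // Resolving the filesystem from the path picks the local file:// scheme
        FileSystem localFs = localDir.getFileSystem(conf);
        localFs.mkdirs(localDir);
    }
}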

Here is the hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/hduser/hdfs/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/hduser/hdfs/datanode</value>
    </property>
</configuration>

Thanks for the support



[image: embedded image 1]



2015-09-02 5:33 GMT+02:00 Kiran Kumar Reddy Govind <kg...@ega.ae>:

> Guys,
>
>
>
> How does Hive work with MAPREDUCE , can someone clarify unable to
> understand this concept.
>
>
>
> Regards,
>
>
>
> *From:* Matteo Luzzi [mailto:matteo.luzzi@gmail.com]
> *Sent:* Wednesday, September 02, 2015 2:26 AM
> *To:* user@hadoop.apache.org
> *Subject:* Problems in running spark on Yarn
>
>
>
> Hi all!
>
> I'm developing a system where I need to run spark jobs over yarn. I'm
> using a two node cluster (one master and one slave) for testing and I'm
> submitting the application through oozie, but after the first application
> starts running (the oozie container) the other one remains in accepted
> stated. I am new to yarn so probably I am missing some concepts about how
> containers are requested and assigned to the applications. It seems that I
> can execute only one container at the time, even though there are still
> free resources. When I kill the first running application, the other one
> passes to running state. I'm also using the Fair Scheduler as according the
> documentation, it should avoid any starvation problems.
>
> I don't know if it is a problem of spark or yarn. Please come with
> suggestion if you have any.
>
>
>
> [image: embedded image 1]
>
>
> Kiran Govind
> Senior Superintendent - Maint. Planning, Power & Desalination Maintenance
> UAE Operations
> E  kgovind@ega.ae
> D +97148021153
> M +971555430816
>
>
> Emirates Global Aluminium
> P.O. Box 3627, Dubai
> United Arab Emirates
>
> www.ega.ae
>
>
> ------------------------------
>
> This is an e-mail from Emirates Global Aluminium PJSC. Its contents are
> confidential to the intended recipient. If you are not the intended
> recipient be advised that you have received this email in error and that
> any use, dissemination, forwarding, printing or copying of this e-mail is
> strictly prohibited. It may not be disclosed to or used by anyone other
> than its intended recipient, nor may it be copied in any way. If received
> in error please e-mail a reply to the sender and delete it from your
> system. Although this e-mail has been scanned for viruses, Emirates Global
> Aluminium cannot ultimately accept any responsibility for viruses and it is
> your responsibility to scan attachments (if any).
>



-- 
Matteo Remo Luzzi

RE: Problems in running spark on Yarn

Posted by Kiran Kumar Reddy Govind <kg...@ega.ae>.
Guys,

How does Hive work with MapReduce? Can someone clarify? I am unable to understand this concept.

Regards,

From: Matteo Luzzi [mailto:matteo.luzzi@gmail.com]
Sent: Wednesday, September 02, 2015 2:26 AM
To: user@hadoop.apache.org
Subject: Problems in running spark on Yarn

Hi all!
I'm developing a system where I need to run Spark jobs on YARN. I'm using a two-node cluster (one master and one slave) for testing and I'm submitting the application through Oozie, but after the first application starts running (the Oozie container) the other one remains in the ACCEPTED state. I am new to YARN, so I'm probably missing some concepts about how containers are requested and assigned to applications. It seems that I can execute only one container at a time, even though there are still free resources. When I kill the first running application, the other one moves to the RUNNING state. I'm also using the Fair Scheduler because, according to the documentation, it should avoid starvation problems.
I don't know whether the problem is in Spark or in YARN. Any suggestions are welcome.

[embedded image 1]


Kiran Govind
Senior Superintendent - Maint. Planning, Power & Desalination Maintenance
UAE Operations
E  kgovind@ega.ae
D +97148021153
M +971555430816

Emirates Global Aluminium
P.O. Box 3627, Dubai
United Arab Emirates

www.ega.ae

________________________________

This is an e-mail from Emirates Global Aluminium PJSC. Its contents are confidential to the intended recipient. If you are not the intended recipient be advised that you have received this email in error and that any use, dissemination, forwarding, printing or copying of this e-mail is strictly prohibited. It may not be disclosed to or used by anyone other than its intended recipient, nor may it be copied in any way. If received in error please e-mail a reply to the sender and delete it from your system. Although this e-mail has been scanned for viruses, Emirates Global Aluminium cannot ultimately accept any responsibility for viruses and it is your responsibility to scan attachments (if any).
