Posted to user@hadoop.apache.org by Atul Rajan <at...@icloud.com> on 2017/07/27 04:11:01 UTC

DR for Data Lake

Hello all,

We are planning to implement a data lake for our financial data. How can we achieve disaster recovery for our data lake?

Initially all the data marts will be pushed to the data lake, but we also want a plan for data recovery. Please suggest some ideas.

Thanks and Regards
Atul Rajan


-Sent from my iPhone

On 12-Jan-2017, at 4:43 AM, Akash Mishra <ak...@gmail.com> wrote:

You are getting the NPE in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.a.getName, which is not in the Hadoop codebase. I can see you are using a different scheduler implementation, com.pepperdata.supervisor.scheduler.PepperdataSupervisorYarnFair, so check SourceFile:204 in that code for more details.

My guess is that you need to set some name parameter that is only requested at DEBUG level.
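
If it helps isolate this, one thing worth trying (a sketch only, not something confirmed in this thread) is to temporarily point yarn.resourcemanager.scheduler.class back at the stock Fair Scheduler in yarn-site.xml and restart the RM at DEBUG level. If the NPE disappears, the problem is in the Pepperdata wrapper rather than in your fair-scheduler.xml:

  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <!-- stock Hadoop Fair Scheduler, in place of the Pepperdata class -->
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>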

Thanks, 



> On Wed, Jan 11, 2017 at 10:59 PM, Stephen Sprague <sp...@gmail.com> wrote:
> ok.  i would attach but... i think there might be an aversion to attachments so i'll paste inline.  hopefully it's not too confusing.
> 
> $ cat fair-scheduler.xml
> 
> <?xml version="1.0"?>
> 
> <!--
>   This is a sample configuration file for the Fair Scheduler. For details
>   on the options, please refer to the fair scheduler documentation at
>   http://hadoop.apache.org/core/docs/r0.21.0/fair_scheduler.html.
> 
>   To create your own configuration, copy this file to conf/fair-scheduler.xml
>   and add the following property in mapred-site.xml to point Hadoop to the
>   file, replacing [HADOOP_HOME] with the path to your installation directory:
>     <property>
>       <name>mapred.fairscheduler.allocation.file</name>
>       <value>[HADOOP_HOME]/conf/fair-scheduler.xml</value>
>     </property>
> 
>   Note that all the parameters in the configuration file below are optional,
>   including the parameters inside <pool> and <user> elements. It is only
>   necessary to set the ones you want to differ from the defaults.
> -->
> 
> <!-- https://hadoop.apache.org/docs/r1.2.1/fair_scheduler.html -->
> 
> <allocations>
> 
>   <!-- NOTE. ** Preemption IS NOT turned on! ** -->
> 
>   <!-- Preemption timeout for jobs below their fair share, in seconds.
>     If a job is below half its fair share for this amount of time, it
>     is allowed to kill tasks from other jobs to go up to its fair share.
>     Requires mapred.fairscheduler.preemption to be true in mapred-site.xml. -->
>   <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>
> 
>   <!-- Default min share preemption timeout for pools where it is not
>     explicitly configured, in seconds. Requires mapred.fairscheduler.preemption
>     to be set to true in your mapred-site.xml. -->
>   <defaultMinSharePreemptionTimeout>600</defaultMinSharePreemptionTimeout>
> 
>   <!-- Default running job limit for pools where it is not explicitly set. -->
>   <queueMaxJobsDefault>20</queueMaxJobsDefault>
> 
>   <!-- Default running job limit for users where it is not explicitly set. -->
>   <userMaxJobsDefault>10</userMaxJobsDefault>
> 
> 
> <!--  QUEUES:
>          dwr.interactive   : 10 at once
>          dwr.batch_sql     : 15 at once
>          dwr.batch_hdfs    : 5 at once   (distcp, sqoop, hdfs dfs -put, anything besides 'sql')
>          dwr.qa            : 3 at once
>          dwr.truck_lane    : 1 at once
> 
>          cad.interactive   : 5 at once
>          cad.batch         : 10 at once
> 
>          comms.interactive : 5 at once
>          comms.batch       : 3 at once
> 
>          default           : 2 at once   (to discourage its use)
> -->
> 
> 
> <!-- queue placement -->
> 
>   <queuePlacementPolicy>
>     <rule name="specified" />
>     <rule name="default" />
>   </queuePlacementPolicy>
> 
> 
> <!-- footprint -->
>  <queue name='footprint'>
>     <schedulingPolicy>fair</schedulingPolicy>   <!-- can be fifo too -->
> 
>     <maxRunningApps>4</maxRunningApps>
>     <aclSubmitApps>*</aclSubmitApps>
> 
>     <minMaps>10</minMaps>
>     <minReduces>5</minReduces>
>     <userMaxJobsDefault>50</userMaxJobsDefault>
> 
>     <maxMaps>200</maxMaps>
>     <maxReduces>200</maxReduces>
>     <minResources>20000 mb, 10 vcores</minResources>
>     <maxResources>500000 mb, 175 vcores</maxResources>
> 
>     <queue name="dev">
>        <maxMaps>200</maxMaps>
>        <maxReduces>200</maxReduces>
>        <minResources>20000 mb, 10 vcores</minResources>
>        <maxResources>500000 mb, 175 vcores</maxResources>
>     </queue>
> 
>     <queue name="stage">
>        <maxMaps>200</maxMaps>
>        <maxReduces>200</maxReduces>
>        <minResources>20000 mb, 10 vcores</minResources>
>        <maxResources>500000 mb, 175 vcores</maxResources>
>     </queue>
>   </queue>
> 
> <!-- comms -->
>  <queue name='comms'>
>     <schedulingPolicy>fair</schedulingPolicy>   <!-- can be fifo too -->
> 
>     <queue name="interactive">
>        <maxRunningApps>5</maxRunningApps>
>        <aclSubmitApps>*</aclSubmitApps>
>     </queue>
> 
>     <queue name="batch">
>        <maxRunningApps>10</maxRunningApps>
>        <aclSubmitApps>*</aclSubmitApps>
>     </queue>
> 
>   </queue>
> 
> <!-- cad -->
>  <queue name='cad'>
>     <schedulingPolicy>fair</schedulingPolicy>   <!-- can be fifo too -->
> 
>     <queue name="interactive">
>        <maxRunningApps>5</maxRunningApps>
>        <aclSubmitApps>*</aclSubmitApps>
>     </queue>
> 
> 
>     <queue name="batch">
>        <maxRunningApps>10</maxRunningApps>
>        <aclSubmitApps>*</aclSubmitApps>
>     </queue>
> 
>   </queue>
> 
> 
> 
> <!-- dwr -->
>   <queue name="dwr">
> 
>     <schedulingPolicy>fair</schedulingPolicy>   <!-- can be fifo too -->
>     <minMaps>10</minMaps>
>     <minReduces>5</minReduces>
>     <userMaxJobsDefault>50</userMaxJobsDefault>
> 
>     <maxMaps>200</maxMaps>
>     <maxReduces>200</maxReduces>
>     <minResources>20000 mb, 10 vcores</minResources>
>     <maxResources>500000 mb, 175 vcores</maxResources>
> 
> <!-- INTERACTIVE. 5 at once -->
>     <queue name="interactive">
>         <weight>2.0</weight>
>         <maxRunningApps>5</maxRunningApps>
> 
>        <maxMaps>200</maxMaps>
>        <maxReduces>200</maxReduces>
>        <minResources>20000 mb, 10 vcores</minResources>
>        <maxResources>500000 mb, 175 vcores</maxResources>
> 
> <!-- not used. Number of seconds after which the pool can preempt other pools -->
>         <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
> 
> <!-- per user. but given everything is dwr (for now) its not helpful -->
>         <userMaxAppsDefault>5</userMaxAppsDefault>
>         <aclSubmitApps>*</aclSubmitApps>
>     </queue>
> 
> 
> <!-- BATCH. 15 at once -->
>     <queue name="batch_sql">
>         <weight>1.5</weight>
>         <maxRunningApps>15</maxRunningApps>
> 
>        <maxMaps>200</maxMaps>
>        <maxReduces>200</maxReduces>
>        <minResources>20000 mb, 10 vcores</minResources>
>        <maxResources>500000 mb, 175 vcores</maxResources>
> 
> <!-- not used. Number of seconds after which the pool can preempt other pools -->
>         <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
> 
>         <userMaxAppsDefault>50</userMaxAppsDefault>
>         <aclSubmitApps>*</aclSubmitApps>
>     </queue>
> 
> 
> <!-- sqoop, distcp, hdfs-put type jobs here. 3 at once -->
>     <queue name="batch_hdfs">
>         <weight>1.0</weight>
>         <maxRunningApps>3</maxRunningApps>
> 
> <!-- not used. Number of seconds after which the pool can preempt other pools -->
>         <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
>         <userMaxAppsDefault>50</userMaxAppsDefault>
>         <aclSubmitApps>*</aclSubmitApps>
>     </queue>
> 
> 
> <!-- QA. 3 at once -->
>     <queue name="qa">
>         <weight>1.0</weight>
>         <maxRunningApps>100</maxRunningApps>
> 
> <!-- not used. Number of seconds after which the pool can preempt other pools -->
>         <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
>         <aclSubmitApps>*</aclSubmitApps>
>         <userMaxAppsDefault>50</userMaxAppsDefault>
> 
>     </queue>
> 
> <!-- big, unruly jobs -->
>     <queue name="truck_lane">
>         <weight>0.75</weight>
>         <maxRunningApps>1</maxRunningApps>
>         <minMaps>5</minMaps>
>         <minReduces>5</minReduces>
> 
> <!-- let's try without static values and see how the "weight" works
> -->
>         <maxMaps>192</maxMaps>
>         <maxReduces>192</maxReduces>
>         <minResources>20000 mb, 10 vcores</minResources>
>         <maxResources>500000 mb, 200 vcores</maxResources>
> 
> <!-- not used. Number of seconds after which the pool can preempt other pools -->
> <!--
>         <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
>         <aclSubmitApps>*</aclSubmitApps>
>         <userMaxAppsDefault>50</userMaxAppsDefault>
> -->
>     </queue>
>   </queue>
> 
> <!-- DEFAULT. 2 at once -->
>   <queue name="default">
>        <maxRunningApps>2</maxRunningApps>
> 
>        <maxMaps>40</maxMaps>
>        <maxReduces>40</maxReduces>
>        <minResources>20000 mb, 10 vcores</minResources>
>        <maxResources>20000 mb, 10 vcores</maxResources>
> 
> <!-- not used. Number of seconds after which the pool can preempt other pools -->
>       <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
>       <userMaxAppsDefault>5</userMaxAppsDefault>
>       <aclSubmitApps>*</aclSubmitApps>
>   </queue>
> 
> 
> </allocations>
> 
> 
> 
> <!-- some other stuff
> 
>     <minResources>10000 mb,0vcores</minResources>
>     <maxResources>90000 mb,0vcores</maxResources>
> 
>     <minMaps>10</minMaps>
>     <minReduces>5</minReduces>
> 
> -->
> 
> <!-- enabling
>    * Bringing the queues into effect:
>    Once the required parameters are defined in the fair-scheduler.xml file, run this command to bring the changes into effect:
>    yarn rmadmin -refreshQueues
> -->
> 
> <!-- verifying
>   Once the command runs properly, verify that the queues are set up using one of two options:
> 
>   1) hadoop queue -list
>   or
>   2) Open the YARN ResourceManager GUI at http://<ResourceManager-hostname>:8088 and click Scheduler.
> 
> -->
> 
> 
> <!-- notes
>    [fail_user@phd11-nn ~]$ id
>    uid=507(fail_user) gid=507(failgroup) groups=507(failgroup)
>    [fail_user@phd11-nn ~]$ hadoop queue -showacls
> -->
> 
> 
> <!-- submit
>    To submit an application, use the parameter -Dmapred.job.queue.name=<queue-name> or -Dmapreduce.job.queuename=<queue-name>
> -->
> 
> 
> 
> 
> 
> *** yarn-site.xml
> 
> 
> 
> $ cat yarn-site.xml
> 
> ssprague-mbpro:~ spragues$ cat yarn-site.xml
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> 
> <configuration>
> <!--Autogenerated yarn params from puppet yaml hash yarn_site_parameters__xml -->
>   <property>
>     <name>yarn.resourcemanager.hostname</name>
>     <value>FOO.sv2.trulia.com</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>mapreduce_shuffle</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
>     <value>org.apache.hadoop.mapred.ShuffleHandler</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.local-dirs</name>
>     <value>/storage0/hadoop/yarn/local,/storage1/hadoop/yarn/local,/storage2/hadoop/yarn/local,/storage3/hadoop/yarn/local,/storage4/hadoop/yarn/local,/storage5/hadoop/yarn/local</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.scheduler.class</name>
>     <value>com.pepperdata.supervisor.scheduler.PepperdataSupervisorYarnFair</value>
>   </property>
>   <property>
>     <name>yarn.application.classpath</name>
>     <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,$TEZ_HOME/*,$TEZ_HOME/lib/*</value>
>   </property>
>   <property>
>     <name>pepperdata.license.key.specification</name>
>     <value>data://removed</value>
>   </property>
>   <property>
>     <name>pepperdata.license.key.comments</name>
>     <value>License Type: PRODUCTION Expiration Date (UTC): 2017/02/01 Company Name: Trulia, LLC Cluster Name: trulia-production Number of Nodes: 150 Contact Person Name: Deep Varma Contact Person Email: dvarma@trulia.com</value>
>   </property>
>   <property>
>     <name>yarn.timeline-service.hostname</name>
>     <value>FOO.sv2.trulia.com</value>
>   </property>
>   <property>
>     <name>yarn.timeline-service.enabled</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.timeline-service.webapp.address</name>
>     <value>FOO.sv2.trulia.com:8188</value>
>   </property>
>   <property>
>     <name>yarn.timeline-service.http-cross-origin.enabled</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.timeline-service.ttl-enable</name>
>     <value>false</value>
>   </property>
> 
> <!--
>   <property>
>     <name>yarn.timeline-service.store-class</name>
>     <value>org.apache.hadoop.yarn.server.timeline.RollingLevelDbTimelineStore</value>
>   </property>
> -->
>   <property>
>     <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.fair.user-as-default-queue</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.fair.preemption</name>
>     <value>false</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.fair.sizebasedweight</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.minimum-allocation-mb</name>
>     <value>2048</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.maximum-allocation-mb</name>
>     <value>8192</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
>     <value>98.5</value>
>   </property>
>   <property>
>     <name>yarn.log-aggregation.retain-seconds</name>
>     <value>604800</value>
>   </property>
>   <property>
>     <name>yarn.log-aggregation-enable</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.log-dirs</name>
>     <value>${yarn.log.dir}/userlogs</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.remote-app-log-dir</name>
>     <value>/app-logs</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.delete.debug-delay-sec</name>
>     <value>600</value>
>   </property>
>   <property>
>     <name>yarn.log.server.url</name>
>     <value>http://FOO.sv2.trulia.com:19888/jobhistory/logs</value>
>   </property>
> 
> </configuration>
> 
> 
>> On Wed, Jan 11, 2017 at 2:27 PM, Akash Mishra <ak...@gmail.com> wrote:
>> Please post your fair-scheduler.xml file and yarn-site.xml 
>> 
>>> On Wed, Jan 11, 2017 at 9:14 PM, Stephen Sprague <sp...@gmail.com> wrote:
>>> hey guys,
>>> i'm running the RM with the above options (version 2.6.1) and get an NPE upon startup.
>>> 
>>> {code}
>>> 17/01/11 12:44:45 FATAL resourcemanager.ResourceManager: Error starting ResourceManager
>>> java.lang.NullPointerException
>>>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.a.getName(SourceFile:204)
>>>         at org.apache.hadoop.service.CompositeService.addService(CompositeService.java:73)
>>>         at org.apache.hadoop.service.CompositeService.addIfService(CompositeService.java:88)
>>>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:490)
>>>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>>>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:993)
>>>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255)
>>>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>>>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1214)
>>> 17/01/11 12:44:45 INFO resourcemanager.ResourceManager: SHUTDOWN_MSG:
>>> {code}
>>> 
>>> the fair-scheduler.xml file is fine and works with INFO level logging, so i'm pretty sure there's nothing "wrong" with it. With DEBUG level it's making this java call and barfing.
>>> 
>>> Any ideas how to fix this?
>>> 
>>> thanks,
>>> Stephen.
>> 
>> 
>> 
>> -- 
>> Regards,
>> Akash Mishra.
>> 
>> "It's not our abilities that make us, but our decisions."--Albus Dumbledore
> 



-- 
Regards,
Akash Mishra.

"It's not our abilities that make us, but our decisions."--Albus Dumbledore

Re: DR for Data Lake

Posted by daemeon reiydelle <da...@gmail.com>.
Determine what is meant by "disaster recovery": what are the scenarios, and
what data is in scope.

Architect to the business need, not the buzzwords.
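
For the common case of keeping a second HDFS cluster warm, one widely used building block (a sketch only; the hosts and paths below are placeholders, not from this thread) is scheduled replication with DistCp:

  # copy the lake to a standby cluster; prod-nn and dr-nn are hypothetical NameNodes
  hadoop distcp -update -delete \
      hdfs://prod-nn:8020/data/lake \
      hdfs://dr-nn:8020/data/lake

Taking HDFS snapshots (hdfs dfs -createSnapshot) before each copy keeps the replica consistent, but what "recovered" actually means still depends on the scenarios and data you identify above.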


“Anyone who isn’t embarrassed by who they were last year probably isn’t
learning enough.” - Alain de Botton


Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872


On Wed, Jul 26, 2017 at 9:11 PM, Atul Rajan <at...@icloud.com> wrote:

> Hello all,
>
> We are planning to implement a data lake for our financial data. How can we
> achieve disaster recovery for our data lake?
>
> Initially all the data marts will be pushed to the data lake, but we also
> want a plan for data recovery. Please suggest some ideas.
>
> Thanks and Regards
> Atul Rajan
>
>
> -Sent from my iPhone