Posted to user@storm.apache.org by clay teahouse <cl...@gmail.com> on 2015/02/19 13:46:13 UTC

HdfsBolt and hdfs in HA mode

Hi All,
Has anyone used HdfsBolt with hdfs in HA mode? How would you determine
which hdfs node is the active node?

thanks
Clay

Re: HdfsBolt and hdfs in HA mode

Posted by Parth Brahmbhatt <pb...@hortonworks.com>.
The following links should help:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.1/bk_user-guide/content/ch_storm-using-hdfs-connector.html
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.1/bk_user-guide/content/ch_storm-using-packaging-topologies.html

Thanks
Parth



Re: HdfsBolt and hdfs in HA mode

Posted by clay teahouse <cl...@gmail.com>.
I am already using HdfsBolt successfully (without HDFS HA). So I assume the
client Java class is already in my classpath, since it comes with the
Hadoop jar files that I load when I run my topology, unless there is a jar
specific to the Hadoop HA classes that contains the
dfs.client.failover.proxy.provider class.  I mean I don't need to take any
specific action aside from configuring my hdfs-site.xml.

thanks,
Clay

On Thu, Feb 19, 2015 at 11:14 AM, Harsha <st...@harsha.io> wrote:

Re: HdfsBolt and hdfs in HA mode

Posted by Harsha <st...@harsha.io>.
Clay,
     When you are using the storm-hdfs connector you need to package
     core-site.xml and hdfs-site.xml from your cluster into your topology
     jar. You can configure the storm-hdfs bolt to pass the nameservice ID:

HdfsBolt bolt = new HdfsBolt()
           .withFsUrl("hdfs://myNameserviceID")
           .withFileNameFormat(fileNameFormat)
           .withRecordFormat(format)
           .withRotationPolicy(rotationPolicy)
           .withSyncPolicy(syncPolicy);

That is all that is needed to use NameNode HA with storm-hdfs.
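For reference, resolving hdfs://myNameserviceID relies on the client-side HA settings in the packaged hdfs-site.xml. A minimal sketch (the nameservice ID and hostnames are placeholders, not values from this thread):

```xml
<!-- Defines the logical nameservice the bolt's FS URL refers to -->
<property>
  <name>dfs.nameservices</name>
  <value>myNameserviceID</value>
</property>
<!-- The two NameNodes backing the nameservice -->
<property>
  <name>dfs.ha.namenodes.myNameserviceID</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.myNameserviceID.nn1</name>
  <value>namenode1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.myNameserviceID.nn2</name>
  <value>namenode2.example.com:8020</value>
</property>
<!-- The class the DFS client uses to find the active NameNode -->
<property>
  <name>dfs.client.failover.proxy.provider.myNameserviceID</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

With these settings on the classpath the client fails over between nn1 and nn2 automatically, so the topology never needs to know which NameNode is active.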

-Harsha

On Thu, Feb 19, 2015, at 08:58 AM, Bobby Evans wrote:

Re: HdfsBolt and hdfs in HA mode

Posted by Bobby Evans <ev...@yahoo-inc.com.INVALID>.
Hadoop has lots of different configurations in core-site.xml, hdfs-site.xml, ... all of which eventually get loaded into the Configuration object used to create a FileSystem instance.  There are so many different configurations related to security, HA, etc. that it is almost impossible for me to guess exactly which ones you need to have set correctly to make this work.

Typically what we do for Storm to be able to talk to HDFS is to package the complete set of configs that appear on a Hadoop gateway with the topology jar when it is shipped.  This guarantees that the config is the same as on the gateway and should behave the same way.  You can also grab them from the name node or any of the Hadoop compute nodes.  This will work because the HdfsBolt loads default configurations from the classpath before overriding them with any custom configurations you set for that bolt.
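The packaging step above can be sketched as a build-time resource copy. A minimal Maven fragment (the hadoop-conf directory name is hypothetical; it would hold a checked-in copy of the gateway's client configs) that puts both files at the root of the topology jar:

```xml
<!-- Bundles the gateway's Hadoop client configs into the jar root,
     where the HdfsBolt's Configuration loader finds them on the classpath. -->
<build>
  <resources>
    <resource>
      <directory>${project.basedir}/hadoop-conf</directory>
      <includes>
        <include>core-site.xml</include>
        <include>hdfs-site.xml</include>
      </includes>
    </resource>
  </resources>
</build>
```

Once the files are on the classpath, the bolt's `new Configuration()` picks them up with no further code changes.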

- Bobby
 

     On Thursday, February 19, 2015 10:42 AM, clay teahouse <cl...@gmail.com> wrote:

Re: HdfsBolt and hdfs in HA mode

Posted by clay teahouse <cl...@gmail.com>.
Bobby,
What do you mean by client here? In this context, do you consider HdfsBolt
a client? If yes, then which configuration are you referring to? I've seen
the following, but I am not sure if I follow.


   - *dfs.client.failover.proxy.provider.[nameservice ID]* - the Java class
   that HDFS clients use to contact the Active NameNode

   Configure the name of the Java class which will be used by the DFS
   Client to determine which NameNode is the current Active, and therefore
   which NameNode is currently serving client requests. The only
   implementation which currently ships with Hadoop is the
   *ConfiguredFailoverProxyProvider*, so use this unless you are using a
   custom one. For example:

   <property>
     <name>dfs.client.failover.proxy.provider.mycluster</name>
     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
   </property>


thanks,
Clay


On Thu, Feb 19, 2015 at 8:38 AM, Bobby Evans <ev...@yahoo-inc.com.invalid>
wrote:

Re: HdfsBolt and hdfs in HA mode

Posted by Bobby Evans <ev...@yahoo-inc.com.INVALID>.
HDFS HA provides fail-over for the name node; the client determines which name node is the active one, but this should be completely transparent to you if the client is configured correctly.
 - Bobby
 

     On Thursday, February 19, 2015 6:47 AM, clay teahouse <cl...@gmail.com> wrote:
