Posted to user@storm.apache.org by "Raja.Aravapalli" <Ra...@target.com> on 2016/03/04 09:16:44 UTC

storm-hdfs - hdfs bolt failing after 24hrs

Hi,

My Storm topology, which reads from Kafka and writes to Hadoop HDFS, fails exactly 24 hours after it starts.

I suspect the topology is not able to renew its Kerberos tokens, or cannot find the keytab it needs to renew them. Please share your thoughts and help me fix the issue.
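For context, a long-running process that logs in from a keytab usually has to re-obtain its Kerberos TGT before the ticket lifetime runs out, and that lifetime is commonly one day (`ticket_lifetime` in krb5.conf). The sketch below shows the general idea with plain JDK scheduling. It is not storm-hdfs code: the class name, the `reloginAction` hook and the 80% renewal point are all assumptions for illustration. In a Hadoop client, the hook could call `UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab()`.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Storm or Hadoop.
final class TgtReloginScheduler {
    // Assumed renewal point: re-login once 4/5 (80%) of the ticket lifetime has elapsed.
    static final long RENEW_NUMERATOR = 4;
    static final long RENEW_DENOMINATOR = 5;

    /** Interval after which a ticket with the given lifetime should be refreshed. */
    static Duration reloginPeriod(Duration ticketLifetime) {
        return ticketLifetime.multipliedBy(RENEW_NUMERATOR).dividedBy(RENEW_DENOMINATOR);
    }

    /** Instant at which a ticket issued at 'issued' should be refreshed. */
    static Instant refreshTime(Instant issued, Duration ticketLifetime) {
        return issued.plus(reloginPeriod(ticketLifetime));
    }

    /**
     * Runs the hypothetical reloginAction periodically. In a Hadoop client the action
     * could call UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab().
     * If the action throws, the executor suppresses all later runs, so it should
     * catch its own exceptions. The caller must shut the returned scheduler down.
     */
    static ScheduledExecutorService scheduleRelogin(Duration ticketLifetime, Runnable reloginAction) {
        long periodMillis = reloginPeriod(ticketLifetime).toMillis();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(reloginAction, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) {
        Instant issued = Instant.parse("2016-03-03T08:00:00Z");
        // A 24h ticket would be refreshed 19h12m after it was issued.
        System.out.println(refreshTime(issued, Duration.ofHours(24)));
    }
}
```

Refreshing well before expiry, rather than reacting to a failed write, avoids the window in which every HDFS call fails with "Failed to find any Kerberos tgt".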


Below is the code I use to configure the HDFS bolt:

Config object:
===========

// Build a map with the HDFS keytab configuration
Map<String, Object> hdfsSecConfigMap = new HashMap<String, Object>();

hdfsSecConfigMap.put("hdfs.keytab.file", ktPath);
hdfsSecConfigMap.put("hdfs.kerberos.principal", ktPrincipal);

// Build a map with the HBase configuration
Map<String, Object> hbaseConfigMap = new HashMap<String, Object>();
hbaseConfigMap.put("hbase.rootdir", hbaseRootDir);
hbaseConfigMap.put("storm.keytab.file", ktPath);
hbaseConfigMap.put("storm.kerberos.principal", ktPrincipal);

Config configured = new Config();
configured.setDebug(true);
configured.put(hdfsConfKey, hdfsSecConfigMap);
configured.put(hbaseConfKey, hbaseConfigMap);
configured.setNumWorkers(2);
configured.setMaxSpoutPending(300);
configured.setNumAckers(30);
configured.setMessageTimeoutSecs(1200);

// Same keytab and principal under the keys defined by HdfsSecurityUtil
configured.put(HdfsSecurityUtil.STORM_KEYTAB_FILE_KEY, ktPath);
configured.put(HdfsSecurityUtil.STORM_USER_NAME_KEY, ktPrincipal);

// ...and under the keys defined by HBaseSecurityUtil
configured.put(HBaseSecurityUtil.STORM_KEYTAB_FILE_KEY, ktPath);
configured.put(HBaseSecurityUtil.STORM_USER_NAME_KEY, ktPrincipal);

HdfsBolt setup:
===========

HdfsBolt hdfsbolt = new HdfsBolt()
        .withFsUrl(hdfsuri)
        .withRecordFormat(recFormat)
        .withFileNameFormat(fileNameWithPath)
        .withRotationPolicy(fileRotationSize)
        .withSyncPolicy(syncPolicy)
        .withConfigKey(secBypassConfigKey);
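The bolt is pointed at `secBypassConfigKey`, while the keytab map in the Config object is stored under `hdfsConfKey`. A minimal sketch of a hypothetical pre-submit check follows. It assumes that storm-hdfs reads a nested map from the topology config under the key passed to `withConfigKey`, and it reports which of the two keytab keys used above are missing from that map. The class and method names are made up; `backtype.storm.Config` extends `HashMap<String, Object>`, so the real `configured` object could be passed in directly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of storm-hdfs.
final class HdfsConfigKeyCheck {
    // Key names taken from hdfsSecConfigMap in the Config object above.
    static final String[] REQUIRED_KEYS = {"hdfs.keytab.file", "hdfs.kerberos.principal"};

    /** Returns the required keys absent from the nested map stored under boltConfigKey. */
    static List<String> missingKeytabKeys(Map<String, Object> topologyConf, String boltConfigKey) {
        Object nested = topologyConf.get(boltConfigKey);
        if (!(nested instanceof Map)) {
            // No map under that key at all: every required key is missing.
            return new ArrayList<>(Arrays.asList(REQUIRED_KEYS));
        }
        Map<?, ?> map = (Map<?, ?>) nested;
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            if (map.get(key) == null) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, Object> secMap = new HashMap<>();
        secMap.put("hdfs.keytab.file", "/path/to/user.keytab");   // placeholder path
        secMap.put("hdfs.kerberos.principal", "user@EXAMPLE.COM"); // placeholder principal

        Map<String, Object> conf = new HashMap<>();
        conf.put("hdfs.config", secMap); // hypothetical value of hdfsConfKey

        System.out.println(missingKeytabKeys(conf, "hdfs.config"));  // bolt uses the same key
        System.out.println(missingKeytabKeys(conf, "other.config")); // bolt uses a different key
    }
}
```

Running such a check just before submitting the topology makes a key mismatch show up immediately instead of as an authentication failure at runtime.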

TopologyBuilder setup:
===========

builder.setBolt("hdfsBolt", avroHDFSBolt, 1)
        .setNumTasks(1)
        .shuffleGrouping("kafka-spout");

This is the exception I am getting:


java.io.IOException: IOException flush:java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "**********"; destination host is: "***************":8020;
        at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2082) ~[stormjar.jar:?]
        at org.apache.hadoop.hdfs.DFSOutputStream.hsync(DFSOutputStream.java:1969) ~[stormjar.jar:?]
        at org.apache.hadoop.hdfs.client.HdfsDataOutputStream.hsync(HdfsDataOutputStream.java:95) ~[stormjar.jar:?]
        at org.apache.storm.hdfs.bolt.HdfsBolt.execute(HdfsBolt.java:100) [stormjar.jar:?]
        at backtype.storm.daemon.executor$fn__3697$tuple_action_fn__3699.invoke(executor.clj:670) [storm-core-0.10.0.2.3.4.0-3485.jar:0.10.0.2.3.4.0-3485]
        at backtype.storm.daemon.executor$mk_task_receiver$fn__3620.invoke(executor.clj:426) [storm-core-0.10.0.2.3.4.0-3485.jar:0.10.0.2.3.4.0-3485]
        at backtype.storm.disruptor$clojure_handler$reify__3196.onEvent(disruptor.clj:58) [storm-core-0.10.0.2.3.4.0-3485.jar:0.10.0.2.3.4.0-3485]
        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) [storm-core-0.10.0.2.3.4.0-3485.jar:0.10.0.2.3.4.0-3485]
        at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) [storm-core-0.10.0.2.3.4.0-3485.jar:0.10.0.2.3.4.0-3485]
        at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) [storm-core-0.10.0.2.3.4.0-3485.jar:0.10.0.2.3.4.0-3485]
        at backtype.storm.daemon.executor$fn__3697$fn__3710$fn__3761.invoke(executor.clj:808) [storm-core-0.10.0.2.3.4.0-3485.jar:0.10.0.2.3.4.0-3485]
        at backtype.storm.util$async_loop$fn__544.invoke(util.clj:475) [storm-core-0.10.0.2.3.4.0-3485.jar:0.10.0.2.3.4.0-3485]
        at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:?]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_73]
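The actual cause ("Failed to find any Kerberos tgt") is nested several exceptions deep. The diagnostic sketch below is hypothetical and not part of Storm or Hadoop. It walks an exception's cause chain and flags that condition, so that a wrapper around the HDFS write could log it distinctly. Matching on message text is an assumption and may not hold across Hadoop or JDK versions.

```java
import java.io.IOException;

// Hypothetical diagnostic helper, not part of Storm or Hadoop.
final class KerberosFailureDetector {
    // Marker text copied from the GSSException message in the stack trace above.
    static final String MISSING_TGT_MARKER = "Failed to find any Kerberos tgt";

    /** Returns true if any throwable in the cause chain mentions the missing-TGT marker. */
    static boolean isMissingKerberosTgt(Throwable t) {
        Throwable current = t;
        int depth = 0;
        // The depth limit guards against pathological cycles in the cause chain.
        while (current != null && depth < 50) {
            String message = current.getMessage();
            if (message != null && message.contains(MISSING_TGT_MARKER)) {
                return true;
            }
            current = current.getCause();
            depth++;
        }
        return false;
    }

    public static void main(String[] args) {
        IOException root = new IOException(
            "GSS initiate failed [Caused by GSSException: No valid credentials provided "
            + "(Mechanism level: Failed to find any Kerberos tgt)]");
        IOException wrapped = new IOException("IOException flush", root);
        System.out.println(isMissingKerberosTgt(wrapped));                      // true
        System.out.println(isMissingKerberosTgt(new IOException("disk full"))); // false
    }
}
```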


Thanks a lot in advance for your valuable thoughts.


Regards,
Raja Aravapalli.