Posted to user@flume.apache.org by Ed Judge <ej...@gmail.com> on 2014/09/29 16:38:17 UTC

HDFS sink to a remote HDFS node

I am trying to run the flume-ng agent on one node with an HDFS sink pointing to an HDFS filesystem on another node.
Is this possible?  What packages/jar files are needed on the flume agent node for this to work?  Secondary goal is to install only what is needed on the flume-ng node.

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://<remote IP address>/tmp/


Thanks,
Ed


Re: HDFS sink to a remote HDFS node

Posted by Ed Judge <ej...@gmail.com>.
Ok, I pulled over all of the hadoop jar files.  Now I am seeing this:

30 Sep 2014 19:39:26,973 INFO  [Twitter Stream consumer-1[initializing]] (twitter4j.internal.logging.SLF4JLogger.info:83)  - Establishing connection.
30 Sep 2014 19:39:28,204 INFO  [Twitter Stream consumer-1[Establishing connection]] (twitter4j.internal.logging.SLF4JLogger.info:83)  - Connection established.
30 Sep 2014 19:39:28,205 INFO  [Twitter Stream consumer-1[Establishing connection]] (twitter4j.internal.logging.SLF4JLogger.info:83)  - Receiving status stream.
30 Sep 2014 19:39:28,442 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSDataStream.configure:58)  - Serializer = TEXT, UseRawLocalFileSystem = false
30 Sep 2014 19:39:28,591 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:261)  - Creating hdfs://10.0.0.14/tmp//twitter.1412105968443.ds.tmp
30 Sep 2014 19:39:28,690 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:467)  - process failed
java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation
	at org.apache.hadoop.fs.FileSystem.getScheme(FileSystem.java:214)
	at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2365)
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2375)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
	at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:270)
	at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:262)
	at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:718)
	at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:183)
	at org.apache.flume.sink.hdfs.BucketWriter.access$1700(BucketWriter.java:59)
	at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:715)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

Is there something misconfigured on my hadoop node?

Thanks.

On Sep 30, 2014, at 2:51 PM, Hari Shreedharan <hs...@cloudera.com> wrote:

> You actually need to add all of Hadoop’s dependencies to the Flume classpath. Looks like Apache Commons Configuration is missing from the classpath.
> 
> Thanks,
> Hari
> 
> 
> On Tue, Sep 30, 2014 at 11:48 AM, Ed Judge <ej...@gmail.com> wrote:
> 
> Thank you.  I am using hadoop 2.5 which I think uses protobuf-java-2.5.0.jar.
> 
> I am getting the following error even after adding those 2 jar files to my flume-ng classpath:
> 
> 30 Sep 2014 18:27:03,269 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start:61)  - Configuration provider starting
> 30 Sep 2014 18:27:03,278 INFO  [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:133)  - Reloading configuration file:./src.conf
> 30 Sep 2014 18:27:03,288 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,289 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:930)  - Added sinks: k1 Agent: a1
> 30 Sep 2014 18:27:03,289 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,292 WARN  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration.<init>:101)  - Configuration property ignored: i# = Describe the sink
> 30 Sep 2014 18:27:03,292 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,292 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,312 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration.validateConfiguration:140)  - Post-validation flume configuration contains configuration for agents: [a1]
> 30 Sep 2014 18:27:03,312 INFO  [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:150)  - Creating channels
> 30 Sep 2014 18:27:03,329 INFO  [conf-file-poller-0] (org.apache.flume.channel.DefaultChannelFactory.create:40)  - Creating instance of channel c1 type memory
> 30 Sep 2014 18:27:03,351 INFO  [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:205)  - Created channel c1
> 30 Sep 2014 18:27:03,352 INFO  [conf-file-poller-0] (org.apache.flume.source.DefaultSourceFactory.create:39)  - Creating instance of source r1, type org.apache.flume.source.twitter.TwitterSource
> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:110)  - Consumer Key:        'tobhMtidckJoe1tByXDmI4pW3'
> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:111)  - Consumer Secret:     '6eZKRpd6JvGT3Dg9jtd9fG9UMEhBzGxoLhLUGP1dqzkKznrXuQ'
> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:112)  - Access Token:        '1588514408-o36mOSbXYCVacQ3p6Knsf6Kho17iCwNYLZyA9V5'
> 30 Sep 2014 18:27:03,364 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:113)  - Access Token Secret: 'vBtp7wKsi2BOQqZSBpSBQSgZcc93oHea38T9OdckDCLKn'
> 30 Sep 2014 18:27:03,825 INFO  [conf-file-poller-0] (org.apache.flume.sink.DefaultSinkFactory.create:40)  - Creating instance of sink: k1, type: hdfs
> 30 Sep 2014 18:27:03,874 ERROR [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:145)  - Failed to start agent because dependencies were not found in classpath. Error follows.
> java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
> 	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:38)
> 	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:36)
> 	at org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:106)
> 	at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:208)
> 	at org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:553)
> 	at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:272)
> 	at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
> 	at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:418)
> 	at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:103)
> 	at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> 	... 17 more
> 30 Sep 2014 18:27:33,491 INFO  [agent-shutdown-hook] (org.apache.flume.lifecycle.LifecycleSupervisor.stop:79)  - Stopping lifecycle supervisor 10
> 30 Sep 2014 18:27:33,493 INFO  [agent-shutdown-hook] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.stop:83)  - Configuration provider stopping
> [vagrant@localhost 6]$ 
> 
> Is there another jar file I need?
> 
> Thanks.
> 
> On Sep 29, 2014, at 9:04 PM, shengyi.pan <sh...@gmail.com> wrote:
> 
>> you need hadoop-common-x.x.x.jar and hadoop-hdfs-x.x.x.jar under your flume-ng classpath, and the hadoop jar versions must match your hadoop system.
>>  
>> if you sink to hadoop-2.0.0, you should use "protobuf-java-2.4.1.jar" (by default, flume-1.5.0 ships "protobuf-java-2.5.0.jar" in the flume lib directory), because the protobuf interface of hdfs-2.0 is compiled with protobuf-2.4, and with protobuf-2.5 the flume-ng agent will fail to start.
>>  
>>  
>>  
>>  
>> 2014-09-30
>> shengyi.pan
>> From: Ed Judge <ej...@gmail.com>
>> Sent: 2014-09-29 22:38
>> Subject: HDFS sink to a remote HDFS node
>> To: "user@flume.apache.org" <us...@flume.apache.org>
>> Cc:
>>  
>> I am trying to run the flume-ng agent on one node with an HDFS sink pointing to an HDFS filesystem on another node.
>> Is this possible?  What packages/jar files are needed on the flume agent node for this to work?  Secondary goal is to install only what is needed on the flume-ng node.
>> 
>> # Describe the sink
>> a1.sinks.k1.type = hdfs
>> a1.sinks.k1.hdfs.path = hdfs://<remote IP address>/tmp/
>> 
>> 
>> Thanks,
>> Ed
> 
> 


Re: can flume-ng keep header info when sink to remote hosts

Posted by Ashish <pa...@gmail.com>.
Flume does preserve the Header properties. Can you share more details?
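
For example, with a two-hop setup roughly like the one below (agent and component names are purely illustrative), any headers on the events, whether set by the HTTP handler or an interceptor, are carried inside the Avro RPC call and show up unchanged in the events the receiving Avro source puts on its channel:

# sending agent (runs the HTTP source)
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = http
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = <remote host>
a1.sinks.k1.port = 4545
a1.sinks.k1.channel = c1
a1.channels.c1.type = memory

# receiving agent (on the remote host; its own sink omitted for brevity)
a2.sources = r2
a2.channels = c2
a2.sources.r2.type = avro
a2.sources.r2.bind = 0.0.0.0
a2.sources.r2.port = 4545
a2.sources.r2.channels = c2
a2.channels.c2.type = memory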

On Sat, Oct 4, 2014 at 2:34 PM, wangyi@testbird.com <wa...@testbird.com>
wrote:

> Hi All,
>     I use the flume-ng httpsource to collect logs, buffer them in a memory channel, and then
> post them to a remote host with an avro sink;
> the remote host runs an avro source.
>
>     My question is: does flume keep the header properties when it sinks to the remote
> avro source?
>
>
> ------------------------------
> Company: TestBird
> QQ:2741334465
> Email: wangyi@testbird.com
> Address: Tianfu Software Park C8-3#, High-tech Zone, Chengdu  Postal code: 610041
> Website: http://www.testbird.com
>



-- 
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal

Re: can flume-ng keep header info when sink to remote hosts

Posted by terreyshih <te...@gmail.com>.
Can you not just use the event.getHeaders() method to get the header Map?

thx

On Oct 4, 2014, at 2:04 AM, wangyi@testbird.com wrote:

> Hi All,
>     I use the flume-ng httpsource to collect logs, buffer them in a memory channel, and then post them to a remote host with an avro sink;
> the remote host runs an avro source.
> 
>     My question is: does flume keep the header properties when it sinks to the remote avro source?
> 
> 
> Company: TestBird
> QQ:2741334465
> Email: wangyi@testbird.com
> Address: Tianfu Software Park C8-3#, High-tech Zone, Chengdu  Postal code: 610041
> Website: http://www.testbird.com


can flume-ng keep header info when sink to remote hosts

Posted by "wangyi@testbird.com" <wa...@testbird.com>.





Hi All,
    I use the flume-ng httpsource to collect logs, buffer them in a memory channel, and then post them to a remote host with an avro sink; the remote host runs an avro source.
    My question is: does flume keep the header properties when it sinks to the remote avro source?



Company: TestBird
QQ:2741334465
Email: wangyi@testbird.com
Address: Tianfu Software Park C8-3#, High-tech Zone, Chengdu  Postal code: 610041
Website: http://www.testbird.com

Re: HDFS sink to a remote HDFS node

Posted by Ed Judge <ej...@gmail.com>.
This is more of a proof of concept configuration.  I took the hadoop install from here:

http://tecadmin.net/setup-hadoop-2-4-single-node-cluster-on-linux/

All I really want to do is run flume on one node and have it write to a remote HDFS filesystem.  I went down the path of installing hadoop on 2 Linux instances (calling it a “node” sounds ambiguous) thinking it might help.
With the current simple configuration (hadoop installed on 2 Linux instances), flume can write to the local Linux instance HDFS but not to the remote Linux instance HDFS.  I was assuming the latter was possible since I’ve seen examples such as:

a1.sinks.k1.hdfs.path = hdfs://<IP Address>:9000/tmp/

Has anyone documented how to set up this “remote” HDFS flume configuration (i.e. copying of jar files or anything else involved)?
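
What I have pieced together so far, and it may well be incomplete, is roughly the following on the flume-ng instance (the /opt paths, the hadoop-libs staging directory and the 10.0.0.16 address are just guesses from my own setup, and the jar versions have to match the remote Hadoop install):

$ mkdir -p /opt/flume/hadoop-libs
$ scp hadoop@10.0.0.16:"/opt/hadoop/share/hadoop/common/hadoop-common-*.jar" /opt/flume/hadoop-libs/
$ scp hadoop@10.0.0.16:"/opt/hadoop/share/hadoop/hdfs/hadoop-hdfs-*.jar" /opt/flume/hadoop-libs/
$ scp hadoop@10.0.0.16:"/opt/hadoop/share/hadoop/common/lib/*.jar" /opt/flume/hadoop-libs/
# then in conf/flume-env.sh (picked up via the --conf directory) point the agent at those jars:
FLUME_CLASSPATH="/opt/flume/hadoop-libs/*"

If someone knows a cleaner way (for example the Bigtop packages Hari mentioned), I would be happy to use that instead.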

Thanks,
-Ed


On Oct 1, 2014, at 3:41 PM, Hari Shreedharan <hs...@cloudera.com> wrote:

> What are you trying to accomplish here? You’d really need to have more nodes in your HDFS cluster to test anything realistic. I am not sure what happens when the cluster is really just 1 datanode - not sure the replication code within HDFS really handles that case.
> 
> Thanks,
> Hari
> 
> 
> On Wed, Oct 1, 2014 at 6:04 AM, Ed Judge <ej...@gmail.com> wrote:
> 
> Looks like they are up.  I see the following on one of the nodes but both look generally the same (1 live datanode).
> 
> [hadoop@localhost bin]$ hdfs dfsadmin -report
> 14/10/01 12:51:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Configured Capacity: 40797364224 (38.00 GB)
> Present Capacity: 37030862848 (34.49 GB)
> DFS Remaining: 37030830080 (34.49 GB)
> DFS Used: 32768 (32 KB)
> DFS Used%: 0.00%
> Under replicated blocks: 0
> Blocks with corrupt replicas: 0
> Missing blocks: 0
> 
> -------------------------------------------------
> Datanodes available: 1 (1 total, 0 dead)
> 
> Live datanodes:
> Name: 127.0.0.1:50010 (localhost)
> Hostname: localhost
> Decommission Status : Normal
> Configured Capacity: 40797364224 (38.00 GB)
> DFS Used: 32768 (32 KB)
> Non DFS Used: 3766501376 (3.51 GB)
> DFS Remaining: 37030830080 (34.49 GB)
> DFS Used%: 0.00%
> DFS Remaining%: 90.77%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Last contact: Wed Oct 01 12:51:57 UTC 2014
> 
> 
> I don’t know how to demonstrate that they are accessible except to telnet into each of them.  Right now that test shows that both nodes accept the connection to port 50010.
> Is there some other test I can perform?
> 
> Thanks,
> -Ed
> 
> On Oct 1, 2014, at 12:31 AM, Hari Shreedharan <hs...@cloudera.com> wrote:
> 
>> Looks like one data node is inaccessible or down - so the HDFS client has blacklisted it and the writes are failing as blocks are allocated to that one.
>> 
>> Thanks,
>> Hari
>> 
>> 
>> On Tue, Sep 30, 2014 at 7:33 PM, Ed Judge <ej...@gmail.com> wrote:
>> 
>> I’ve pulled over all of the Hadoop jar files for my flume instance to use.  I am seeing some slightly different errors now.  Basically I have 2 identically configured hadoop instances on the same subnet.  Running flume on those same instances and pointing flume at the local hadoop/hdfs instance works fine and the files get written.  However, when I point it to the adjacent hadoop/hdfs instance I get many exceptions/errors (shown below) and the files never get written.  Here is my HDFS sink configuration on 10.0.0.14:
>> 
>> # Describe the sink
>> a1.sinks.k1.type = hdfs
>> a1.sinks.k1.hdfs.path = hdfs://10.0.0.16:9000/tmp/
>> a1.sinks.k1.hdfs.filePrefix = twitter
>> a1.sinks.k1.hdfs.fileSuffix = .ds
>> a1.sinks.k1.hdfs.rollInterval = 0
>> a1.sinks.k1.hdfs.rollSize = 10
>> a1.sinks.k1.hdfs.rollCount = 0
>> a1.sinks.k1.hdfs.fileType = DataStream
>> #a1.sinks.k1.serializer = TEXT
>> a1.sinks.k1.channel = c1
>> 
>> Any idea why this is not working?
>> 
>> Thanks.
>> 
>> 01 Oct 2014 01:59:45,098 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSDataStream.configure:58)  - Serializer = TEXT, UseRawLocalFileSystem = false
>> 01 Oct 2014 01:59:45,385 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:261)  - Creating hdfs://10.0.0.16:9000/tmp//twitter.1412128785099.ds.tmp
>> 01 Oct 2014 01:59:45,997 INFO  [Twitter4J Async Dispatcher[0]] (org.apache.flume.source.twitter.TwitterSource.onStatus:178)  - Processed 100 docs
>> 01 Oct 2014 01:59:47,754 INFO  [Twitter4J Async Dispatcher[0]] (org.apache.flume.source.twitter.TwitterSource.onStatus:178)  - Processed 200 docs
>> 01 Oct 2014 01:59:49,379 INFO  [Thread-7] (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream:1378)  - Exception in createBlockOutputStream
>> java.io.EOFException: Premature EOF: no length prefix available
>> 	at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1987)
>> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1346)
>> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1272)
>> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
>> 01 Oct 2014 01:59:49,390 INFO  [Thread-7] (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream:1275)  - Abandoning BP-1768727495-127.0.0.1-1412117897373:blk_1073743575_2751
>> 01 Oct 2014 01:59:49,398 INFO  [Thread-7] (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream:1278)  - Excluding datanode 127.0.0.1:50010
>> 01 Oct 2014 01:59:49,431 WARN  [Thread-7] (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run:627)  - DataStreamer Exception
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/twitter.1412128785099.ds.tmp could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
>> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
>> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
>> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>> 	at java.security.AccessController.doPrivileged(Native Method)
>> 	at javax.security.auth.Subject.doAs(Subject.java:415)
>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>> 
>> 	at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>> 	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>> 	at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
>> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> 	at java.lang.reflect.Method.invoke(Method.java:606)
>> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>> 	at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
>> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
>> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
>> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
>> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
>> 01 Oct 2014 01:59:49,437 WARN  [hdfs-k1-call-runner-2] (org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync:1950)  - Error while syncing
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/twitter.1412128785099.ds.tmp could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
>> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
>> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
>> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>> 	at java.security.AccessController.doPrivileged(Native Method)
>> 	at javax.security.auth.Subject.doAs(Subject.java:415)
>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>> 
>> 	at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>> 	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>> 	at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
>> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> 	at java.lang.reflect.Method.invoke(Method.java:606)
>> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>> 	at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
>> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
>> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
>> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
>> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
>> 01 Oct 2014 01:59:49,439 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:463)  - HDFS IO error
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/twitter.1412128785099.ds.tmp could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
>> 
>> On Sep 30, 2014, at 3:18 PM, Hari Shreedharan <hs...@cloudera.com> wrote:
>> 
>>> You'd need to add the jars that hadoop itself depends on. Flume pulls them in if Hadoop is installed on that machine; otherwise you'd need to download and install them manually. If you are using Hadoop 2.x, install the RPM provided by Bigtop.
>>> 
>>> On Tue, Sep 30, 2014 at 12:12 PM, Ed Judge <ej...@gmail.com> wrote:
>>> I added commons-configuration and there is now another missing dependency.  What do you mean by “all of Hadoop’s dependencies”?
>>> 
>>> 
>>> On Sep 30, 2014, at 2:51 PM, Hari Shreedharan <hs...@cloudera.com> wrote:
>>> 
>>>> You actually need to add all of Hadoop’s dependencies to the Flume classpath. Looks like Apache Commons Configuration is missing from the classpath.
>>>> 
>>>> Thanks,
>>>> Hari
>>>> 
>>>> 
>>>> On Tue, Sep 30, 2014 at 11:48 AM, Ed Judge <ej...@gmail.com> wrote:
>>>> 
>>>> Thank you.  I am using hadoop 2.5 which I think uses protobuf-java-2.5.0.jar.
>>>> 
>>>> I am getting the following error even after adding those 2 jar files to my flume-ng classpath:
>>>> 
>>>> 30 Sep 2014 18:27:03,269 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start:61)  - Configuration provider starting
>>>> 30 Sep 2014 18:27:03,278 INFO  [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:133)  - Reloading configuration file:./src.conf
>>>> 30 Sep 2014 18:27:03,288 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>>> 30 Sep 2014 18:27:03,289 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:930)  - Added sinks: k1 Agent: a1
>>>> 30 Sep 2014 18:27:03,289 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>>> 30 Sep 2014 18:27:03,292 WARN  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration.<init>:101)  - Configuration property ignored: i# = Describe the sink
>>>> 30 Sep 2014 18:27:03,292 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>>> 30 Sep 2014 18:27:03,292 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>>> 30 Sep 2014 18:27:03,312 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration.validateConfiguration:140)  - Post-validation flume configuration contains configuration for agents: [a1]
>>>> 30 Sep 2014 18:27:03,312 INFO  [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:150)  - Creating channels
>>>> 30 Sep 2014 18:27:03,329 INFO  [conf-file-poller-0] (org.apache.flume.channel.DefaultChannelFactory.create:40)  - Creating instance of channel c1 type memory
>>>> 30 Sep 2014 18:27:03,351 INFO  [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:205)  - Created channel c1
>>>> 30 Sep 2014 18:27:03,352 INFO  [conf-file-poller-0] (org.apache.flume.source.DefaultSourceFactory.create:39)  - Creating instance of source r1, type org.apache.flume.source.twitter.TwitterSource
>>>> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:110)  - Consumer Key:        'tobhMtidckJoe1tByXDmI4pW3'
>>>> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:111)  - Consumer Secret:     '6eZKRpd6JvGT3Dg9jtd9fG9UMEhBzGxoLhLUGP1dqzkKznrXuQ'
>>>> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:112)  - Access Token:        '1588514408-o36mOSbXYCVacQ3p6Knsf6Kho17iCwNYLZyA9V5'
>>>> 30 Sep 2014 18:27:03,364 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:113)  - Access Token Secret: 'vBtp7wKsi2BOQqZSBpSBQSgZcc93oHea38T9OdckDCLKn'
>>>> 30 Sep 2014 18:27:03,825 INFO  [conf-file-poller-0] (org.apache.flume.sink.DefaultSinkFactory.create:40)  - Creating instance of sink: k1, type: hdfs
>>>> 30 Sep 2014 18:27:03,874 ERROR [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:145)  - Failed to start agent because dependencies were not found in classpath. Error follows.
>>>> java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
>>>> 	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:38)
>>>> 	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:36)
>>>> 	at org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:106)
>>>> 	at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:208)
>>>> 	at org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:553)
>>>> 	at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:272)
>>>> 	at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
>>>> 	at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:418)
>>>> 	at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:103)
>>>> 	at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
>>>> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>> 	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>>>> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>>> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> 	at java.lang.Thread.run(Thread.java:745)
>>>> Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
>>>> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>>> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>>> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>> 	... 17 more
>>>> 30 Sep 2014 18:27:33,491 INFO  [agent-shutdown-hook] (org.apache.flume.lifecycle.LifecycleSupervisor.stop:79)  - Stopping lifecycle supervisor 10
>>>> 30 Sep 2014 18:27:33,493 INFO  [agent-shutdown-hook] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.stop:83)  - Configuration provider stopping
>>>> [vagrant@localhost 6]$ 
>>>> 
>>>> Is there another jar file I need?
>>>> 
>>>> Thanks.
>>>> 
>>>> On Sep 29, 2014, at 9:04 PM, shengyi.pan <sh...@gmail.com> wrote:
>>>> 
>>>>> you need hadoop-common-x.x.x.jar and hadoop-hdfs-x.x.x.jar under your flume-ng classpath, and the hadoop jar versions must match your hadoop system.
>>>>>  
>>>>> if you sink to hadoop-2.0.0, you should use "protobuf-java-2.4.1.jar" (by default, flume-1.5.0 ships "protobuf-java-2.5.0.jar" in the flume lib directory), because the protobuf interface of hdfs-2.0 is compiled with protobuf-2.4, and with protobuf-2.5 the flume-ng agent will fail to start.
>>>>>  
>>>>>  
>>>>>  
>>>>>  
>>>>> 2014-09-30
>>>>> shengyi.pan
>>>>> From: Ed Judge <ej...@gmail.com>
>>>>> Sent: 2014-09-29 22:38
>>>>> Subject: HDFS sink to a remote HDFS node
>>>>> To: "user@flume.apache.org" <us...@flume.apache.org>
>>>>> Cc:
>>>>>  
>>>>> I am trying to run the flume-ng agent on one node with an HDFS sink pointing to an HDFS filesystem on another node.
>>>>> Is this possible?  What packages/jar files are needed on the flume agent node for this to work?  Secondary goal is to install only what is needed on the flume-ng node.
>>>>> 
>>>>> # Describe the sink
>>>>> a1.sinks.k1.type = hdfs
>>>>> a1.sinks.k1.hdfs.path = hdfs://<remote IP address>/tmp/
>>>>> 
>>>>> 
>>>>> Thanks,
>>>>> Ed
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 


Re: HDFS sink to a remote HDFS node

Posted by Hari Shreedharan <hs...@cloudera.com>.
What are you trying to accomplish here? You’d really need to have more nodes in your HDFS cluster to test anything realistic. I am not sure what happens when the cluster is really just 1 datanode - not sure the replication code within HDFS really handles that case.
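
If you do keep a single-datanode cluster for the proof of concept, it is worth making sure dfs.replication is set to 1 in hdfs-site.xml on that cluster (standard for pseudo-distributed setups), along these lines:

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

That by itself won’t fix a datanode the client cannot reach, but it at least keeps every block from being permanently under-replicated.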


Thanks,
Hari

On Wed, Oct 1, 2014 at 6:04 AM, Ed Judge <ej...@gmail.com> wrote:

> Looks like they are up.  I see the following on one of the nodes but both look generally the same (1 live datanode).
> [hadoop@localhost bin]$ hdfs dfsadmin -report
> 14/10/01 12:51:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Configured Capacity: 40797364224 (38.00 GB)
> Present Capacity: 37030862848 (34.49 GB)
> DFS Remaining: 37030830080 (34.49 GB)
> DFS Used: 32768 (32 KB)
> DFS Used%: 0.00%
> Under replicated blocks: 0
> Blocks with corrupt replicas: 0
> Missing blocks: 0
> -------------------------------------------------
> Datanodes available: 1 (1 total, 0 dead)
> Live datanodes:
> Name: 127.0.0.1:50010 (localhost)
> Hostname: localhost
> Decommission Status : Normal
> Configured Capacity: 40797364224 (38.00 GB)
> DFS Used: 32768 (32 KB)
> Non DFS Used: 3766501376 (3.51 GB)
> DFS Remaining: 37030830080 (34.49 GB)
> DFS Used%: 0.00%
> DFS Remaining%: 90.77%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Last contact: Wed Oct 01 12:51:57 UTC 2014
> I don’t know how to demonstrate that they are accessible except to telnet into each of them.  Right now that test shows that both nodes accept the connection to port 50010.
> Is there some other test I can perform?
> Thanks,
> -Ed
> On Oct 1, 2014, at 12:31 AM, Hari Shreedharan <hs...@cloudera.com> wrote:
>> Looks like one data node is inaccessible or down - so the HDFS client has blacklisted it and the writes are failing as blocks are allocated to that one.
>> 
>> Thanks,
>> Hari
>> 
>> 
>> On Tue, Sep 30, 2014 at 7:33 PM, Ed Judge <ej...@gmail.com> wrote:
>> 
>> I’ve pulled over all of the Hadoop jar files for my flume instance to use.  I am seeing some slightly different errors now.  Basically I have 2 identically configured hadoop instances on the same subnet.  Running flume on those same instances and pointing flume at the local hadoop/hdfs instance works fine and the files get written.  However, when I point it to the adjacent hadoop/hdfs instance I get many exceptions/errors (shown below) and the files never get written.  Here is my HDFS sink configuration on 10.0.0.14:
>> 
>> # Describe the sink
>> a1.sinks.k1.type = hdfs
>> a1.sinks.k1.hdfs.path = hdfs://10.0.0.16:9000/tmp/
>> a1.sinks.k1.hdfs.filePrefix = twitter
>> a1.sinks.k1.hdfs.fileSuffix = .ds
>> a1.sinks.k1.hdfs.rollInterval = 0
>> a1.sinks.k1.hdfs.rollSize = 10
>> a1.sinks.k1.hdfs.rollCount = 0
>> a1.sinks.k1.hdfs.fileType = DataStream
>> #a1.sinks.k1.serializer = TEXT
>> a1.sinks.k1.channel = c1
>> 
>> Any idea why this is not working?
>> 
>> Thanks.
>> 
>> 01 Oct 2014 01:59:45,098 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSDataStream.configure:58)  - Serializer = TEXT, UseRawLocalFileSystem = false
>> 01 Oct 2014 01:59:45,385 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:261)  - Creating hdfs://10.0.0.16:9000/tmp//twitter.1412128785099.ds.tmp
>> 01 Oct 2014 01:59:45,997 INFO  [Twitter4J Async Dispatcher[0]] (org.apache.flume.source.twitter.TwitterSource.onStatus:178)  - Processed 100 docs
>> 01 Oct 2014 01:59:47,754 INFO  [Twitter4J Async Dispatcher[0]] (org.apache.flume.source.twitter.TwitterSource.onStatus:178)  - Processed 200 docs
>> 01 Oct 2014 01:59:49,379 INFO  [Thread-7] (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream:1378)  - Exception in createBlockOutputStream
>> java.io.EOFException: Premature EOF: no length prefix available
>> 	at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1987)
>> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1346)
>> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1272)
>> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
>> 01 Oct 2014 01:59:49,390 INFO  [Thread-7] (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream:1275)  - Abandoning BP-1768727495-127.0.0.1-1412117897373:blk_1073743575_2751
>> 01 Oct 2014 01:59:49,398 INFO  [Thread-7] (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream:1278)  - Excluding datanode 127.0.0.1:50010
>> 01 Oct 2014 01:59:49,431 WARN  [Thread-7] (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run:627)  - DataStreamer Exception
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/twitter.1412128785099.ds.tmp could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
>> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
>> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
>> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>> 	at java.security.AccessController.doPrivileged(Native Method)
>> 	at javax.security.auth.Subject.doAs(Subject.java:415)
>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>> 
>> 	at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>> 	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>> 	at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
>> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> 	at java.lang.reflect.Method.invoke(Method.java:606)
>> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>> 	at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
>> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
>> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
>> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
>> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
>> 01 Oct 2014 01:59:49,437 WARN  [hdfs-k1-call-runner-2] (org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync:1950)  - Error while syncing
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/twitter.1412128785099.ds.tmp could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
>> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
>> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
>> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>> 	at java.security.AccessController.doPrivileged(Native Method)
>> 	at javax.security.auth.Subject.doAs(Subject.java:415)
>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>> 
>> 	at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>> 	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>> 	at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
>> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> 	at java.lang.reflect.Method.invoke(Method.java:606)
>> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>> 	at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
>> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
>> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
>> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
>> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
>> 01 Oct 2014 01:59:49,439 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:463)  - HDFS IO error
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/twitter.1412128785099.ds.tmp could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
>> 
>> On Sep 30, 2014, at 3:18 PM, Hari Shreedharan <hs...@cloudera.com> wrote:
>> 
>>> You'd need to add the jars that hadoop itself depends on. Flume pulls them in if Hadoop is installed on that machine; otherwise you'd need to download and install them manually. If you are using Hadoop 2.x, install the RPM provided by Bigtop.
>>> 
>>> On Tue, Sep 30, 2014 at 12:12 PM, Ed Judge <ej...@gmail.com> wrote:
>>> I added commons-configuration and there is now another missing dependency.  What do you mean by “all of Hadoop’s dependencies”?
>>> 
>>> 
>>> On Sep 30, 2014, at 2:51 PM, Hari Shreedharan <hs...@cloudera.com> wrote:
>>> 
>>>> You actually need to add all of Hadoop’s dependencies to the Flume classpath. Looks like Apache Commons Configuration is missing from the classpath.
>>>> 
>>>> Thanks,
>>>> Hari
>>>> 
>>>> 
>>>> On Tue, Sep 30, 2014 at 11:48 AM, Ed Judge <ej...@gmail.com> wrote:
>>>> 
>>>> Thank you.  I am using hadoop 2.5 which I think uses protobuf-java-2.5.0.jar.
>>>> 
>>>> I am getting the following error even after adding those 2 jar files to my flume-ng classpath:
>>>> 
>>>> 30 Sep 2014 18:27:03,269 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start:61)  - Configuration provider starting
>>>> 30 Sep 2014 18:27:03,278 INFO  [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:133)  - Reloading configuration file:./src.conf
>>>> 30 Sep 2014 18:27:03,288 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>>> 30 Sep 2014 18:27:03,289 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:930)  - Added sinks: k1 Agent: a1
>>>> 30 Sep 2014 18:27:03,289 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>>> 30 Sep 2014 18:27:03,292 WARN  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration.<init>:101)  - Configuration property ignored: i# = Describe the sink
>>>> 30 Sep 2014 18:27:03,292 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>>> 30 Sep 2014 18:27:03,292 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>>> 30 Sep 2014 18:27:03,312 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration.validateConfiguration:140)  - Post-validation flume configuration contains configuration for agents: [a1]
>>>> 30 Sep 2014 18:27:03,312 INFO  [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:150)  - Creating channels
>>>> 30 Sep 2014 18:27:03,329 INFO  [conf-file-poller-0] (org.apache.flume.channel.DefaultChannelFactory.create:40)  - Creating instance of channel c1 type memory
>>>> 30 Sep 2014 18:27:03,351 INFO  [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:205)  - Created channel c1
>>>> 30 Sep 2014 18:27:03,352 INFO  [conf-file-poller-0] (org.apache.flume.source.DefaultSourceFactory.create:39)  - Creating instance of source r1, type org.apache.flume.source.twitter.TwitterSource
>>>> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:110)  - Consumer Key:        'tobhMtidckJoe1tByXDmI4pW3'
>>>> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:111)  - Consumer Secret:     '6eZKRpd6JvGT3Dg9jtd9fG9UMEhBzGxoLhLUGP1dqzkKznrXuQ'
>>>> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:112)  - Access Token:        '1588514408-o36mOSbXYCVacQ3p6Knsf6Kho17iCwNYLZyA9V5'
>>>> 30 Sep 2014 18:27:03,364 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:113)  - Access Token Secret: 'vBtp7wKsi2BOQqZSBpSBQSgZcc93oHea38T9OdckDCLKn'
>>>> 30 Sep 2014 18:27:03,825 INFO  [conf-file-poller-0] (org.apache.flume.sink.DefaultSinkFactory.create:40)  - Creating instance of sink: k1, type: hdfs
>>>> 30 Sep 2014 18:27:03,874 ERROR [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:145)  - Failed to start agent because dependencies were not found in classpath. Error follows.
>>>> java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
>>>> 	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:38)
>>>> 	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:36)
>>>> 	at org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:106)
>>>> 	at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:208)
>>>> 	at org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:553)
>>>> 	at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:272)
>>>> 	at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
>>>> 	at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:418)
>>>> 	at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:103)
>>>> 	at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
>>>> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>> 	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>>>> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>>> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> 	at java.lang.Thread.run(Thread.java:745)
>>>> Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
>>>> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>>> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>>> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>> 	... 17 more
>>>> 30 Sep 2014 18:27:33,491 INFO  [agent-shutdown-hook] (org.apache.flume.lifecycle.LifecycleSupervisor.stop:79)  - Stopping lifecycle supervisor 10
>>>> 30 Sep 2014 18:27:33,493 INFO  [agent-shutdown-hook] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.stop:83)  - Configuration provider stopping
>>>> [vagrant@localhost 6]$ 
>>>> 
>>>> Is there another jar file I need?
>>>> 
>>>> Thanks.
>>>> 
>>>> On Sep 29, 2014, at 9:04 PM, shengyi.pan <sh...@gmail.com> wrote:
>>>> 
>>>>> you need hadoop-common-x.x.x.jar and hadoop-hdfs-x.x.x.jar under your flume-ng classpath, and the hadoop jar versions must match your hadoop system.
>>>>>  
>>>>> if you sink to hadoop-2.0.0, you should use "protobuf-java-2.4.1.jar" (by default, flume-1.5.0 ships "protobuf-java-2.5.0.jar" in the flume lib directory), because the protobuf interface of hdfs-2.0 is compiled with protobuf-2.4, and with protobuf-2.5 the flume-ng agent will fail to start.
>>>>>  
>>>>>  
>>>>>  
>>>>>  
>>>>> 2014-09-30
>>>>> shengyi.pan
>>>>> From: Ed Judge <ej...@gmail.com>
>>>>> Sent: 2014-09-29 22:38
>>>>> Subject: HDFS sink to a remote HDFS node
>>>>> To: "user@flume.apache.org" <us...@flume.apache.org>
>>>>> Cc:
>>>>>  
>>>>> I am trying to run the flume-ng agent on one node with an HDFS sink pointing to an HDFS filesystem on another node.
>>>>> Is this possible?  What packages/jar files are needed on the flume agent node for this to work?  Secondary goal is to install only what is needed on the flume-ng node.
>>>>> 
>>>>> # Describe the sink
>>>>> a1.sinks.k1.type = hdfs
>>>>> a1.sinks.k1.hdfs.path = hdfs://<remote IP address>/tmp/
>>>>> 
>>>>> 
>>>>> Thanks,
>>>>> Ed
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 

Re: HDFS sink to a remote HDFS node

Posted by Ed Judge <ej...@gmail.com>.
Looks like they are up.  I see the following on one of the nodes but both look generally the same (1 live datanode).

[hadoop@localhost bin]$ hdfs dfsadmin -report
14/10/01 12:51:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 40797364224 (38.00 GB)
Present Capacity: 37030862848 (34.49 GB)
DFS Remaining: 37030830080 (34.49 GB)
DFS Used: 32768 (32 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Live datanodes:
Name: 127.0.0.1:50010 (localhost)
Hostname: localhost
Decommission Status : Normal
Configured Capacity: 40797364224 (38.00 GB)
DFS Used: 32768 (32 KB)
Non DFS Used: 3766501376 (3.51 GB)
DFS Remaining: 37030830080 (34.49 GB)
DFS Used%: 0.00%
DFS Remaining%: 90.77%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Wed Oct 01 12:51:57 UTC 2014


I don’t know how to demonstrate that they are accessible except to telnet into each of them.  Right now that test shows that both nodes accept the connection to port 50010.
Is there some other test I can perform?
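
For instance, would something along these lines, run from the flume instance, be a fairer test than telnet?  (The 10.0.0.16:9000 address and the /tmp path are just from my setup.)

$ hdfs dfs -ls hdfs://10.0.0.16:9000/tmp/
$ hdfs dfs -put /etc/hosts hdfs://10.0.0.16:9000/tmp/hosts.test

One thing I do notice in the report above is that the datanode registers itself as 127.0.0.1:50010, so a remote client may well be told to write its blocks to localhost; that would line up with the earlier “Excluding datanode 127.0.0.1:50010” messages.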

Thanks,
-Ed

On Oct 1, 2014, at 12:31 AM, Hari Shreedharan <hs...@cloudera.com> wrote:

> Looks like one data node is inaccessible or down - so the HDFS client has blacklisted it and the writes are failing as blocks are allocated to that one.
> 
> Thanks,
> Hari
> 
> 
> On Tue, Sep 30, 2014 at 7:33 PM, Ed Judge <ej...@gmail.com> wrote:
> 
> I’ve pulled over all of the Hadoop jar files for my flume instance to use.  I am seeing some slightly different errors now.  Basically I have 2 identically configured hadoop instances on the same subnet.  Running flume on those same instances and pointing flume at the local hadoop/hdfs instance works fine and the files get written.  However, when I point it to the adjacent hadoop/hdfs instance I get many exceptions/errors (shown below) and the files never get written.  Here is my HDFS sink configuration on 10.0.0.14:
> 
> # Describe the sink
> a1.sinks.k1.type = hdfs
> a1.sinks.k1.hdfs.path = hdfs://10.0.0.16:9000/tmp/
> a1.sinks.k1.hdfs.filePrefix = twitter
> a1.sinks.k1.hdfs.fileSuffix = .ds
> a1.sinks.k1.hdfs.rollInterval = 0
> a1.sinks.k1.hdfs.rollSize = 10
> a1.sinks.k1.hdfs.rollCount = 0
> a1.sinks.k1.hdfs.fileType = DataStream
> #a1.sinks.k1.serializer = TEXT
> a1.sinks.k1.channel = c1
> 
> Any idea why this is not working?
> 
> Thanks.
> 
> 01 Oct 2014 01:59:45,098 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSDataStream.configure:58)  - Serializer = TEXT, UseRawLocalFileSystem = false
> 01 Oct 2014 01:59:45,385 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:261)  - Creating hdfs://10.0.0.16:9000/tmp//twitter.1412128785099.ds.tmp
> 01 Oct 2014 01:59:45,997 INFO  [Twitter4J Async Dispatcher[0]] (org.apache.flume.source.twitter.TwitterSource.onStatus:178)  - Processed 100 docs
> 01 Oct 2014 01:59:47,754 INFO  [Twitter4J Async Dispatcher[0]] (org.apache.flume.source.twitter.TwitterSource.onStatus:178)  - Processed 200 docs
> 01 Oct 2014 01:59:49,379 INFO  [Thread-7] (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream:1378)  - Exception in createBlockOutputStream
> java.io.EOFException: Premature EOF: no length prefix available
> 	at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1987)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1346)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1272)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
> 01 Oct 2014 01:59:49,390 INFO  [Thread-7] (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream:1275)  - Abandoning BP-1768727495-127.0.0.1-1412117897373:blk_1073743575_2751
> 01 Oct 2014 01:59:49,398 INFO  [Thread-7] (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream:1278)  - Excluding datanode 127.0.0.1:50010
> 01 Oct 2014 01:59:49,431 WARN  [Thread-7] (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run:627)  - DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/twitter.1412128785099.ds.tmp could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> 
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1410)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> 	at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> 	at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
> 01 Oct 2014 01:59:49,437 WARN  [hdfs-k1-call-runner-2] (org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync:1950)  - Error while syncing
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/twitter.1412128785099.ds.tmp could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> 
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1410)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> 	at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> 	at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
> 01 Oct 2014 01:59:49,439 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:463)  - HDFS IO error
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/twitter.1412128785099.ds.tmp could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
> 
> On Sep 30, 2014, at 3:18 PM, Hari Shreedharan <hs...@cloudera.com> wrote:
> 
>> You'd need to add the jars that hadoop itself depends on. Flume pulls them in if Hadoop is installed on that machine; otherwise you'd need to download and install them manually. If you are using Hadoop 2.x, install the RPM provided by Bigtop.
>> 
>> On Tue, Sep 30, 2014 at 12:12 PM, Ed Judge <ej...@gmail.com> wrote:
>> I added commons-configuration and there is now another missing dependency.  What do you mean by “all of Hadoop’s dependencies”?
>> 
>> 
>> On Sep 30, 2014, at 2:51 PM, Hari Shreedharan <hs...@cloudera.com> wrote:
>> 
>>> You actually need to add all of Hadoop’s dependencies to the Flume classpath. Looks like Apache Commons Configuration is missing from the classpath.
>>> 
>>> Thanks,
>>> Hari
>>> 
>>> 
>>> On Tue, Sep 30, 2014 at 11:48 AM, Ed Judge <ej...@gmail.com> wrote:
>>> 
>>> Thank you.  I am using hadoop 2.5 which I think uses protobuf-java-2.5.0.jar.
>>> 
>>> I am getting the following error even after adding those 2 jar files to my flume-ng classpath:
>>> 
>>> 30 Sep 2014 18:27:03,269 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start:61)  - Configuration provider starting
>>> 30 Sep 2014 18:27:03,278 INFO  [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:133)  - Reloading configuration file:./src.conf
>>> 30 Sep 2014 18:27:03,288 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>> 30 Sep 2014 18:27:03,289 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:930)  - Added sinks: k1 Agent: a1
>>> 30 Sep 2014 18:27:03,289 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>> 30 Sep 2014 18:27:03,292 WARN  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration.<init>:101)  - Configuration property ignored: i# = Describe the sink
>>> 30 Sep 2014 18:27:03,292 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>> 30 Sep 2014 18:27:03,292 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>> 30 Sep 2014 18:27:03,312 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration.validateConfiguration:140)  - Post-validation flume configuration contains configuration for agents: [a1]
>>> 30 Sep 2014 18:27:03,312 INFO  [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:150)  - Creating channels
>>> 30 Sep 2014 18:27:03,329 INFO  [conf-file-poller-0] (org.apache.flume.channel.DefaultChannelFactory.create:40)  - Creating instance of channel c1 type memory
>>> 30 Sep 2014 18:27:03,351 INFO  [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:205)  - Created channel c1
>>> 30 Sep 2014 18:27:03,352 INFO  [conf-file-poller-0] (org.apache.flume.source.DefaultSourceFactory.create:39)  - Creating instance of source r1, type org.apache.flume.source.twitter.TwitterSource
>>> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:110)  - Consumer Key:        'tobhMtidckJoe1tByXDmI4pW3'
>>> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:111)  - Consumer Secret:     '6eZKRpd6JvGT3Dg9jtd9fG9UMEhBzGxoLhLUGP1dqzkKznrXuQ'
>>> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:112)  - Access Token:        '1588514408-o36mOSbXYCVacQ3p6Knsf6Kho17iCwNYLZyA9V5'
>>> 30 Sep 2014 18:27:03,364 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:113)  - Access Token Secret: 'vBtp7wKsi2BOQqZSBpSBQSgZcc93oHea38T9OdckDCLKn'
>>> 30 Sep 2014 18:27:03,825 INFO  [conf-file-poller-0] (org.apache.flume.sink.DefaultSinkFactory.create:40)  - Creating instance of sink: k1, type: hdfs
>>> 30 Sep 2014 18:27:03,874 ERROR [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:145)  - Failed to start agent because dependencies were not found in classpath. Error follows.
>>> java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
>>> 	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:38)
>>> 	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:36)
>>> 	at org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:106)
>>> 	at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:208)
>>> 	at org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:553)
>>> 	at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:272)
>>> 	at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
>>> 	at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:418)
>>> 	at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:103)
>>> 	at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
>>> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>> 	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>>> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> 	at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
>>> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>> 	... 17 more
>>> 30 Sep 2014 18:27:33,491 INFO  [agent-shutdown-hook] (org.apache.flume.lifecycle.LifecycleSupervisor.stop:79)  - Stopping lifecycle supervisor 10
>>> 30 Sep 2014 18:27:33,493 INFO  [agent-shutdown-hook] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.stop:83)  - Configuration provider stopping
>>> [vagrant@localhost 6]$ 
>>> 
>>> Is there another jar file I need?
>>> 
>>> Thanks.
>>> 
>>> On Sep 29, 2014, at 9:04 PM, shengyi.pan <sh...@gmail.com> wrote:
>>> 
>>>> you need hadoop-common-x.x.x.jar and hadoop-hdfs-x.x.x.jar under your flume-ng classpath, and the dependent hadoop jar version must match your hadoop system.
>>>>  
>>>> if you sink to hadoop-2.0.0, you should use "protobuf-java-2.4.1.jar" (by default, flume-1.5.0 uses "protobuf-java-2.5.0.jar"; the jar file is under the flume lib directory), because the pb interface of hdfs-2.0 is compiled with protobuf-2.4, and with protobuf-2.5 flume-ng will fail to start.
>>>>  
>>>>  
>>>>  
>>>>  
>>>> 2014-09-30
>>>> shengyi.pan
>>>> From: Ed Judge <ej...@gmail.com>
>>>> Sent: 2014-09-29 22:38
>>>> Subject: HDFS sink to a remote HDFS node
>>>> To: "user@flume.apache.org"<us...@flume.apache.org>
>>>> Cc:
>>>>  
>>>> I am trying to run the flume-ng agent on one node with an HDFS sink pointing to an HDFS filesystem on another node.
>>>> Is this possible?  What packages/jar files are needed on the flume agent node for this to work?  Secondary goal is to install only what is needed on the flume-ng node.
>>>> 
>>>> # Describe the sink
>>>> a1.sinks.k1.type = hdfs
>>>> a1.sinks.k1.hdfs.path = hdfs://<remote IP address>/tmp/
>>>> 
>>>> 
>>>> Thanks,
>>>> Ed
>>> 
>>> 
>> 
>> 
> 
> 


Re: HDFS sink to a remote HDFS node

Posted by Hari Shreedharan <hs...@cloudera.com>.
Looks like one data node is inaccessible or down - so the HDFS client has blacklisted it and the writes are failing as blocks are allocated to that one.
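
(A sketch of how that could be narrowed down -- the commands and the config key below are possibilities to rule out, not a confirmed fix. The log line "Excluding datanode 127.0.0.1:50010" suggests the DataNode registered with a loopback address, which a remote client cannot reach:)

# On the HDFS node: how did the DataNode register with the NameNode?
hdfs dfsadmin -report | grep '^Name:'   # 127.0.0.1:50010 means remote clients are told to write to loopback
# A common culprit is the node's hostname resolving to 127.0.0.1:
grep -i "$(hostname)" /etc/hosts
# If the DataNode keeps registering by hostname, the client side can be asked to
# resolve hostnames itself (hdfs-site.xml on the Flume host):
#   dfs.client.use.datanode.hostname = true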


Thanks,
Hari

On Tue, Sep 30, 2014 at 7:33 PM, Ed Judge <ej...@gmail.com> wrote:

> I’ve pulled over all of the Hadoop jar files for my flume instance to use.  I am seeing some slightly different errors now.  Basically I have 2 identically configured hadoop instances on the same subnet.  Running flume on those same instances and pointing flume at the local hadoop/hdfs instance works fine and the files get written.  However, when I point it to the adjacent hadoop/hdfs instance I get many exceptions/errors (shown below) and the files never get written.  Here is my HDFS sink configuration on 10.0.0.14:
> # Describe the sink
> a1.sinks.k1.type = hdfs
> a1.sinks.k1.hdfs.path = hdfs://10.0.0.16:9000/tmp/
> a1.sinks.k1.hdfs.filePrefix = twitter
> a1.sinks.k1.hdfs.fileSuffix = .ds
> a1.sinks.k1.hdfs.rollInterval = 0
> a1.sinks.k1.hdfs.rollSize = 10
> a1.sinks.k1.hdfs.rollCount = 0
> a1.sinks.k1.hdfs.fileType = DataStream
> #a1.sinks.k1.serializer = TEXT
> a1.sinks.k1.channel = c1
> Any idea why this is not working?
> Thanks.
> 01 Oct 2014 01:59:45,098 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSDataStream.configure:58)  - Serializer = TEXT, UseRawLocalFileSystem = false
> 01 Oct 2014 01:59:45,385 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:261)  - Creating hdfs://10.0.0.16:9000/tmp//twitter.1412128785099.ds.tmp
> 01 Oct 2014 01:59:45,997 INFO  [Twitter4J Async Dispatcher[0]] (org.apache.flume.source.twitter.TwitterSource.onStatus:178)  - Processed 100 docs
> 01 Oct 2014 01:59:47,754 INFO  [Twitter4J Async Dispatcher[0]] (org.apache.flume.source.twitter.TwitterSource.onStatus:178)  - Processed 200 docs
> 01 Oct 2014 01:59:49,379 INFO  [Thread-7] (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream:1378)  - Exception in createBlockOutputStream
> java.io.EOFException: Premature EOF: no length prefix available
> 	at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1987)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1346)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1272)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
> 01 Oct 2014 01:59:49,390 INFO  [Thread-7] (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream:1275)  - Abandoning BP-1768727495-127.0.0.1-1412117897373:blk_1073743575_2751
> 01 Oct 2014 01:59:49,398 INFO  [Thread-7] (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream:1278)  - Excluding datanode 127.0.0.1:50010
> 01 Oct 2014 01:59:49,431 WARN  [Thread-7] (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run:627)  - DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/twitter.1412128785099.ds.tmp could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1410)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> 	at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> 	at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
> 01 Oct 2014 01:59:49,437 WARN  [hdfs-k1-call-runner-2] (org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync:1950)  - Error while syncing
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/twitter.1412128785099.ds.tmp could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1410)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> 	at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> 	at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
> 01 Oct 2014 01:59:49,439 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:463)  - HDFS IO error
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/twitter.1412128785099.ds.tmp could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
> On Sep 30, 2014, at 3:18 PM, Hari Shreedharan <hs...@cloudera.com> wrote:
>> You'd need to add the jars that hadoop itself depends on. Flume pulls them in if Hadoop is installed on that machine; otherwise you'd need to download and install them manually. If you are using Hadoop 2.x, install the RPM provided by Bigtop.
>> 
>> On Tue, Sep 30, 2014 at 12:12 PM, Ed Judge <ej...@gmail.com> wrote:
>> I added commons-configuration and there is now another missing dependency.  What do you mean by “all of Hadoop’s dependencies”?
>> 
>> 
>> On Sep 30, 2014, at 2:51 PM, Hari Shreedharan <hs...@cloudera.com> wrote:
>> 
>>> You actually need to add all of Hadoop’s dependencies to the Flume classpath. Looks like Apache Commons Configuration is missing from the classpath.
>>> 
>>> Thanks,
>>> Hari
>>> 
>>> 
>>> On Tue, Sep 30, 2014 at 11:48 AM, Ed Judge <ej...@gmail.com> wrote:
>>> 
>>> Thank you.  I am using hadoop 2.5 which I think uses protobuf-java-2.5.0.jar.
>>> 
>>> I am getting the following error even after adding those 2 jar files to my flume-ng classpath:
>>> 
>>> 30 Sep 2014 18:27:03,269 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start:61)  - Configuration provider starting
>>> 30 Sep 2014 18:27:03,278 INFO  [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:133)  - Reloading configuration file:./src.conf
>>> 30 Sep 2014 18:27:03,288 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>> 30 Sep 2014 18:27:03,289 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:930)  - Added sinks: k1 Agent: a1
>>> 30 Sep 2014 18:27:03,289 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>> 30 Sep 2014 18:27:03,292 WARN  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration.<init>:101)  - Configuration property ignored: i# = Describe the sink
>>> 30 Sep 2014 18:27:03,292 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>> 30 Sep 2014 18:27:03,292 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>>> 30 Sep 2014 18:27:03,312 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration.validateConfiguration:140)  - Post-validation flume configuration contains configuration for agents: [a1]
>>> 30 Sep 2014 18:27:03,312 INFO  [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:150)  - Creating channels
>>> 30 Sep 2014 18:27:03,329 INFO  [conf-file-poller-0] (org.apache.flume.channel.DefaultChannelFactory.create:40)  - Creating instance of channel c1 type memory
>>> 30 Sep 2014 18:27:03,351 INFO  [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:205)  - Created channel c1
>>> 30 Sep 2014 18:27:03,352 INFO  [conf-file-poller-0] (org.apache.flume.source.DefaultSourceFactory.create:39)  - Creating instance of source r1, type org.apache.flume.source.twitter.TwitterSource
>>> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:110)  - Consumer Key:        'tobhMtidckJoe1tByXDmI4pW3'
>>> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:111)  - Consumer Secret:     '6eZKRpd6JvGT3Dg9jtd9fG9UMEhBzGxoLhLUGP1dqzkKznrXuQ'
>>> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:112)  - Access Token:        '1588514408-o36mOSbXYCVacQ3p6Knsf6Kho17iCwNYLZyA9V5'
>>> 30 Sep 2014 18:27:03,364 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:113)  - Access Token Secret: 'vBtp7wKsi2BOQqZSBpSBQSgZcc93oHea38T9OdckDCLKn'
>>> 30 Sep 2014 18:27:03,825 INFO  [conf-file-poller-0] (org.apache.flume.sink.DefaultSinkFactory.create:40)  - Creating instance of sink: k1, type: hdfs
>>> 30 Sep 2014 18:27:03,874 ERROR [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:145)  - Failed to start agent because dependencies were not found in classpath. Error follows.
>>> java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
>>> 	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:38)
>>> 	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:36)
>>> 	at org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:106)
>>> 	at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:208)
>>> 	at org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:553)
>>> 	at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:272)
>>> 	at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
>>> 	at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:418)
>>> 	at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:103)
>>> 	at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
>>> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>> 	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>>> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> 	at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
>>> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>> 	... 17 more
>>> 30 Sep 2014 18:27:33,491 INFO  [agent-shutdown-hook] (org.apache.flume.lifecycle.LifecycleSupervisor.stop:79)  - Stopping lifecycle supervisor 10
>>> 30 Sep 2014 18:27:33,493 INFO  [agent-shutdown-hook] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.stop:83)  - Configuration provider stopping
>>> [vagrant@localhost 6]$ 
>>> 
>>> Is there another jar file I need?
>>> 
>>> Thanks.
>>> 
>>> On Sep 29, 2014, at 9:04 PM, shengyi.pan <sh...@gmail.com> wrote:
>>> 
>>>> you need hadoop-common-x.x.x.jar and hadoop-hdfs-x.x.x.jar under your flume-ng classpath, and the dependent hadoop jar version must match your hadoop system.
>>>>  
>>>> if you sink to hadoop-2.0.0, you should use "protobuf-java-2.4.1.jar" (by default, flume-1.5.0 uses "protobuf-java-2.5.0.jar"; the jar file is under the flume lib directory), because the pb interface of hdfs-2.0 is compiled with protobuf-2.4, and with protobuf-2.5 flume-ng will fail to start.
>>>>  
>>>>  
>>>>  
>>>>  
>>>> 2014-09-30
>>>> shengyi.pan
>>>> From: Ed Judge <ej...@gmail.com>
>>>> Sent: 2014-09-29 22:38
>>>> Subject: HDFS sink to a remote HDFS node
>>>> To: "user@flume.apache.org"<us...@flume.apache.org>
>>>> Cc:
>>>>  
>>>> I am trying to run the flume-ng agent on one node with an HDFS sink pointing to an HDFS filesystem on another node.
>>>> Is this possible?  What packages/jar files are needed on the flume agent node for this to work?  Secondary goal is to install only what is needed on the flume-ng node.
>>>> 
>>>> # Describe the sink
>>>> a1.sinks.k1.type = hdfs
>>>> a1.sinks.k1.hdfs.path = hdfs://<remote IP address>/tmp/
>>>> 
>>>> 
>>>> Thanks,
>>>> Ed
>>> 
>>> 
>> 
>> 

Re: HDFS sink to a remote HDFS node

Posted by Ed Judge <ej...@gmail.com>.
I’ve pulled over all of the Hadoop jar files for my flume instance to use.  I am seeing some slightly different errors now.  Basically I have 2 identically configured hadoop instances on the same subnet.  Running flume on those same instances and pointing flume at the local hadoop/hdfs instance works fine and the files get written.  However, when I point it to the adjacent hadoop/hdfs instance I get many exceptions/errors (shown below) and the files never get written.  Here is my HDFS sink configuration on 10.0.0.14:

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://10.0.0.16:9000/tmp/
a1.sinks.k1.hdfs.filePrefix = twitter
a1.sinks.k1.hdfs.fileSuffix = .ds
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 10
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.fileType = DataStream
#a1.sinks.k1.serializer = TEXT
a1.sinks.k1.channel = c1
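
(For comparison, the same write path could be exercised outside Flume with something like the following -- only a sketch, assuming the hadoop client is installed on 10.0.0.14 and that the local cluster also listens on port 9000:)

echo probe > /tmp/hdfs-probe.txt
hdfs dfs -put /tmp/hdfs-probe.txt hdfs://10.0.0.14:9000/tmp/   # local cluster (the case that works from Flume)
hdfs dfs -put /tmp/hdfs-probe.txt hdfs://10.0.0.16:9000/tmp/   # remote cluster (the case that fails from Flume)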

Any idea why this is not working?

Thanks.

01 Oct 2014 01:59:45,098 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSDataStream.configure:58)  - Serializer = TEXT, UseRawLocalFileSystem = false
01 Oct 2014 01:59:45,385 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:261)  - Creating hdfs://10.0.0.16:9000/tmp//twitter.1412128785099.ds.tmp
01 Oct 2014 01:59:45,997 INFO  [Twitter4J Async Dispatcher[0]] (org.apache.flume.source.twitter.TwitterSource.onStatus:178)  - Processed 100 docs
01 Oct 2014 01:59:47,754 INFO  [Twitter4J Async Dispatcher[0]] (org.apache.flume.source.twitter.TwitterSource.onStatus:178)  - Processed 200 docs
01 Oct 2014 01:59:49,379 INFO  [Thread-7] (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream:1378)  - Exception in createBlockOutputStream
java.io.EOFException: Premature EOF: no length prefix available
	at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1987)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1346)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1272)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
01 Oct 2014 01:59:49,390 INFO  [Thread-7] (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream:1275)  - Abandoning BP-1768727495-127.0.0.1-1412117897373:blk_1073743575_2751
01 Oct 2014 01:59:49,398 INFO  [Thread-7] (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream:1278)  - Excluding datanode 127.0.0.1:50010
01 Oct 2014 01:59:49,431 WARN  [Thread-7] (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run:627)  - DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/twitter.1412128785099.ds.tmp could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

	at org.apache.hadoop.ipc.Client.call(Client.java:1410)
	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
	at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
	at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
01 Oct 2014 01:59:49,437 WARN  [hdfs-k1-call-runner-2] (org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync:1950)  - Error while syncing
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/twitter.1412128785099.ds.tmp could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

	at org.apache.hadoop.ipc.Client.call(Client.java:1410)
	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
	at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
	at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
01 Oct 2014 01:59:49,439 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:463)  - HDFS IO error
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/twitter.1412128785099.ds.tmp could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.

On Sep 30, 2014, at 3:18 PM, Hari Shreedharan <hs...@cloudera.com> wrote:

> You'd need to add the jars that hadoop itself depends on. Flume pulls them in if Hadoop is installed on that machine; otherwise you'd need to download and install them manually. If you are using Hadoop 2.x, install the RPM provided by Bigtop.
> 
> On Tue, Sep 30, 2014 at 12:12 PM, Ed Judge <ej...@gmail.com> wrote:
> I added commons-configuration and there is now another missing dependency.  What do you mean by “all of Hadoop’s dependencies”?
> 
> 
> On Sep 30, 2014, at 2:51 PM, Hari Shreedharan <hs...@cloudera.com> wrote:
> 
>> You actually need to add all of Hadoop’s dependencies to the Flume classpath. Looks like Apache Commons Configuration is missing from the classpath.
>> 
>> Thanks,
>> Hari
>> 
>> 
>> On Tue, Sep 30, 2014 at 11:48 AM, Ed Judge <ej...@gmail.com> wrote:
>> 
>> Thank you.  I am using hadoop 2.5 which I think uses protobuf-java-2.5.0.jar.
>> 
>> I am getting the following error even after adding those 2 jar files to my flume-ng classpath:
>> 
>> 30 Sep 2014 18:27:03,269 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start:61)  - Configuration provider starting
>> 30 Sep 2014 18:27:03,278 INFO  [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:133)  - Reloading configuration file:./src.conf
>> 30 Sep 2014 18:27:03,288 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>> 30 Sep 2014 18:27:03,289 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:930)  - Added sinks: k1 Agent: a1
>> 30 Sep 2014 18:27:03,289 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>> 30 Sep 2014 18:27:03,292 WARN  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration.<init>:101)  - Configuration property ignored: i# = Describe the sink
>> 30 Sep 2014 18:27:03,292 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>> 30 Sep 2014 18:27:03,292 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
>> 30 Sep 2014 18:27:03,312 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration.validateConfiguration:140)  - Post-validation flume configuration contains configuration for agents: [a1]
>> 30 Sep 2014 18:27:03,312 INFO  [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:150)  - Creating channels
>> 30 Sep 2014 18:27:03,329 INFO  [conf-file-poller-0] (org.apache.flume.channel.DefaultChannelFactory.create:40)  - Creating instance of channel c1 type memory
>> 30 Sep 2014 18:27:03,351 INFO  [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:205)  - Created channel c1
>> 30 Sep 2014 18:27:03,352 INFO  [conf-file-poller-0] (org.apache.flume.source.DefaultSourceFactory.create:39)  - Creating instance of source r1, type org.apache.flume.source.twitter.TwitterSource
>> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:110)  - Consumer Key:        'tobhMtidckJoe1tByXDmI4pW3'
>> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:111)  - Consumer Secret:     '6eZKRpd6JvGT3Dg9jtd9fG9UMEhBzGxoLhLUGP1dqzkKznrXuQ'
>> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:112)  - Access Token:        '1588514408-o36mOSbXYCVacQ3p6Knsf6Kho17iCwNYLZyA9V5'
>> 30 Sep 2014 18:27:03,364 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:113)  - Access Token Secret: 'vBtp7wKsi2BOQqZSBpSBQSgZcc93oHea38T9OdckDCLKn'
>> 30 Sep 2014 18:27:03,825 INFO  [conf-file-poller-0] (org.apache.flume.sink.DefaultSinkFactory.create:40)  - Creating instance of sink: k1, type: hdfs
>> 30 Sep 2014 18:27:03,874 ERROR [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:145)  - Failed to start agent because dependencies were not found in classpath. Error follows.
>> java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
>> 	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:38)
>> 	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:36)
>> 	at org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:106)
>> 	at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:208)
>> 	at org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:553)
>> 	at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:272)
>> 	at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
>> 	at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:418)
>> 	at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:103)
>> 	at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
>> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> 	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> 	at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
>> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>> 	at java.security.AccessController.doPrivileged(Native Method)
>> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>> 	... 17 more
>> 30 Sep 2014 18:27:33,491 INFO  [agent-shutdown-hook] (org.apache.flume.lifecycle.LifecycleSupervisor.stop:79)  - Stopping lifecycle supervisor 10
>> 30 Sep 2014 18:27:33,493 INFO  [agent-shutdown-hook] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.stop:83)  - Configuration provider stopping
>> [vagrant@localhost 6]$ 
>> 
>> Is there another jar file I need?
>> 
>> Thanks.
>> 
>> On Sep 29, 2014, at 9:04 PM, shengyi.pan <sh...@gmail.com> wrote:
>> 
>>> you need hadoop-common-x.x.x.jar and hadoop-hdfs-x.x.x.jar under your flume-ng classpath, and the dependent hadoop jar version must match your hadoop system.
>>>  
>>> if you sink to hadoop-2.0.0, you should use "protobuf-java-2.4.1.jar" (by default, flume-1.5.0 uses "protobuf-java-2.5.0.jar"; the jar file is under the flume lib directory), because the pb interface of hdfs-2.0 is compiled with protobuf-2.4, and with protobuf-2.5 flume-ng will fail to start.
>>>  
>>>  
>>>  
>>>  
>>> 2014-09-30
>>> shengyi.pan
>>> From: Ed Judge <ej...@gmail.com>
>>> Sent: 2014-09-29 22:38
>>> Subject: HDFS sink to a remote HDFS node
>>> To: "user@flume.apache.org"<us...@flume.apache.org>
>>> Cc:
>>>  
>>> I am trying to run the flume-ng agent on one node with an HDFS sink pointing to an HDFS filesystem on another node.
>>> Is this possible?  What packages/jar files are needed on the flume agent node for this to work?  Secondary goal is to install only what is needed on the flume-ng node.
>>> 
>>> # Describe the sink
>>> a1.sinks.k1.type = hdfs
>>> a1.sinks.k1.hdfs.path = hdfs://<remote IP address>/tmp/
>>> 
>>> 
>>> Thanks,
>>> Ed
>> 
>> 
> 
> 


Re: HDFS sink to a remote HDFS node

Posted by Hari Shreedharan <hs...@cloudera.com>.
You'd need to add the jars that hadoop itself depends on. Flume pulls them in
if Hadoop is installed on that machine; otherwise you'd need to download and
install them manually. If you are using Hadoop 2.x, install the RPM
provided by Bigtop.
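
(If installing a full Hadoop client is an option, one way to avoid copying jars by hand is to let Hadoop enumerate them -- a sketch only; the install path is hypothetical, and it assumes the flume-ng launcher picks FLUME_CLASSPATH up from conf/flume-env.sh:)

# conf/flume-env.sh on the Flume host (hypothetical install location)
export HADOOP_HOME=/opt/hadoop-2.5.0
export PATH="$HADOOP_HOME/bin:$PATH"
# 'hadoop classpath' prints the jars a Hadoop client needs; append them for the agent
FLUME_CLASSPATH="$(hadoop classpath)"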

On Tue, Sep 30, 2014 at 12:12 PM, Ed Judge <ej...@gmail.com> wrote:

> I added commons-configuration and there is now another missing
> dependency.  What do you mean by “all of Hadoop’s dependencies”?
>
>
> On Sep 30, 2014, at 2:51 PM, Hari Shreedharan <hs...@cloudera.com>
> wrote:
>
> You actually need to add all of Hadoop’s dependencies to the Flume classpath.
> Looks like Apache Commons Configuration is missing from the classpath.
>
> Thanks,
> Hari
>
>
> On Tue, Sep 30, 2014 at 11:48 AM, Ed Judge <ej...@gmail.com> wrote:
>
>> Thank you.  I am using hadoop 2.5 which I think uses
>> protobuf-java-2.5.0.jar.
>>
>> I am getting the following error even after adding those 2 jar files to
>> my flume-ng classpath:
>>
>>  30 Sep 2014 18:27:03,269 INFO  [lifecycleSupervisor-1-0]
>> (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start:61)
>> - Configuration provider starting
>> 30 Sep 2014 18:27:03,278 INFO  [conf-file-poller-0]
>> (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:133)
>> - Reloading configuration file:./src.conf
>> 30 Sep 2014 18:27:03,288 INFO  [conf-file-poller-0]
>> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
>> - Processing:k1
>> 30 Sep 2014 18:27:03,289 INFO  [conf-file-poller-0]
>> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:930)
>> - Added sinks: k1 Agent: a1
>> 30 Sep 2014 18:27:03,289 INFO  [conf-file-poller-0]
>> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
>> - Processing:k1
>> 30 Sep 2014 18:27:03,292 WARN  [conf-file-poller-0]
>> (org.apache.flume.conf.FlumeConfiguration.<init>:101)  - Configuration
>> property ignored: i# = Describe the sink
>> 30 Sep 2014 18:27:03,292 INFO  [conf-file-poller-0]
>> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
>> - Processing:k1
>> 30 Sep 2014 18:27:03,292 INFO  [conf-file-poller-0]
>> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
>> - Processing:k1
>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0]
>> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
>> - Processing:k1
>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0]
>> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
>> - Processing:k1
>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0]
>> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
>> - Processing:k1
>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0]
>> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
>> - Processing:k1
>> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0]
>> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
>> - Processing:k1
>> 30 Sep 2014 18:27:03,312 INFO  [conf-file-poller-0]
>> (org.apache.flume.conf.FlumeConfiguration.validateConfiguration:140)  -
>> Post-validation flume configuration contains configuration for agents: [a1]
>> 30 Sep 2014 18:27:03,312 INFO  [conf-file-poller-0]
>> (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:150)  -
>> Creating channels
>> 30 Sep 2014 18:27:03,329 INFO  [conf-file-poller-0]
>> (org.apache.flume.channel.DefaultChannelFactory.create:40)  - Creating
>> instance of channel c1 type memory
>> 30 Sep 2014 18:27:03,351 INFO  [conf-file-poller-0]
>> (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:205)  -
>> Created channel c1
>> 30 Sep 2014 18:27:03,352 INFO  [conf-file-poller-0]
>> (org.apache.flume.source.DefaultSourceFactory.create:39)  - Creating
>> instance of source r1, type org.apache.flume.source.twitter.TwitterSource
>> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0]
>> (org.apache.flume.source.twitter.TwitterSource.configure:110)  - Consumer
>> Key:        'tobhMtidckJoe1tByXDmI4pW3'
>> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0]
>> (org.apache.flume.source.twitter.TwitterSource.configure:111)  - Consumer
>> Secret:     '6eZKRpd6JvGT3Dg9jtd9fG9UMEhBzGxoLhLUGP1dqzkKznrXuQ'
>> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0]
>> (org.apache.flume.source.twitter.TwitterSource.configure:112)  - Access
>> Token:        '1588514408-o36mOSbXYCVacQ3p6Knsf6Kho17iCwNYLZyA9V5'
>> 30 Sep 2014 18:27:03,364 INFO  [conf-file-poller-0]
>> (org.apache.flume.source.twitter.TwitterSource.configure:113)  - Access
>> Token Secret: 'vBtp7wKsi2BOQqZSBpSBQSgZcc93oHea38T9OdckDCLKn'
>> 30 Sep 2014 18:27:03,825 INFO  [conf-file-poller-0]
>> (org.apache.flume.sink.DefaultSinkFactory.create:40)  - Creating instance
>> of sink: k1, type: hdfs
>> 30 Sep 2014 18:27:03,874 ERROR [conf-file-poller-0]
>> (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:145)
>> - Failed to start agent because dependencies were not found in classpath.
>> Error follows.
>> java.lang.NoClassDefFoundError:
>> org/apache/commons/configuration/Configuration
>>  at
>> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:38)
>>  at
>> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:36)
>>  at
>> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:106)
>>  at
>> org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:208)
>>  at
>> org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:553)
>>  at
>> org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:272)
>>  at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
>>  at
>> org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:418)
>>  at
>> org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:103)
>>  at
>> org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
>>  at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>  at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>>  at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>  at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>  at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>  at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>  at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.commons.configuration.Configuration
>>  at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>  at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>  at java.security.AccessController.doPrivileged(Native Method)
>>  at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>  at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>  at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>  ... 17 more
>> 30 Sep 2014 18:27:33,491 INFO  [agent-shutdown-hook]
>> (org.apache.flume.lifecycle.LifecycleSupervisor.stop:79)  - Stopping
>> lifecycle supervisor 10
>> 30 Sep 2014 18:27:33,493 INFO  [agent-shutdown-hook]
>> (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.stop:83)
>> - Configuration provider stopping
>> [vagrant@localhost 6]$
>>
>> Is there another jar file I need?
>>
>> Thanks.
>>
>> On Sep 29, 2014, at 9:04 PM, shengyi.pan <sh...@gmail.com> wrote:
>>
>> you need hadoop-common-x.x.x.jar and hadoop-hdfs-x.x.x.jar on your
>> flume-ng classpath, and the hadoop jar versions must match your hadoop
>> system.
>>
>> if sinking to hadoop-2.0.0, you should use "protobuf-java-2.4.1.jar"
>> (by default, flume-1.5.0 uses "protobuf-java-2.5.0.jar"; the jar file is
>> under the flume lib directory), because the pb interface of hdfs-2.0 is
>> compiled with protobuf-2.4, and with protobuf-2.5 flume-ng will fail to
>> start.
>>
>>
>>
>>
>> 2014-09-30
>> ------------------------------
>> shengyi.pan
>> ------------------------------
>> *From:* Ed Judge <ej...@gmail.com>
>> *Sent:* 2014-09-29 22:38
>> *Subject:* HDFS sink to a remote HDFS node
>> *To:* "user@flume.apache.org"<us...@flume.apache.org>
>> *Cc:*
>>
>> I am trying to run the flume-ng agent on one node with an HDFS sink
>> pointing to an HDFS filesystem on another node.
>> Is this possible?  What packages/jar files are needed on the flume agent
>> node for this to work?  Secondary goal is to install only what is needed on
>> the flume-ng node.
>>
>>  # Describe the sink
>> a1.sinks.k1.type = hdfs
>> a1.sinks.k1.hdfs.path = hdfs://<remote IP address>/tmp/
>>
>>
>> Thanks,
>> Ed
>>
>>
>>
>
>

Re: HDFS sink to a remote HDFS node

Posted by Ed Judge <ej...@gmail.com>.
I added commons-configuration and there is now another missing dependency.  What do you mean by “all of Hadoop’s dependencies”?
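
A minimal sketch of one way to put "all of Hadoop's dependencies" on the Flume classpath, assuming a Hadoop 2.x client install is present on the Flume node and that the flume-ng launcher sources conf/flume-env.sh (names and paths below are illustrative, not from this thread):

# conf/flume-env.sh on the Flume node
# `hadoop classpath` prints the full set of Hadoop client jars and config directories
export FLUME_CLASSPATH="$(hadoop classpath)"

The alternative is to copy the individual jars (hadoop-common, hadoop-hdfs, hadoop-auth, commons-configuration, and their transitive dependencies) into Flume's lib directory, but reusing the Hadoop client's own classpath avoids chasing missing classes one at a time.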


On Sep 30, 2014, at 2:51 PM, Hari Shreedharan <hs...@cloudera.com> wrote:

> You actually need to add all of Hadoop’s dependencies to the Flume classpath. Looks like Apache Commons Configuration is missing from the classpath.
> 
> Thanks,
> Hari
> 
> 
> On Tue, Sep 30, 2014 at 11:48 AM, Ed Judge <ej...@gmail.com> wrote:
> 
> Thank you.  I am using hadoop 2.5 which I think uses protobuf-java-2.5.0.jar.
> 
> I am getting the following error even after adding those 2 jar files to my flume-ng classpath:
> 
> 30 Sep 2014 18:27:03,269 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start:61)  - Configuration provider starting
> 30 Sep 2014 18:27:03,278 INFO  [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:133)  - Reloading configuration file:./src.conf
> 30 Sep 2014 18:27:03,288 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,289 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:930)  - Added sinks: k1 Agent: a1
> 30 Sep 2014 18:27:03,289 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,292 WARN  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration.<init>:101)  - Configuration property ignored: i# = Describe the sink
> 30 Sep 2014 18:27:03,292 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,292 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,312 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration.validateConfiguration:140)  - Post-validation flume configuration contains configuration for agents: [a1]
> 30 Sep 2014 18:27:03,312 INFO  [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:150)  - Creating channels
> 30 Sep 2014 18:27:03,329 INFO  [conf-file-poller-0] (org.apache.flume.channel.DefaultChannelFactory.create:40)  - Creating instance of channel c1 type memory
> 30 Sep 2014 18:27:03,351 INFO  [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:205)  - Created channel c1
> 30 Sep 2014 18:27:03,352 INFO  [conf-file-poller-0] (org.apache.flume.source.DefaultSourceFactory.create:39)  - Creating instance of source r1, type org.apache.flume.source.twitter.TwitterSource
> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:110)  - Consumer Key:        'tobhMtidckJoe1tByXDmI4pW3'
> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:111)  - Consumer Secret:     '6eZKRpd6JvGT3Dg9jtd9fG9UMEhBzGxoLhLUGP1dqzkKznrXuQ'
> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:112)  - Access Token:        '1588514408-o36mOSbXYCVacQ3p6Knsf6Kho17iCwNYLZyA9V5'
> 30 Sep 2014 18:27:03,364 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:113)  - Access Token Secret: 'vBtp7wKsi2BOQqZSBpSBQSgZcc93oHea38T9OdckDCLKn'
> 30 Sep 2014 18:27:03,825 INFO  [conf-file-poller-0] (org.apache.flume.sink.DefaultSinkFactory.create:40)  - Creating instance of sink: k1, type: hdfs
> 30 Sep 2014 18:27:03,874 ERROR [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:145)  - Failed to start agent because dependencies were not found in classpath. Error follows.
> java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
> 	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:38)
> 	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:36)
> 	at org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:106)
> 	at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:208)
> 	at org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:553)
> 	at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:272)
> 	at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
> 	at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:418)
> 	at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:103)
> 	at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> 	... 17 more
> 30 Sep 2014 18:27:33,491 INFO  [agent-shutdown-hook] (org.apache.flume.lifecycle.LifecycleSupervisor.stop:79)  - Stopping lifecycle supervisor 10
> 30 Sep 2014 18:27:33,493 INFO  [agent-shutdown-hook] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.stop:83)  - Configuration provider stopping
> [vagrant@localhost 6]$ 
> 
> Is there another jar file I need?
> 
> Thanks.
> 
> On Sep 29, 2014, at 9:04 PM, shengyi.pan <sh...@gmail.com> wrote:
> 
>> you need hadoop-common-x.x.x.jar and hadoop-hdfs-x.x.x.jar on your flume-ng classpath, and the hadoop jar versions must match your hadoop system.
>>  
>> if sinking to hadoop-2.0.0, you should use "protobuf-java-2.4.1.jar" (by default, flume-1.5.0 uses "protobuf-java-2.5.0.jar"; the jar file is under the flume lib directory), because the pb interface of hdfs-2.0 is compiled with protobuf-2.4, and with protobuf-2.5 flume-ng will fail to start.
>>  
>>  
>>  
>>  
>> 2014-09-30
>> shengyi.pan
>> From: Ed Judge <ej...@gmail.com>
>> Sent: 2014-09-29 22:38
>> Subject: HDFS sink to a remote HDFS node
>> To: "user@flume.apache.org"<us...@flume.apache.org>
>> Cc:
>>  
>> I am trying to run the flume-ng agent on one node with an HDFS sink pointing to an HDFS filesystem on another node.
>> Is this possible?  What packages/jar files are needed on the flume agent node for this to work?  Secondary goal is to install only what is needed on the flume-ng node.
>> 
>> # Describe the sink
>> a1.sinks.k1.type = hdfs
>> a1.sinks.k1.hdfs.path = hdfs://<remote IP address>/tmp/
>> 
>> 
>> Thanks,
>> Ed
> 
> 


Re: HDFS sink to a remote HDFS node

Posted by Hari Shreedharan <hs...@cloudera.com>.
You actually need to add all of Hadoop’s dependencies to the Flume classpath. Looks like Apache Commons Configuration is missing from the classpath.
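
For the specific class named in the error, a quick sketch of locating the jar in a Hadoop 2.x binary install and dropping it onto the Flume classpath (HADOOP_HOME, FLUME_HOME and the share/hadoop/common/lib layout are assumptions, not taken from this thread):

find "$HADOOP_HOME" -name 'commons-configuration*.jar'
# in a stock Hadoop 2.x tarball it typically sits under share/hadoop/common/lib/
cp "$HADOOP_HOME"/share/hadoop/common/lib/commons-configuration-*.jar "$FLUME_HOME"/lib/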


Thanks,
Hari

On Tue, Sep 30, 2014 at 11:48 AM, Ed Judge <ej...@gmail.com> wrote:

> Thank you.  I am using hadoop 2.5 which I think uses protobuf-java-2.5.0.jar.
> I am getting the following error even after adding those 2 jar files to my flume-ng classpath:
> 30 Sep 2014 18:27:03,269 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start:61)  - Configuration provider starting
> 30 Sep 2014 18:27:03,278 INFO  [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:133)  - Reloading configuration file:./src.conf
> 30 Sep 2014 18:27:03,288 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,289 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:930)  - Added sinks: k1 Agent: a1
> 30 Sep 2014 18:27:03,289 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,292 WARN  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration.<init>:101)  - Configuration property ignored: i# = Describe the sink
> 30 Sep 2014 18:27:03,292 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,292 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
> 30 Sep 2014 18:27:03,312 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration.validateConfiguration:140)  - Post-validation flume configuration contains configuration for agents: [a1]
> 30 Sep 2014 18:27:03,312 INFO  [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:150)  - Creating channels
> 30 Sep 2014 18:27:03,329 INFO  [conf-file-poller-0] (org.apache.flume.channel.DefaultChannelFactory.create:40)  - Creating instance of channel c1 type memory
> 30 Sep 2014 18:27:03,351 INFO  [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:205)  - Created channel c1
> 30 Sep 2014 18:27:03,352 INFO  [conf-file-poller-0] (org.apache.flume.source.DefaultSourceFactory.create:39)  - Creating instance of source r1, type org.apache.flume.source.twitter.TwitterSource
> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:110)  - Consumer Key:        'tobhMtidckJoe1tByXDmI4pW3'
> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:111)  - Consumer Secret:     '6eZKRpd6JvGT3Dg9jtd9fG9UMEhBzGxoLhLUGP1dqzkKznrXuQ'
> 30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:112)  - Access Token:        '1588514408-o36mOSbXYCVacQ3p6Knsf6Kho17iCwNYLZyA9V5'
> 30 Sep 2014 18:27:03,364 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:113)  - Access Token Secret: 'vBtp7wKsi2BOQqZSBpSBQSgZcc93oHea38T9OdckDCLKn'
> 30 Sep 2014 18:27:03,825 INFO  [conf-file-poller-0] (org.apache.flume.sink.DefaultSinkFactory.create:40)  - Creating instance of sink: k1, type: hdfs
> 30 Sep 2014 18:27:03,874 ERROR [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:145)  - Failed to start agent because dependencies were not found in classpath. Error follows.
> java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
> 	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:38)
> 	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:36)
> 	at org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:106)
> 	at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:208)
> 	at org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:553)
> 	at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:272)
> 	at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
> 	at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:418)
> 	at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:103)
> 	at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> 	... 17 more
> 30 Sep 2014 18:27:33,491 INFO  [agent-shutdown-hook] (org.apache.flume.lifecycle.LifecycleSupervisor.stop:79)  - Stopping lifecycle supervisor 10
> 30 Sep 2014 18:27:33,493 INFO  [agent-shutdown-hook] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.stop:83)  - Configuration provider stopping
> [vagrant@localhost 6]$ 
> Is there another jar file I need?
> Thanks.
> On Sep 29, 2014, at 9:04 PM, shengyi.pan <sh...@gmail.com> wrote:
>> you need hadoop-common-x.x.x.jar and hadoop-hdfs-x.x.x.jar on your flume-ng classpath, and the hadoop jar versions must match your hadoop system.
>>  
>> if sinking to hadoop-2.0.0, you should use "protobuf-java-2.4.1.jar" (by default, flume-1.5.0 uses "protobuf-java-2.5.0.jar"; the jar file is under the flume lib directory), because the pb interface of hdfs-2.0 is compiled with protobuf-2.4, and with protobuf-2.5 flume-ng will fail to start.
>>  
>>  
>>  
>>  
>> 2014-09-30
>> shengyi.pan
>> From: Ed Judge <ej...@gmail.com>
>> Sent: 2014-09-29 22:38
>> Subject: HDFS sink to a remote HDFS node
>> To: "user@flume.apache.org"<us...@flume.apache.org>
>> Cc:
>>  
>> I am trying to run the flume-ng agent on one node with an HDFS sink pointing to an HDFS filesystem on another node.
>> Is this possible?  What packages/jar files are needed on the flume agent node for this to work?  Secondary goal is to install only what is needed on the flume-ng node.
>> 
>> # Describe the sink
>> a1.sinks.k1.type = hdfs
>> a1.sinks.k1.hdfs.path = hdfs://<remote IP address>/tmp/
>> 
>> 
>> Thanks,
>> Ed

Re: HDFS sink to a remote HDFS node

Posted by Ed Judge <ej...@gmail.com>.
Thank you.  I am using hadoop 2.5 which I think uses protobuf-java-2.5.0.jar.

I am getting the following error even after adding those 2 jar files to my flume-ng classpath:

30 Sep 2014 18:27:03,269 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start:61)  - Configuration provider starting
30 Sep 2014 18:27:03,278 INFO  [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:133)  - Reloading configuration file:./src.conf
30 Sep 2014 18:27:03,288 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
30 Sep 2014 18:27:03,289 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:930)  - Added sinks: k1 Agent: a1
30 Sep 2014 18:27:03,289 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
30 Sep 2014 18:27:03,292 WARN  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration.<init>:101)  - Configuration property ignored: i# = Describe the sink
30 Sep 2014 18:27:03,292 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
30 Sep 2014 18:27:03,292 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
30 Sep 2014 18:27:03,293 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)  - Processing:k1
30 Sep 2014 18:27:03,312 INFO  [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration.validateConfiguration:140)  - Post-validation flume configuration contains configuration for agents: [a1]
30 Sep 2014 18:27:03,312 INFO  [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:150)  - Creating channels
30 Sep 2014 18:27:03,329 INFO  [conf-file-poller-0] (org.apache.flume.channel.DefaultChannelFactory.create:40)  - Creating instance of channel c1 type memory
30 Sep 2014 18:27:03,351 INFO  [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:205)  - Created channel c1
30 Sep 2014 18:27:03,352 INFO  [conf-file-poller-0] (org.apache.flume.source.DefaultSourceFactory.create:39)  - Creating instance of source r1, type org.apache.flume.source.twitter.TwitterSource
30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:110)  - Consumer Key:        'tobhMtidckJoe1tByXDmI4pW3'
30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:111)  - Consumer Secret:     '6eZKRpd6JvGT3Dg9jtd9fG9UMEhBzGxoLhLUGP1dqzkKznrXuQ'
30 Sep 2014 18:27:03,363 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:112)  - Access Token:        '1588514408-o36mOSbXYCVacQ3p6Knsf6Kho17iCwNYLZyA9V5'
30 Sep 2014 18:27:03,364 INFO  [conf-file-poller-0] (org.apache.flume.source.twitter.TwitterSource.configure:113)  - Access Token Secret: 'vBtp7wKsi2BOQqZSBpSBQSgZcc93oHea38T9OdckDCLKn'
30 Sep 2014 18:27:03,825 INFO  [conf-file-poller-0] (org.apache.flume.sink.DefaultSinkFactory.create:40)  - Creating instance of sink: k1, type: hdfs
30 Sep 2014 18:27:03,874 ERROR [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:145)  - Failed to start agent because dependencies were not found in classpath. Error follows.
java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:38)
	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:36)
	at org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:106)
	at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:208)
	at org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:553)
	at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:272)
	at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
	at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:418)
	at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:103)
	at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	... 17 more
30 Sep 2014 18:27:33,491 INFO  [agent-shutdown-hook] (org.apache.flume.lifecycle.LifecycleSupervisor.stop:79)  - Stopping lifecycle supervisor 10
30 Sep 2014 18:27:33,493 INFO  [agent-shutdown-hook] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.stop:83)  - Configuration provider stopping
[vagrant@localhost 6]$ 

Is there another jar file I need?
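
One way to check is to scan the jars on the Flume classpath for the class named in the stack trace; a rough sketch, assuming the extra jars were dropped into Flume's lib directory (FLUME_HOME is illustrative):

for j in "$FLUME_HOME"/lib/*.jar; do
  # list each jar's contents and print the jar that contains the missing class
  unzip -l "$j" 2>/dev/null | grep -q 'org/apache/commons/configuration/Configuration.class' && echo "$j"
done

If nothing prints, the commons-configuration jar (and likely others) still needs to be added.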

Thanks.

On Sep 29, 2014, at 9:04 PM, shengyi.pan <sh...@gmail.com> wrote:

> you need hadoop-common-x.x.x.jar and hadoop-hdfs-x.x.x.jar on your flume-ng classpath, and the hadoop jar versions must match your hadoop system.
>  
> if sinking to hadoop-2.0.0, you should use "protobuf-java-2.4.1.jar" (by default, flume-1.5.0 uses "protobuf-java-2.5.0.jar"; the jar file is under the flume lib directory), because the pb interface of hdfs-2.0 is compiled with protobuf-2.4, and with protobuf-2.5 flume-ng will fail to start.
>  
>  
>  
>  
> 2014-09-30
> shengyi.pan
> From: Ed Judge <ej...@gmail.com>
> Sent: 2014-09-29 22:38
> Subject: HDFS sink to a remote HDFS node
> To: "user@flume.apache.org"<us...@flume.apache.org>
> Cc:
>  
> I am trying to run the flume-ng agent on one node with an HDFS sink pointing to an HDFS filesystem on another node.
> Is this possible?  What packages/jar files are needed on the flume agent node for this to work?  Secondary goal is to install only what is needed on the flume-ng node.
> 
> # Describe the sink
> a1.sinks.k1.type = hdfs
> a1.sinks.k1.hdfs.path = hdfs://<remote IP address>/tmp/
> 
> 
> Thanks,
> Ed


Re: HDFS sink to a remote HDFS node

Posted by "shengyi.pan" <sh...@gmail.com>.
you need hadoop-common-x.x.x.jar and hadoop-hdfs-x.x.x.jar on your flume-ng classpath, and the hadoop jar versions must match your hadoop system.

if sinking to hadoop-2.0.0, you should use "protobuf-java-2.4.1.jar" (by default, flume-1.5.0 uses "protobuf-java-2.5.0.jar"; the jar file is under the flume lib directory), because the pb interface of hdfs-2.0 is compiled with protobuf-2.4, and with protobuf-2.5 flume-ng will fail to start.
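
A sketch of what that can look like on the Flume node, assuming a standard Hadoop 2.x binary layout (paths, versions and the lib-disabled directory are illustrative; use the jars that match your cluster):

# copy the HDFS client jars that match the cluster version onto Flume's classpath
cp "$HADOOP_HOME"/share/hadoop/common/hadoop-common-*.jar "$FLUME_HOME"/lib/
cp "$HADOOP_HOME"/share/hadoop/hdfs/hadoop-hdfs-*.jar "$FLUME_HOME"/lib/

# only when sinking to hadoop-2.0.0: park Flume's bundled protobuf 2.5 and use 2.4.x instead
mkdir -p "$FLUME_HOME"/lib-disabled
mv "$FLUME_HOME"/lib/protobuf-java-2.5.0.jar "$FLUME_HOME"/lib-disabled/
cp "$HADOOP_HOME"/share/hadoop/common/lib/protobuf-java-2.4*.jar "$FLUME_HOME"/lib/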




2014-09-30



shengyi.pan



From: Ed Judge <ej...@gmail.com>
Sent: 2014-09-29 22:38
Subject: HDFS sink to a remote HDFS node
To: "user@flume.apache.org"<us...@flume.apache.org>
Cc:

I am trying to run the flume-ng agent on one node with an HDFS sink pointing to an HDFS filesystem on another node.
Is this possible?  What packages/jar files are needed on the flume agent node for this to work?  Secondary goal is to install only what is needed on the flume-ng node.


# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://<remote IP address>/tmp/
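
For reference, a slightly fuller version of that sink block; it assumes the NameNode RPC port is the common Hadoop 2.x default of 8020 and that plain text output is wanted (all values are illustrative):

a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://<remote IP address>:8020/tmp/
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.rollInterval = 300
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0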




Thanks,
Ed