Posted to user@flume.apache.org by Thomas Vachon <va...@sessionm.com> on 2011/12/14 18:41:42 UTC

Flume 0.9.4 and AWS EMR 0.20.250

AWS EMR uses Hadoop 0.20.250 and Flume 0.9.4 seems to be built against a newer Hadoop client. This is causing delivery of logs into HDFS to fail with: Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 63, server = 61).

I tried replacing hadoop-core.jar with the Apache Hadoop 0.20.250 version, but that caused bigger problems (Flume was saying "not logged in" and throwing exceptions).  What is the correct way to fix this problem?  I obviously cannot change Amazon's version of Hadoop, so I need to find a compatible version of Flume.
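
For reference, the swap I attempted looked roughly like this (the paths are illustrative and will differ per install):

# Replace Flume's bundled Hadoop client jar with the cluster's version.
# Paths below are illustrative -- adjust to your Flume/Hadoop locations.
cd /usr/lib/flume/lib
sudo mv hadoop-core-*.jar /tmp/
sudo cp /home/hadoop/hadoop-core-0.20.250.jar .
sudo /etc/init.d/flume-node restart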

Re: Flume 0.9.4 and AWS EMR 0.20.250

Posted by Thomas Vachon <va...@sessionm.com>.
Eric,

I am still running into this and did a test as you said.  Here is a sanitized log/shell output.

root@$collector:/tmp# tail -f /var/log/flume/flume-flume-node-$collector.log | grep achievements20111222-131545535+0000.685203378014020.00000134.gz
2011-12-22 20:15:08,690 INFO com.cloudera.flume.handlers.hdfs.CustomDfsSink: Closing HDFS file: hdfs://$NAMENODE/flume/achievements/2011/12/22/09/achievements20111222-131545535+0000.685203378014020.00000134.gz.tmp
2011-12-22 20:15:08,690 INFO com.cloudera.flume.handlers.debug.InsistentAppendDecorator: append attempt 421 failed, backoff (60000ms): java.io.IOException: File /flume/achievements/2011/12/22/09/achievements20111222-131545535+0000.685203378014020.00000134.gz.tmp could only be replicated to 0 nodes, instead of 1

root@$collector:/tmp# sudo -u flume hadoop fs -ls hdfs://$NAMENODE/flume/achievements/2011/12/22/09/achievements20111222-131545535+0000.685203378014020.00000134.gz.tmp
Found 1 items
-rw-r--r--   3 flume supergroup          0 2011-12-22 13:15 /flume/achievements/2011/12/22/09/achievements20111222-131545535+0000.685203378014020.00000134.gz.tmp


root@$collector:/tmp# sudo -u flume hadoop fs -put test.txt hdfs://$NAMENODE/flume/test.txt

root@$collector:/tmp# sudo -u flume hadoop fs -ls hdfs://$NAMENODE/flume
Found 5 items
drwxr-xr-x   - flume supergroup          0 2011-12-21 21:23 /flume/achievements
<snip>
-rw-r--r--   3 flume supergroup          5 2011-12-22 20:14 /flume/test.txt


As you can see, I can write.  I think the .gz.tmp in question is old?  Is there a way to make Flume just forget about it?  Also, this is plain Apache Hadoop now (EMR has too many limitations), and we are able to use 0.9.4 as a result.
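
If the .gz.tmp really is a dead leftover, I assume it can just be removed by hand, e.g.:

# Assumes no collector still has this file open (a guess on my part).
sudo -u flume hadoop fs -rm hdfs://$NAMENODE/flume/achievements/2011/12/22/09/achievements20111222-131545535+0000.685203378014020.00000134.gz.tmp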

--
Thomas Vachon
Principal Operations Architect
session M


On Dec 14, 2011, at 5:10 PM, Eric Sammer wrote:

> This isn't a Flume error as much as an HDFS error. The hint is the "...could only be replicated to 0 nodes, instead of 1", meaning that, while Flume could talk to the NN to open the file, it can't write to the DN it needs to. I see that this is EC2 fun and excitement, which almost always means a security group issue. Can you do a 'hadoop fs -put foo.bar /rails/whatever/whatever/' from a machine in the same sec group as the Flume node?
> 
> On Wed, Dec 14, 2011 at 12:13 PM, Thomas Vachon <va...@sessionm.com> wrote:
> I was able to get flume to write to EMR using 0.9.2, but I fear I have run into other bugs.
> 
> 2011-12-14 20:01:05,177 ERROR com.cloudera.flume.handlers.rolling.RollSink: Failure when attempting to rotate and open new sink: java.io.IOException: File /rails/adstats/2011/12/09/19/adstatslog.00000019.20111214-195722621+0000.614859534929985.seq could only be replicated to 0 nodes, instead of 1
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1531)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:685)
> 	at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
> 
> Any ideas if this can be safely ignored?  I checked the HDFS and I see: 
> 
> hadoop@domU-12-31-39-06-E6-CE:~$ hadoop fs -ls /rails/adstats/2011/12/14/19
> Found 1 items
> -rw-r--r--   3 hadoop supergroup          0 2011-12-14 19:57 /rails/adstats/2011/12/14/19/adstatslog.00000019.20111214-195722621+0000.614859534929985.seq
> 
> 
> I looked at the DFS admin pages and I see (which leads me to believe it replicated correctly): 
> Node   Last Contact  Admin State  Configured Capacity (GB)  Used (GB)  Non DFS Used (GB)  Remaining (GB)  Used (%)  Remaining (%)  Blocks
> HOST1  2             In Service   9.34                      0          2.4                6.94            0         74.31          1
> HOST2  2             In Service   9.34                      0          2.4                6.94            0         74.3           0
> 
> 
> The other question is, does s3n work in this version?  We want to dual-stack (if you will) our collectorSinks.
> 
> 
> On Dec 14, 2011, at 12:41 PM, Thomas Vachon wrote:
> 
>> AWS EMR uses Hadoop 0.20.250 and Flume 0.9.4 seems to be built against a newer Hadoop client. This is causing delivery of logs into HDFS to fail with: Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 63, server = 61).
>> 
>> I tried replacing hadoop-core.jar with the Apache Hadoop 0.20.250 version, but that caused bigger problems (Flume was saying "not logged in" and throwing exceptions).  What is the correct way to fix this problem?  I obviously cannot change Amazon's version of Hadoop, so I need to find a compatible version of Flume.
> 
> 
> 
> 
> -- 
> Eric Sammer
> twitter: esammer
> data: www.cloudera.com


Re: Flume 0.9.4 and AWS EMR 0.20.250

Posted by Eric Sammer <es...@cloudera.com>.
This isn't a Flume error as much as an HDFS error. The hint is the
"...could only be replicated to 0 nodes, instead of 1", meaning that, while
Flume could talk to the NN to open the file, it can't write to the DN it
needs to. I see that this is EC2 fun and excitement, which almost always
means a security group issue. Can you do a 'hadoop fs -put foo.bar
/rails/whatever/whatever/' from a machine in the same sec group as the
Flume node?
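
If the put works from a shell but Flume still cannot write, the usual EC2
culprit is the DataNode data-transfer port being blocked between security
groups. A rough check from the Flume node (50010/50075 are the stock 0.20
DataNode ports; $DATANODE_HOST is a placeholder for a real DataNode):

echo test > foo.bar
hadoop fs -put foo.bar /rails/whatever/whatever/   # exercises the NN RPC and the DN write path
nc -vz $DATANODE_HOST 50010                        # DN data transfer port
nc -vz $DATANODE_HOST 50075                        # DN HTTP/status port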

On Wed, Dec 14, 2011 at 12:13 PM, Thomas Vachon <va...@sessionm.com> wrote:

> I was able to get flume to write to EMR using 0.9.2, but I fear I have run
> into other bugs.
>
> 2011-12-14 20:01:05,177 ERROR com.cloudera.flume.handlers.rolling.RollSink: Failure when attempting to rotate and open new sink: java.io.IOException: File /rails/adstats/2011/12/09/19/adstatslog.00000019.20111214-195722621+0000.614859534929985.seq could only be replicated to 0 nodes, instead of 1
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1531)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:685)
> 	at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
>
> Any ideas if this can be safely ignored?  I checked the HDFS and I see:
>
> hadoop@domU-12-31-39-06-E6-CE:~$ hadoop fs -ls /rails/adstats/2011/12/14/19
> Found 1 items
> -rw-r--r--   3 hadoop supergroup          0 2011-12-14 19:57 /rails/adstats/2011/12/14/19/adstatslog.00000019.20111214-195722621+0000.614859534929985.seq
>
>
> I looked at the DFS admin pages and I see (which leads me to believe it
> replicated correctly):
> Node   Last Contact  Admin State  Configured Capacity (GB)  Used (GB)  Non DFS Used (GB)  Remaining (GB)  Used (%)  Remaining (%)  Blocks
> HOST1  2             In Service   9.34                      0          2.4                6.94            0         74.31          1
> HOST2  2             In Service   9.34                      0          2.4                6.94            0         74.3           0
>
>
> The other question is, does s3n work in this version?  We want to dual-stack (if you will) our collectorSinks.
>
>
> On Dec 14, 2011, at 12:41 PM, Thomas Vachon wrote:
>
> AWS EMR uses Hadoop 0.20.250 and Flume 0.9.4 seems to be built against a
> newer Hadoop client. This is causing delivery of logs into HDFS to fail
> with: Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version
> mismatch. (client = 63, server = 61).
> 
> I tried replacing hadoop-core.jar with the Apache Hadoop 0.20.250 version,
> but that caused bigger problems (Flume was saying "not logged in" and
> throwing exceptions).  What is the correct way to fix this problem?  I
> obviously cannot change Amazon's version of Hadoop, so I need to find a
> compatible version of Flume.
>
>
>


-- 
Eric Sammer
twitter: esammer
data: www.cloudera.com

Re: Flume 0.9.4 and AWS EMR 0.20.250

Posted by Thomas Vachon <va...@sessionm.com>.
I was able to get flume to write to EMR using 0.9.2, but I fear I have run into other bugs.

2011-12-14 20:01:05,177 ERROR com.cloudera.flume.handlers.rolling.RollSink: Failure when attempting to rotate and open new sink: java.io.IOException: File /rails/adstats/2011/12/09/19/adstatslog.00000019.20111214-195722621+0000.614859534929985.seq could only be replicated to 0 nodes, instead of 1
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1531)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:685)
	at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

Any ideas if this can be safely ignored?  I checked the HDFS and I see: 

hadoop@domU-12-31-39-06-E6-CE:~$ hadoop fs -ls /rails/adstats/2011/12/14/19
Found 1 items
-rw-r--r--   3 hadoop supergroup          0 2011-12-14 19:57 /rails/adstats/2011/12/14/19/adstatslog.00000019.20111214-195722621+0000.614859534929985.seq


I looked at the DFS admin pages and I see (which leads me to believe it replicated correctly): 
Node   Last Contact  Admin State  Configured Capacity (GB)  Used (GB)  Non DFS Used (GB)  Remaining (GB)  Used (%)  Remaining (%)  Blocks
HOST1  2             In Service   9.34                      0          2.4                6.94            0         74.31          1
HOST2  2             In Service   9.34                      0          2.4                6.94            0         74.3           0
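
To double-check that belief from the shell, something like this should show what the NameNode actually thinks (both are stock 0.20 commands):

# Lists live/dead DataNodes and per-node capacity as the NameNode sees them.
hadoop dfsadmin -report
# Shows whether the zero-length .seq file has any blocks allocated at all.
hadoop fsck /rails/adstats/2011/12/14/19 -files -blocks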


The other question is, does s3n work in this version?  We want to dual-stack (if you will) our collectorSinks.
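
My understanding is that collectorSink writes through Hadoop's FileSystem layer, so an s3n:// URI should work wherever an hdfs:// one does, provided jets3t and the AWS credentials are visible to Flume. Something like this dataflow spec is what we have in mind (node name, port, bucket, credentials, and paths are all placeholders, not a tested config):

collector : collectorSource(35853) | [
    collectorSink("hdfs://namenode/rails/adstats/%Y/%m/%d/", "adstatslog"),
    collectorSink("s3n://AWS_KEY:AWS_SECRET@my-bucket/adstats/%Y/%m/%d/", "adstatslog")
];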


On Dec 14, 2011, at 12:41 PM, Thomas Vachon wrote:

> AWS EMR uses Hadoop 0.20.250 and Flume 0.9.4 seems to be built against a newer Hadoop client. This is causing delivery of logs into HDFS to fail with: Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 63, server = 61).
> 
> I tried replacing hadoop-core.jar with the Apache Hadoop 0.20.250 version, but that caused bigger problems (Flume was saying "not logged in" and throwing exceptions).  What is the correct way to fix this problem?  I obviously cannot change Amazon's version of Hadoop, so I need to find a compatible version of Flume.