Posted to issues@camel.apache.org by "Josef Ludvíček (JIRA)" <ji...@apache.org> on 2014/11/12 13:41:33 UTC

[jira] [Created] (CAMEL-8040) camel-hdfs2 consumer overwriting data instead of appending them

Josef Ludvíček created CAMEL-8040:
-------------------------------------

             Summary: camel-hdfs2 consumer overwriting data instead of appending them
                 Key: CAMEL-8040
                 URL: https://issues.apache.org/jira/browse/CAMEL-8040
             Project: Camel
          Issue Type: Bug
          Components: camel-hdfs
    Affects Versions: 2.14.0, 2.13.0
            Reporter: Josef Ludvíček


h1. camel-hdfs2 consumer overwriting data instead of appending them

There is probably a bug in the camel-hdfs2 consumer.

This project contains two Camel routes: one takes files from `test-source` and uploads them to Hadoop HDFS; the other watches a folder in HDFS and downloads the files into the `test-dest` folder of this project.
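
For illustration, here is a minimal sketch of the two routes, assuming the hdfs2 scheme, a local HDFS on localhost:8020 and default options; the real project configuration may differ.

{code}
// A minimal sketch of the two routes described above. The endpoint URIs, paths and
// options are assumptions for illustration, not the project's exact configuration.
import org.apache.camel.builder.RouteBuilder;

public class HdfsTestRoutes extends RouteBuilder {
    @Override
    public void configure() {
        // Route 1: pick up files from the local test-source folder and upload them to HDFS.
        from("file:test-source?noop=true")
            .to("hdfs2://localhost:8020/tmp/camel-test");

        // Route 2: watch the HDFS folder and download files into the local test-dest folder.
        from("hdfs2://localhost:8020/tmp/camel-test")
            .to("file:test-dest");
    }
}
{code}

If I understand the component right, the hdfs2 consumer reads a file in chunks (the chunkSize option, 4096 bytes by default) and emits each chunk as a separate exchange, which would match the repeated lines in the log below.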


It seems that when downloading a file from HDFS to the local filesystem, the consumer keeps writing chunks of data to the beginning of the target file in test-dest, instead of simply appending the chunks as I would expect.
From the Camel log I suppose that each chunk of data from the Hadoop file is treated as if it were a whole file.


The Ruby script `generate_textfile.rb` generates a file `test.txt` with the following content:

{code}
0 - line
1 - line
2 - line
3 - line
4 - line
5 - line
...
...
99999 - line
{code}
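
For reference, a hypothetical Java equivalent of the generator script, assuming the same numbering and line format:

{code}
// Hypothetical Java equivalent of generate_textfile.rb: writes 100000 numbered lines to test.txt.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class GenerateTextFile {
    public static void main(String[] args) throws IOException {
        String content = IntStream.range(0, 100000)
                .mapToObj(i -> i + " - line")
                .collect(Collectors.joining("\n"));
        Files.write(Paths.get("test.txt"), (content + "\n").getBytes());
    }
}
{code}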

h2. Scenario
 - _expects a running Hadoop instance on localhost:8020_
 - run mvn camel:run
 - copy test.txt into test-source
 - watch the log and the file test.txt in test-dest
 - test.txt in the test-dest folder ends up containing only the last x lines of the original one.
 
 
Camel log 

{code}
[localhost:8020/tmp/camel-test/] toFile                     INFO  picked up file from hdfs with name test.txt
[localhost:8020/tmp/camel-test/] toFile                     INFO  file downloaded from hadoop
[localhost:8020/tmp/camel-test/] toFile                     INFO  picked up file from hdfs with name test.txt
[localhost:8020/tmp/camel-test/] toFile                     INFO  file downloaded from hadoop
[localhost:8020/tmp/camel-test/] toFile                     INFO  picked up file from hdfs with name test.txt
[localhost:8020/tmp/camel-test/] toFile                     INFO  file downloaded from hadoop
[localhost:8020/tmp/camel-test/] toFile                     INFO  picked up file from hdfs with name test.txt
[localhost:8020/tmp/camel-test/] toFile                     INFO  file downloaded from hadoop
{code}
 
h2. Environment

* camel 2.14 and 2.13
* hadoop VirtualBox VM
** downloaded from http://www.cloudera.com/content/cloudera/en/downloads/quickstart_vms/cdh-5-2-x.html
** tested with version 2.3.0-cdh5.1.0, r8e266e052e423af592871e2dfe09d54c03f6a0e8, which I couldn't find on the download page
* hadoop docker image
** https://github.com/sequenceiq/hadoop-docker
** results were the same as with the VirtualBox VM


In the case of the VirtualBox VM, HDFS is bound to `hdfs://quickstart.cloudera:8020` by default, and this needs to be changed in `/etc/hadoop/conf/core-site.xml`. It should work when `fs.defaultFS` is set to `hdfs://0.0.0.0:8020`.

In the case of the docker hadoop image, first start the docker container, figure out its IP address, and use it for the camel hdfs component; a route sketch using that address follows the netstat output below.
Here the camel URI would be `hdfs:172.17.0.2:9000/tmp/camel-test`.

{code} 
docker run -i -t sequenceiq/hadoop-docker:2.5.1 /etc/bootstrap.sh -bash

Starting sshd:                                             [  OK  ]
Starting namenodes on [966476255fc2]
966476255fc2: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-966476255fc2.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-966476255fc2.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-966476255fc2.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn--resourcemanager-966476255fc2.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-966476255fc2.out
{code}
See which IP the HDFS filesystem API is bound to inside the docker container:
{code}
bash-4.1# netstat -tulnp 
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name   
...
tcp        0      0 172.17.0.2:9000             0.0.0.0:*                   LISTEN      -                   
...
{code}
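
Using the address from netstat, the download route for the dockerized HDFS might look like the following sketch; the hdfs2 scheme and paths are again assumptions, not the project's exact configuration.

{code}
// Hypothetical consumer route for the dockerized HDFS, using the address found via netstat above.
import org.apache.camel.builder.RouteBuilder;

public class DockerHdfsRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("hdfs2://172.17.0.2:9000/tmp/camel-test")
            .to("file:test-dest");
    }
}
{code}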

There might be an exception because of HDFS permissions. It can be solved by relaxing the HDFS filesystem permissions:
{code}
bash-4.1# /usr/local/hadoop/bin/hdfs dfs -chmod 777 /
{code}


