Posted to user@hadoop.apache.org by Cinyoung Hur <ci...@gmail.com> on 2017/07/25 08:00:31 UTC

How to use webhdfs CONCAT?

https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Concat_Files

I tried to concatenate multiple part files into a single target file through WebHDFS, but I couldn't get it to work. Could you give me an example of concatenating part files?

Re: How to use webhdfs CONCAT?

Posted by Wellington Chevreuil <we...@gmail.com>.
Yes, all the files passed to CONCAT (the target and every source) must already exist. In this case, you would need to run something like the following:

curl -i -X POST "http://HOST/webhdfs/v1/PATH_TO_YOUR_HDFS_FOLDER/part-01-000000-000?user.name=hadoop&op=CONCAT&sources=PATH_TO_YOUR_HDFS_FOLDER/part-02-000000-000,PATH_TO_YOUR_HDFS_FOLDER/part-04-000000-000"

This concatenates the three files into the single PATH_TO_YOUR_HDFS_FOLDER/part-01-000000-000 file. Note that this will only work if the file sizes are exact multiples of "dfs.block.size" (in your listing the block size is 32 MB, and sizes such as 21.14 MB are not multiples of it); otherwise you may get a different error.
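
For illustration, here is a minimal end-to-end sketch. NAMENODE_HOST and /data/parts are placeholders for this sketch, it assumes a non-secure cluster using simple user.name authentication, and 50070 is the default NameNode HTTP port for WebHDFS in Hadoop 2.x:

# Placeholder host and directory; adjust to your cluster.
NN=http://NAMENODE_HOST:50070
DIR=/data/parts

# Use the first part as the CONCAT target; the remaining parts go in "sources".
# Every file must already exist, and the block-size restriction above applies.
curl -i -X POST \
  "$NN/webhdfs/v1$DIR/part-01-000000-000?user.name=hadoop&op=CONCAT&sources=$DIR/part-02-000000-000,$DIR/part-04-000000-000"

# On success, the sources are removed and their data is appended to the
# target, so there is no separate target file to create beforehand.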


Re: How to use webhdfs CONCAT?

Posted by Cinyoung Hur <ci...@gmail.com>.
Hi, Wellington

All the source parts are listed below (columns: permission, owner, group, size, replication, block size, name):

-rw-r--r--  hadoop  supergroup   2.43 KB  2  32 MB  part-01-000000-000
-rw-r--r--  hadoop  supergroup  21.14 MB  2  32 MB  part-02-000000-000
-rw-r--r--  hadoop  supergroup  22.1 MB   2  32 MB  part-04-000000-000
-rw-r--r--  hadoop  supergroup  22.29 MB  2  32 MB  part-05-000000-000
-rw-r--r--  hadoop  supergroup  22.29 MB  2  32 MB  part-06-000000-000
-rw-r--r--  hadoop  supergroup  22.56 MB  2  32 MB  part-07-000000-000



I got this exception. It seems like I have to create the target file before concatenation.

curl -i -X POST "http://HOST/webhdfs/v1/tajo/warehouse/hira_analysis/material_usage_concat?user.name=hadoop&op=CONCAT&sources=/tajo/warehouse/hira_analysis/material_usage"
HTTP/1.1 404 Not Found
Date: Thu, 27 Jul 2017 09:05:48 GMT
Server: Jetty(6.1.26)
Content-Type: application/json
Cache-Control: no-cache
Expires: Thu, 27 Jul 2017 09:05:48 GMT
Pragma: no-cache
Expires: Thu, 27 Jul 2017 09:05:48 GMT
Pragma: no-cache
Set-Cookie: hadoop.auth="u=hadoop&p=hadoop&t=simple&e=1501182348739&s=o02nv4on4FXbhlijJ+R/KXvhooQ="; Path=/; Expires=Thu, 27-Jul-2017 19:05:48 GMT; HttpOnly
Transfer-Encoding: chunked

{"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File does not exist: /tajo/warehouse/hira_analysis/material_usage_concat"}}
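
For what it's worth, a quick way to confirm whether the CONCAT target exists is a GETFILESTATUS call (a standard WebHDFS read operation, using the same HOST placeholder as above):

curl -i "http://HOST/webhdfs/v1/tajo/warehouse/hira_analysis/material_usage_concat?user.name=hadoop&op=GETFILESTATUS"
# A 404 with a FileNotFoundException body, like the one above, means the
# target file does not exist yet.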


Thanks!


Re: How to use webhdfs CONCAT?

Posted by Wellington Chevreuil <we...@gmail.com>.
Hi Cinyoung, 

Concat has some restrictions; for example, a source file's last block must be the same size as the configured dfs.block.size. If all the conditions are met, the command below should work (it concatenates /user/root/file-2 into /user/root/file-1):

curl -i -X POST "http:HTTPFS_HOST:14000/webhdfs/v1/user/root/file-1?user.name=root&op=CONCAT&sources=/user/root/file-2"
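
On success, CONCAT returns an empty 200 response, as shown in the WebHDFS documentation linked above:

HTTP/1.1 200 OK
Content-Length: 0

Note that this example goes through HttpFS on its default port 14000; when talking to the NameNode's WebHDFS endpoint directly, the default HTTP port in Hadoop 2.x is 50070.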

Is this similar to what you tried? Can you share the output you are getting?


