Posted to common-dev@hadoop.apache.org by "stack@archive.org (JIRA)" <ji...@apache.org> on 2007/01/06 03:07:27 UTC
[jira] Created: (HADOOP-862) Add handling of s3 to CopyFile tool
Add handling of s3 to CopyFile tool
-----------------------------------
Key: HADOOP-862
URL: https://issues.apache.org/jira/browse/HADOOP-862
Project: Hadoop
Issue Type: Improvement
Components: util
Affects Versions: 0.10.0
Reporter: stack@archive.org
Priority: Minor
CopyFile is a useful tool for doing bulk copies. It doesn't have handling for the recently added s3 filesystem.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HADOOP-862) Add handling of s3 to CopyFile tool
Posted by "stack@archive.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462676 ]
stack@archive.org commented on HADOOP-862:
------------------------------------------
Attached is a first cut at adding s3 handling to CopyFiles.
Here's the list of changes:
+ Allow hdfs or dfs URI schemes (Used to be dfs only).
+ Changed the usage message so filesystem is generic URI (rather than namenode:port | local).
+ getFileSysName was removed. Use FileSystem.get with the fs URI instead.
+ getMapCount: Moved duplicated code for figuring number of maps here.
+ toURI: Added. Have (duplicated) tests of URIness go via here instead.
+ CopyFilesReducer: Removed two instances. Does nothing.
+ Added testing of URIness to members of file-of-source URIs.
+ Minor javadoc and formatting changes.
It's lightly tested.
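For readers following along, the toURI and scheme changes listed above might look roughly like the following. This is a minimal sketch, not the code in copyfiles-s3.diff: the class name UriCheckSketch, the isSupported helper, and the contents of the SCHEMES set are assumptions made for illustration, and only java.net.URI from the JDK is used.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of a single "is this a usable URI?" check. */
class UriCheckSketch {
    // Assumed scheme list, for illustration only; the real tool may differ.
    static final Set<String> SCHEMES =
        new HashSet<>(Arrays.asList("hdfs", "dfs", "s3", "http", "file"));

    /** Parses a path argument into a URI so every caller fails the same way. */
    static URI toURI(String path) {
        try {
            return new URI(path);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + path, e);
        }
    }

    /** True if the argument parses as a URI and carries one of the assumed schemes. */
    static boolean isSupported(String path) {
        String scheme = toURI(path).getScheme();
        return scheme != null && SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String[] samples = {
            "hdfs://namenode:9000/user/stack",
            "s3://KEY:SECRET@BUCKET/segments-bkup",
            "/user/stack/outputs"
        };
        for (String s : samples) {
            System.out.println(s + " supported=" + isSupported(s));
        }
    }
}
```

Routing every check through one helper means a malformed source or destination is reported with the same message wherever it turns up, including entries read from a file of source URIs.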
[jira] Updated: (HADOOP-862) Add handling of s3 to CopyFile tool
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doug Cutting updated HADOOP-862:
--------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
I just committed this. Thanks, Michael!
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-862) Add handling of s3 to CopyFile tool
Posted by "stack@archive.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack@archive.org updated HADOOP-862:
-------------------------------------
Fix Version/s: 0.11.0
Affects Version/s: 0.10.1 (was: 0.10.0)
Status: Patch Available (was: Open)
Marking issue with 'patch available'.
[jira] Commented: (HADOOP-862) Add handling of s3 to CopyFile tool
Posted by "Tom White (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469210 ]
Tom White commented on HADOOP-862:
----------------------------------
I just tried using this patch, and I managed to copy some local files to the S3 file system without trouble.
Looking at the code I noticed that the -fs option doesn't seem to be used any longer so it can be dropped. Other than that, it looks fine to me.
[jira] Commented: (HADOOP-862) Add handling of s3 to CopyFile tool
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469853 ]
Hadoop QA commented on HADOOP-862:
----------------------------------
+1, because http://issues.apache.org/jira/secure/attachment/12350237/copyfiles-s3-4.diff applied and successfully tested against trunk revision r502694.
[jira] Updated: (HADOOP-862) Add handling of s3 to CopyFile tool
Posted by "stack@archive.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack@archive.org updated HADOOP-862:
-------------------------------------
Attachment: copyfiles-s3-4.diff
New patch to fix broken unit test. Removes 'dfs' scheme. Only 'hdfs' allowed from here on out.
Re: [jira] Commented: (HADOOP-862) Add handling of s3 to CopyFile tool
Posted by Nigel Daley <nd...@yahoo-inc.com>.
org.apache.hadoop.mapred.TestMiniMRLocalFS hung the process. I'm
restarting now...
On Feb 2, 2007, at 11:38 AM, Doug Cutting (JIRA) wrote:
>
> [ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469841 ]
>
> Doug Cutting commented on HADOOP-862:
> -------------------------------------
>
>> Mr 'Hadoop QA' [ ... ]
>
> Please, call him "Nigel".
[jira] Commented: (HADOOP-862) Add handling of s3 to CopyFile tool
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469841 ]
Doug Cutting commented on HADOOP-862:
-------------------------------------
> Mr 'Hadoop QA' [ ... ]
Please, call him "Nigel".
[jira] Commented: (HADOOP-862) Add handling of s3 to CopyFile tool
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469685 ]
Hadoop QA commented on HADOOP-862:
----------------------------------
-1, because 3 attempts failed to build and test the latest attachment (http://issues.apache.org/jira/secure/attachment/12350196/copyfiles-s3-3.diff) against trunk revision r502402. Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.
[jira] Commented: (HADOOP-862) Add handling of s3 to CopyFile tool
Posted by "stack@archive.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469836 ]
stack@archive.org commented on HADOOP-862:
------------------------------------------
Mr 'Hadoop QA', do I have to do anything special to re-trigger your auto-application and test of version 4 of the patch? Thanks.
[jira] Updated: (HADOOP-862) Add handling of s3 to CopyFile tool
Posted by "stack@archive.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack@archive.org updated HADOOP-862:
-------------------------------------
Attachment: copyfiles-s3-3.diff
Fix usage string (suggested by Tom White's review).
[jira] Commented: (HADOOP-862) Add handling of s3 to CopyFile tool
Posted by "stack@archive.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463198 ]
stack@archive.org commented on HADOOP-862:
------------------------------------------
Updated patch.
+ Renamed DFSCopyFilesMapper as FSCopyFilesMapper
+ If no scheme, use 'default' (the value of 'fs.default.name' in hadoop-site.xml).
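The "no scheme means default filesystem" rule can be illustrated with plain java.net.URI. This is only a sketch under stated assumptions: the class DefaultFsSketch, the qualify method, and the hdfs://namenode:9000/ value are hypothetical stand-ins, and the patch itself may resolve paths differently (for example through Hadoop's own FileSystem and Path classes).

```java
import java.net.URI;

/** Hypothetical sketch: fall back to a default filesystem URI when no scheme is given. */
class DefaultFsSketch {
    /**
     * Returns the argument unchanged if it already names a scheme;
     * otherwise resolves it against the supplied default filesystem URI.
     */
    static URI qualify(URI defaultFs, String path) {
        URI u = URI.create(path);
        if (u.getScheme() != null) {
            return u; // explicit scheme (s3, hdfs, http, ...) wins
        }
        return defaultFs.resolve(u); // borrow scheme and authority from the default
    }

    public static void main(String[] args) {
        // Placeholder for whatever fs.default.name is configured to; an assumption.
        URI defaultFs = URI.create("hdfs://namenode:9000/");
        System.out.println(qualify(defaultFs, "/user/stack/outputs/segments"));
        System.out.println(qualify(defaultFs, "s3://BUCKET/segments-bkup"));
    }
}
```

Only absolute paths are handled in a meaningful way here; how a relative path such as outputs/segments maps onto a user's home directory is left to the filesystem and is not modelled.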
I ran more extensive tests going from hdfs to s3 and back again, and copying from http into s3 and hdfs (distcp is a nice tool). For example, here is the output from a copy of a small nutch segment from hdfs to s3 (in the example below, hdfs was set as the fs.default.name filesystem):
stack@debord:~/checkouts/hadoop$ ./bin/hadoop fs -lsr outputs/segments
/user/stack/outputs/segments/20070108213341-test <dir>
/user/stack/outputs/segments/20070108213341-test/crawl_fetch <dir>
/user/stack/outputs/segments/20070108213341-test/crawl_fetch/part-00000 <dir>
/user/stack/outputs/segments/20070108213341-test/crawl_fetch/part-00000/data <r 1> 1187
/user/stack/outputs/segments/20070108213341-test/crawl_fetch/part-00000/index <r 1> 234
/user/stack/outputs/segments/20070108213341-test/crawl_parse <dir>
/user/stack/outputs/segments/20070108213341-test/crawl_parse/part-00000 <r 1> 9010
/user/stack/outputs/segments/20070108213341-test/parse_data <dir>
/user/stack/outputs/segments/20070108213341-test/parse_data/part-00000 <dir>
/user/stack/outputs/segments/20070108213341-test/parse_data/part-00000/data <r 1> 4630
/user/stack/outputs/segments/20070108213341-test/parse_data/part-00000/index <r 1> 234
/user/stack/outputs/segments/20070108213341-test/parse_text <dir>
/user/stack/outputs/segments/20070108213341-test/parse_text/part-00000 <dir>
/user/stack/outputs/segments/20070108213341-test/parse_text/part-00000/data <r 1> 6180
/user/stack/outputs/segments/20070108213341-test/parse_text/part-00000/index <r 1> 234
Here's the copy to an s3 directory named segments-bkup:
% ./bin/hadoop distcp /user/stack/outputs/segments s3://KEY:SECRET@BUCKET/segments-bkup
Here's the listing of the s3 content:
stack@debord:~/checkouts/hadoop$ ./bin/hadoop fs -fs s3://KEY:SECRET@BUCKET/segments-bkup -lsr /segments-bkup/
/segments-bkup/20070108213341-test <dir>
/segments-bkup/20070108213341-test/crawl_fetch <dir>
/segments-bkup/20070108213341-test/crawl_fetch/part-00000 <dir>
/segments-bkup/20070108213341-test/crawl_fetch/part-00000/data <r 1> 1187
/segments-bkup/20070108213341-test/crawl_fetch/part-00000/index <r 1> 234
/segments-bkup/20070108213341-test/crawl_parse <dir>
/segments-bkup/20070108213341-test/crawl_parse/part-00000 <r 1> 9010
/segments-bkup/20070108213341-test/parse_data <dir>
/segments-bkup/20070108213341-test/parse_data/part-00000 <dir>
/segments-bkup/20070108213341-test/parse_data/part-00000/data <r 1> 4630
/segments-bkup/20070108213341-test/parse_data/part-00000/index <r 1> 234
/segments-bkup/20070108213341-test/parse_text <dir>
/segments-bkup/20070108213341-test/parse_text/part-00000 <dir>
/segments-bkup/20070108213341-test/parse_text/part-00000/data <r 1> 6180
/segments-bkup/20070108213341-test/parse_text/part-00000/index <r 1> 234
[jira] Commented: (HADOOP-862) Add handling of s3 to CopyFile tool
Posted by "stack@archive.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469681 ]
stack@archive.org commented on HADOOP-862:
------------------------------------------
Thanks for the review, Tom.
[jira] Updated: (HADOOP-862) Add handling of s3 to CopyFile tool
Posted by "stack@archive.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack@archive.org updated HADOOP-862:
-------------------------------------
Attachment: copyfiles-s3.diff
[jira] Updated: (HADOOP-862) Add handling of s3 to CopyFile tool
Posted by "stack@archive.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack@archive.org updated HADOOP-862:
-------------------------------------
Attachment: copyfiles-s3-2.diff