Posted to common-user@hadoop.apache.org by ilayaraja <il...@rediff.co.in> on 2010/03/21 08:23:12 UTC

Hadoop Compatibility and EMR

Hi,

We've been using Hadoop 0.15.5 in our production environment, where we have about 10 TB of data stored on the DFS.
The files were generated as MapReduce output. We want to move our environment to Amazon Elastic MapReduce (EMR), which raises the following questions for us:

1. EMR supports only Hadoop 0.19.0 and above. Is it possible to use the existing data, generated with Hadoop 0.15.5, from Hadoop 0.19.0?

2. If not, how can we upgrade from Hadoop 0.15.5 to 0.19.0, and what issues should we expect along the way?

Regards,
Ilayaraja


Re: Hadoop Compatibility and EMR

Posted by Vibhooti Verma <ve...@gmail.com>.
Upgrade information is given at
http://wiki.apache.org/hadoop/Hadoop_Upgrade
Our team did the upgrade from 0.15.5 to 0.18 while keeping the data intact,
but it required a lot of testing.
I suggest doing the upgrade on test data first, and only then on production data.
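
A rough sketch of the sequence from that wiki page; exact script names and
flags may differ slightly between versions, so treat this as an outline
rather than a recipe:

  # Check that no previous upgrade is still pending, then stop the cluster.
  bin/hadoop dfsadmin -upgradeProgress status
  bin/stop-all.sh

  # Install the new Hadoop version, then start HDFS in upgrade mode so it
  # keeps a rollback copy of the old namespace and block metadata.
  bin/start-dfs.sh -upgrade

  # Verify the data (fsck, spot-check files); only when satisfied:
  bin/hadoop dfsadmin -finalizeUpgrade

Until -finalizeUpgrade is run, bin/start-dfs.sh -rollback should return the
file system to its pre-upgrade state.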

On Thu, Mar 25, 2010 at 9:58 AM, Sagar Shukla <sagar_shukla@persistent.co.in> wrote:

> Hi Ilayaraja,
>      Hadoop HDFS has a utility, distcp, with which it should be possible
> to copy data between two different versions of Hadoop, though I am not
> quite sure whether it can read data from Hadoop 0.15.5. More information
> is available at
> http://hadoop.apache.org/common/docs/r0.19.1/distcp.html#cpver
>
> There is no information on supported versions at that URL, but it could be
> worth a try.
>
> You can also set up a local environment to push your HDFS data from the
> older version to the newer one, and then finally move it to Amazon EMR.
>
> Regards,
> Sagar
>
> -----Original Message-----
> From: ilayaraja [mailto:ilayaraja@rediff.co.in]
> Sent: Sunday, March 21, 2010 12:53 PM
> To: hadoop-user; hadoop-dev
> Subject: Hadoop Compatibility and EMR
>
> Hi,
>
> We've been using Hadoop 0.15.5 in our production environment, where we
> have about 10 TB of data stored on the DFS.
> The files were generated as MapReduce output. We want to move our
> environment to Amazon Elastic MapReduce (EMR), which raises the following
> questions for us:
>
> 1. EMR supports only Hadoop 0.19.0 and above. Is it possible to use the
> existing data, generated with Hadoop 0.15.5, from Hadoop 0.19.0?
>
> 2. If not, how can we upgrade from Hadoop 0.15.5 to 0.19.0, and what
> issues should we expect along the way?
>
>
> Regards,
> Ilayaraja



-- 
cheers,
Vibhooti

Error with distcp: hdfs to S3 bulk transfer

Posted by ilayaraja <il...@rediff.co.in>.
The following error is thrown when distcp'ing data from HDFS (Hadoop
0.15.5) to S3 storage.
The problem appeared after back-porting a couple of bug fixes to Hadoop
0.15.5 that had been resolved in later versions.
Any thoughts would be greatly appreciated.

With failures, global counters are inaccurate; consider running with -i
Copy failed: org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 GET failed. XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>/user/root/ImplicitFeedback/linkdb-test</Key><RequestId>1249D2146A4A104E</RequestId><HostId>....</HostId></Error>
        at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:199)
        at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.inodeExists(Jets3tFileSystemStore.java:169)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy1.inodeExists(Unknown Source)
        at org.apache.hadoop.fs.s3.S3FileSystem.exists(S3FileSystem.java:127)
        at org.apache.hadoop.util.CopyFiles.setup(CopyFiles.java:675)
        at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:475)
        at org.apache.hadoop.util.CopyFiles.run(CopyFiles.java:550)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.util.CopyFiles.main(CopyFiles.java:563)
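
For reference, the copy is invoked roughly as below; the namenode host and
port, bucket name, and credentials are placeholders rather than our real
values:

  bin/hadoop distcp \
    hdfs://namenode:9000/user/root/ImplicitFeedback/linkdb-test \
    s3://ACCESS_KEY:SECRET_KEY@bucket/user/root/ImplicitFeedback/linkdb-test

From the trace, the failure happens while distcp checks whether the
destination path already exists on S3 (S3FileSystem.exists above), so the
NoSuchKey reply from S3 appears to surface as an exception instead of being
treated as "destination does not exist yet".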

Regards & Thanks,
Ilayaraja 


RE: Hadoop Compatibility and EMR

Posted by Sagar Shukla <sa...@persistent.co.in>.
Hi Ilayaraja,
      Hadoop HDFS has a utility, distcp, with which it should be possible to copy data between two different versions of Hadoop, though I am not quite sure whether it can read data from Hadoop 0.15.5. More information is available at http://hadoop.apache.org/common/docs/r0.19.1/distcp.html#cpver

There is no information on supported versions at that URL, but it could be worth a try.

You can also set up a local environment to push your HDFS data from the older version to the newer one, and then finally move it to Amazon EMR.
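
A sketch of that two-step flow; the hostnames, ports, bucket name, and
credentials below are placeholders, not tested values:

  # Step 1: on a local 0.19 test cluster, pull the data from the 0.15
  # cluster over HFTP, which is read-only but works across versions.
  # Run distcp on the destination cluster, whose RPC protocol matches
  # its own version.
  bin/hadoop distcp \
    hftp://old-namenode:50070/user/root/data \
    hdfs://new-namenode:9000/user/root/data

  # Step 2: once the data is verified, push it from the 0.19 cluster up
  # to S3, where EMR job flows can read it.
  bin/hadoop distcp \
    hdfs://new-namenode:9000/user/root/data \
    s3://ACCESS_KEY:SECRET_KEY@bucket/user/root/data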

Regards,
Sagar

-----Original Message-----
From: ilayaraja [mailto:ilayaraja@rediff.co.in] 
Sent: Sunday, March 21, 2010 12:53 PM
To: hadoop-user; hadoop-dev
Subject: Hadoop Compatibility and EMR

Hi,

We've been using Hadoop 0.15.5 in our production environment, where we have about 10 TB of data stored on the DFS.
The files were generated as MapReduce output. We want to move our environment to Amazon Elastic MapReduce (EMR), which raises the following questions for us:

1. EMR supports only Hadoop 0.19.0 and above. Is it possible to use the existing data, generated with Hadoop 0.15.5, from Hadoop 0.19.0?

2. If not, how can we upgrade from Hadoop 0.15.5 to 0.19.0, and what issues should we expect along the way?

Regards,
Ilayaraja


DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.

Re: Hadoop Compatibility and EMR

Posted by Philip Zeyliger <ph...@cloudera.com>.
I believe 0.15 had HftpFileSystem.
http://hadoop.apache.org/common/docs/r0.15.3/api/index.html

You may be able to run 0.19's distcp to copy from your 0.15 cluster (use
HFTP as the source) into the new cluster's HDFS.
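
For example, run on the destination (0.19) side; the namenode hosts and
ports below are placeholders, and HFTP is served from the namenode's HTTP
port (50070 unless overridden):

  bin/hadoop distcp \
    hftp://old-namenode:50070/user/root/data \
    hdfs://new-namenode:9000/user/root/data

HFTP is read-only, which is fine here since the old cluster is only ever
the source.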

-- Philip


On Sun, Mar 21, 2010 at 12:26 PM, Owen O'Malley <ow...@gmail.com> wrote:
> I believe you need to take two jumps: 0.15 -> 0.18 -> 0.20. I'd strongly
> suggest trying a practice file system first. Did we have owners and perms
> in 0.15? If not, you'll need to set owners and perms.
>
> -- Owen
>
> On Mar 21, 2010, at 12:23 AM, "ilayaraja" <il...@rediff.co.in> wrote:
>
>> Hi,
>>
>> We've been using Hadoop 0.15.5 in our production environment, where we
>> have about 10 TB of data stored on the DFS.
>> The files were generated as MapReduce output. We want to move our
>> environment to Amazon Elastic MapReduce (EMR), which raises the following
>> questions for us:
>>
>> 1. EMR supports only Hadoop 0.19.0 and above. Is it possible to use the
>> existing data, generated with Hadoop 0.15.5, from Hadoop 0.19.0?
>>
>> 2. If not, how can we upgrade from Hadoop 0.15.5 to 0.19.0, and what
>> issues should we expect along the way?
>>
>>
>> Regards,
>> Ilayaraja
>>
>

Re: Hadoop Compatibility and EMR

Posted by Owen O'Malley <ow...@gmail.com>.
I believe you need to take two jumps: 0.15 -> 0.18 -> 0.20. I'd strongly
suggest trying a practice file system first. Did we have owners and perms
in 0.15? If not, you'll need to set owners and perms.
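
If perms do need to be set after the upgrade (permissions only arrived
around 0.16, as I recall), the usual tools are fs -chown and fs -chmod;
the user, group, and path here are made-up examples:

  # Recursively assign an owner and group to the migrated data...
  bin/hadoop fs -chown -R produser:prodgroup /user/root/data

  # ...and give it workable permissions.
  bin/hadoop fs -chmod -R 755 /user/root/data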

-- Owen

On Mar 21, 2010, at 12:23 AM, "ilayaraja" <il...@rediff.co.in> wrote:

> Hi,
>
> We've been using Hadoop 0.15.5 in our production environment, where we
> have about 10 TB of data stored on the DFS.
> The files were generated as MapReduce output. We want to move our
> environment to Amazon Elastic MapReduce (EMR), which raises the
> following questions for us:
>
> 1. EMR supports only Hadoop 0.19.0 and above. Is it possible to use the
> existing data, generated with Hadoop 0.15.5, from Hadoop 0.19.0?
>
> 2. If not, how can we upgrade from Hadoop 0.15.5 to 0.19.0, and what
> issues should we expect along the way?
>
>
> Regards,
> Ilayaraja
>
