Posted to user@hbase.apache.org by Adrian Sandulescu <sa...@gmail.com> on 2013/10/08 15:35:03 UTC

exportSnapshot MISSING_FILES

Hello everyone,

I'm using this tool to export and "import" snapshots from S3:
https://github.com/lospro7/snapshot-s3-util/blob/master/src/main/java/com/imgur/backup/SnapshotS3Util.java

I'm using this tool because it seems like a better option than ExportTable,
considering there isn't another HDFS cluster on hand.

It uses the following trick to make exportSnapshot "import" from S3 to the
local HDFS.

            // Override dfs configuration to point to S3
            config.set("fs.default.name", s3protocol + accessKey + ":" + accessSecret + "@" + bucketName);
            config.set("fs.defaultFS", s3protocol + accessKey + ":" + accessSecret + "@" + bucketName);
            config.set("fs.s3.awsAccessKeyId", accessKey);
            config.set("fs.s3.awsSecretAccessKey", accessSecret);
            config.set("hbase.tmp.dir", "/tmp/hbase-${user.name}");
            config.set("hbase.rootdir", s3Url);


Imports work great, but only when using the s3n:// protocol (which means
an HFile limit of 5GB).
When using the s3:// protocol, I get the following:
13/10/08 13:32:01 INFO mapred.JobClient:     MISSING_FILES=1

The author said he wasn't able to debug it and just uses s3n:// until it
becomes a problem.

Has anyone encountered this when using exportSnapshot?
Can you please point me in the right direction?

Adrian

Re: exportSnapshot MISSING_FILES

Posted by Adrian Sandulescu <sa...@gmail.com>.
Would this question be better suited for the developer mailing list?

Adrian


On Tue, Oct 8, 2013 at 7:01 PM, Adrian Sandulescu <
sandulescu.adrian@gmail.com> wrote:

> Also, here are the files in S3:
> $ hadoop fs -ls
> s3n://AKIAIWNEBAESDM4DKBGA:UEFai8K1IBrjOKlXB2hbIbTFDJ8apubuA01LCc02@hbase-export
> /hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1
> Found 1 items
> -rwxrwxrwx   1  741047906 2013-10-08 13:45
> s3n://AKIAIWNEBAESDM4DKBGA:UEFai8K1IBrjOKlXB2hbIbTFDJ8apubuA01LCc02@hbase-export
> /hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1
>
> $ hadoop fs -ls
> s3://AKIAIWNEBAESDM4DKBGA:UEFai8K1IBrjOKlXB2hbIbTFDJ8apubuA01LCc02@hbase-export
> /hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1
> Found 1 items
> -rwxrwxrwx   1  741047906 1970-01-01 00:00
> s3://AKIAIWNEBAESDM4DKBGA:UEFai8K1IBrjOKlXB2hbIbTFDJ8apubuA01LCc02@hbase-export
> /hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1
>
> Thank you,
> Adrian
>
>
> On Tue, Oct 8, 2013 at 6:44 PM, Adrian Sandulescu <
> sandulescu.adrian@gmail.com> wrote:
>
>> Yes, I was just digging.
>>
>> From a successful s3n:// import
>>
>> 2013-10-08 14:57:04,816 INFO org.apache.hadoop.hbase.snapshot.ExportSnapshot: copy file input=v/campaign_digital_ad_time_dev=d9e6cff519bdd232d7a7f8eb676d92ee-6912d21c32754e7cb34ac85952a67ce1 output=hdfs://mycluster:8020/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1
>> 2013-10-08 14:57:04,965 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem: Opening 's3n://AKIAIWNEBAESDM4DKBGA:UEFai8K1IBrjOKlXB2hbIbTFDJ8apubuA01LCc02@hbase-export/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1' for reading
>> 2013-10-08 14:57:05,039 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem: Opening key 'hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1' for reading at position '0'
>> 2013-10-08 14:57:05,299 INFO org.apache.hadoop.hbase.snapshot.ExportSnapshot: Skip copy v/campaign_digital_ad_time_dev=d9e6cff519bdd232d7a7f8eb676d92ee-6912d21c32754e7cb34ac85952a67ce1 to hdfs://mycluster:8020/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1, same file.
>> 2013-10-08 14:57:05,300 INFO org.apache.hadoop.hbase.snapshot.ExportSnapshot: copy completed for input=v/campaign_digital_ad_time_dev=d9e6cff519bdd232d7a7f8eb676d92ee-6912d21c32754e7cb34ac85952a67ce1 output=hdfs://mycluster:8020/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1
>>
>> From a failed s3:// import
>>
>> 2013-10-08 15:27:21,810 INFO org.apache.hadoop.hbase.snapshot.ExportSnapshot: copy file input=v/campaign_digital_ad_time_dev=d9e6cff519bdd232d7a7f8eb676d92ee-6912d21c32754e7cb34ac85952a67ce1 output=hdfs://mycluster:8020/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1
>> 2013-10-08 15:27:21,834 ERROR org.apache.hadoop.hbase.snapshot.ExportSnapshot: Unable to open source file=v/campaign_digital_ad_time_dev=d9e6cff519bdd232d7a7f8eb676d92ee-6912d21c32754e7cb34ac85952a67ce1
>> java.io.IOException: No such file.
>> 	at org.apache.hadoop.fs.s3.S3FileSystem.checkFile(S3FileSystem.java:181)
>> 	at org.apache.hadoop.fs.s3.S3FileSystem.open(S3FileSystem.java:246)
>> 	at org.apache.hadoop.hbase.io.FileLink$FileLinkInputStream.tryOpen(FileLink.java:289)
>> 	at org.apache.hadoop.hbase.io.FileLink$FileLinkInputStream.<init>(FileLink.java:120)
>> 	at org.apache.hadoop.hbase.io.FileLink$FileLinkInputStream.<init>(FileLink.java:111)
>> 	at org.apache.hadoop.hbase.io.FileLink.open(FileLink.java:390)
>> 	at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.openSourceFile(ExportSnapshot.java:302)
>> 	at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.copyFile(ExportSnapshot.java:175)
>> 	at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.map(ExportSnapshot.java:146)
>> 	at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.map(ExportSnapshot.java:95)
>> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
>> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>> 	at java.security.AccessController.doPrivileged(Native Method)
>> 	at javax.security.auth.Subject.doAs(Subject.java:396)
>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>> 	at org.apache.hadoop.mapred.Child.main(Child.java:262)
>>
>> Thank you,
>> Adrian
>>
>>
>> On Tue, Oct 8, 2013 at 5:56 PM, Ted Yu <yu...@gmail.com> wrote:
>>
>>> bq. 13/10/08 13:32:01 INFO mapred.JobClient:     MISSING_FILES=1
>>>
>>> Are you able to provide more context from job output ?
>>>
>>> Thanks
>>>
>>>
>>> On Tue, Oct 8, 2013 at 6:35 AM, Adrian Sandulescu <
>>> sandulescu.adrian@gmail.com> wrote:
>>>
>>> > Hello everyone,
>>> >
>>> > I'm using this tool to export and "import" snapshots from S3:
>>> >
>>> >
>>> https://github.com/lospro7/snapshot-s3-util/blob/master/src/main/java/com/imgur/backup/SnapshotS3Util.java
>>> >
>>> > I'm using this tool because it seems like a better option than
>>> ExportTable,
>>> > considering there isn't another HDFS cluster on hand.
>>> >
>>> > It uses the following trick to make exportSnapshot "import" from S3 to
>>> the
>>> > local HDFS.
>>> >
>>> >             // Override dfs configuration to point to S3
>>> >             config.set("fs.default.name", s3protocol + accessKey + ":"
>>> > + accessSecret + "@" + bucketName);
>>> >             config.set("fs.defaultFS", s3protocol + accessKey + ":" +
>>> > accessSecret  + "@" + bucketName);
>>> >             config.set("fs.s3.awsAccessKeyId", accessKey);
>>> >             config.set("fs.s3.awsSecretAccessKey", accessSecret);
>>> >             config.set("hbase.tmp.dir", "/tmp/hbase-${user.name}");
>>> >             config.set("hbase.rootdir", s3Url);
>>> >
>>> >
>>> > Imports work great, but only when using the s3n:// protocol (which
>>> > means an HFile limit of 5GB).
>>> > When using the s3:// protocol, I get the following:
>>> > 13/10/08 13:32:01 INFO mapred.JobClient:     MISSING_FILES=1
>>> >
>>> > The author said he wasn't able to debug it and just uses s3n:// until
>>> it
>>> > becomes a problem.
>>> >
>>> > Has anyone encountered this when using exportSnapshot?
>>> > Can you please point me in the right direction?
>>> >
>>> > Adrian
>>> >
>>>
>>
>>
>

Re: exportSnapshot MISSING_FILES

Posted by Adrian Sandulescu <sa...@gmail.com>.
Also, here are the files in S3:
$ hadoop fs -ls s3n://AKIAIWNEBAESDM4DKBGA:UEFai8K1IBrjOKlXB2hbIbTFDJ8apubuA01LCc02@hbase-export/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1
Found 1 items
-rwxrwxrwx   1  741047906 2013-10-08 13:45 s3n://AKIAIWNEBAESDM4DKBGA:UEFai8K1IBrjOKlXB2hbIbTFDJ8apubuA01LCc02@hbase-export/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1

$ hadoop fs -ls s3://AKIAIWNEBAESDM4DKBGA:UEFai8K1IBrjOKlXB2hbIbTFDJ8apubuA01LCc02@hbase-export/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1
Found 1 items
-rwxrwxrwx   1  741047906 1970-01-01 00:00 s3://AKIAIWNEBAESDM4DKBGA:UEFai8K1IBrjOKlXB2hbIbTFDJ8apubuA01LCc02@hbase-export/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1

Thank you,
Adrian


On Tue, Oct 8, 2013 at 6:44 PM, Adrian Sandulescu <
sandulescu.adrian@gmail.com> wrote:

> Yes, I was just digging.
>
> From a successful s3n:// import
>
> 2013-10-08 14:57:04,816 INFO org.apache.hadoop.hbase.snapshot.ExportSnapshot: copy file input=v/campaign_digital_ad_time_dev=d9e6cff519bdd232d7a7f8eb676d92ee-6912d21c32754e7cb34ac85952a67ce1 output=hdfs://mycluster:8020/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1
> 2013-10-08 14:57:04,965 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem: Opening 's3n://AKIAIWNEBAESDM4DKBGA:UEFai8K1IBrjOKlXB2hbIbTFDJ8apubuA01LCc02@hbase-export/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1' for reading
> 2013-10-08 14:57:05,039 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem: Opening key 'hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1' for reading at position '0'
> 2013-10-08 14:57:05,299 INFO org.apache.hadoop.hbase.snapshot.ExportSnapshot: Skip copy v/campaign_digital_ad_time_dev=d9e6cff519bdd232d7a7f8eb676d92ee-6912d21c32754e7cb34ac85952a67ce1 to hdfs://mycluster:8020/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1, same file.
> 2013-10-08 14:57:05,300 INFO org.apache.hadoop.hbase.snapshot.ExportSnapshot: copy completed for input=v/campaign_digital_ad_time_dev=d9e6cff519bdd232d7a7f8eb676d92ee-6912d21c32754e7cb34ac85952a67ce1 output=hdfs://mycluster:8020/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1
>
> From a failed s3:// import
>
> 2013-10-08 15:27:21,810 INFO org.apache.hadoop.hbase.snapshot.ExportSnapshot: copy file input=v/campaign_digital_ad_time_dev=d9e6cff519bdd232d7a7f8eb676d92ee-6912d21c32754e7cb34ac85952a67ce1 output=hdfs://mycluster:8020/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1
> 2013-10-08 15:27:21,834 ERROR org.apache.hadoop.hbase.snapshot.ExportSnapshot: Unable to open source file=v/campaign_digital_ad_time_dev=d9e6cff519bdd232d7a7f8eb676d92ee-6912d21c32754e7cb34ac85952a67ce1
> java.io.IOException: No such file.
> 	at org.apache.hadoop.fs.s3.S3FileSystem.checkFile(S3FileSystem.java:181)
> 	at org.apache.hadoop.fs.s3.S3FileSystem.open(S3FileSystem.java:246)
> 	at org.apache.hadoop.hbase.io.FileLink$FileLinkInputStream.tryOpen(FileLink.java:289)
> 	at org.apache.hadoop.hbase.io.FileLink$FileLinkInputStream.<init>(FileLink.java:120)
> 	at org.apache.hadoop.hbase.io.FileLink$FileLinkInputStream.<init>(FileLink.java:111)
> 	at org.apache.hadoop.hbase.io.FileLink.open(FileLink.java:390)
> 	at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.openSourceFile(ExportSnapshot.java:302)
> 	at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.copyFile(ExportSnapshot.java:175)
> 	at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.map(ExportSnapshot.java:146)
> 	at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.map(ExportSnapshot.java:95)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:262)
>
> Thank you,
> Adrian
>
>
> On Tue, Oct 8, 2013 at 5:56 PM, Ted Yu <yu...@gmail.com> wrote:
>
>> bq. 13/10/08 13:32:01 INFO mapred.JobClient:     MISSING_FILES=1
>>
>> Are you able to provide more context from job output ?
>>
>> Thanks
>>
>>
>> On Tue, Oct 8, 2013 at 6:35 AM, Adrian Sandulescu <
>> sandulescu.adrian@gmail.com> wrote:
>>
>> > Hello everyone,
>> >
>> > I'm using this tool to export and "import" snapshots from S3:
>> >
>> >
>> https://github.com/lospro7/snapshot-s3-util/blob/master/src/main/java/com/imgur/backup/SnapshotS3Util.java
>> >
>> > I'm using this tool because it seems like a better option than
>> ExportTable,
>> > considering there isn't another HDFS cluster on hand.
>> >
>> > It uses the following trick to make exportSnapshot "import" from S3 to
>> the
>> > local HDFS.
>> >
>> >             // Override dfs configuration to point to S3
>> >             config.set("fs.default.name", s3protocol + accessKey + ":"
>> > + accessSecret + "@" + bucketName);
>> >             config.set("fs.defaultFS", s3protocol + accessKey + ":" +
>> > accessSecret  + "@" + bucketName);
>> >             config.set("fs.s3.awsAccessKeyId", accessKey);
>> >             config.set("fs.s3.awsSecretAccessKey", accessSecret);
>> >             config.set("hbase.tmp.dir", "/tmp/hbase-${user.name}");
>> >             config.set("hbase.rootdir", s3Url);
>> >
>> >
>> > Imports work great, but only when using the s3n:// protocol (which means
>> > an HFile limit of 5GB).
>> > When using the s3:// protocol, I get the following:
>> > 13/10/08 13:32:01 INFO mapred.JobClient:     MISSING_FILES=1
>> >
>> > The author said he wasn't able to debug it and just uses s3n:// until it
>> > becomes a problem.
>> >
>> > Has anyone encountered this when using exportSnapshot?
>> > Can you please point me in the right direction?
>> >
>> > Adrian
>> >
>>
>
>

Re: exportSnapshot MISSING_FILES

Posted by Adrian Sandulescu <sa...@gmail.com>.
Yes, I was just digging.

From a successful s3n:// import

2013-10-08 14:57:04,816 INFO org.apache.hadoop.hbase.snapshot.ExportSnapshot: copy file input=v/campaign_digital_ad_time_dev=d9e6cff519bdd232d7a7f8eb676d92ee-6912d21c32754e7cb34ac85952a67ce1 output=hdfs://mycluster:8020/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1
2013-10-08 14:57:04,965 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem: Opening 's3n://AKIAIWNEBAESDM4DKBGA:UEFai8K1IBrjOKlXB2hbIbTFDJ8apubuA01LCc02@hbase-export/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1' for reading
2013-10-08 14:57:05,039 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem: Opening key 'hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1' for reading at position '0'
2013-10-08 14:57:05,299 INFO org.apache.hadoop.hbase.snapshot.ExportSnapshot: Skip copy v/campaign_digital_ad_time_dev=d9e6cff519bdd232d7a7f8eb676d92ee-6912d21c32754e7cb34ac85952a67ce1 to hdfs://mycluster:8020/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1, same file.
2013-10-08 14:57:05,300 INFO org.apache.hadoop.hbase.snapshot.ExportSnapshot: copy completed for input=v/campaign_digital_ad_time_dev=d9e6cff519bdd232d7a7f8eb676d92ee-6912d21c32754e7cb34ac85952a67ce1 output=hdfs://mycluster:8020/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1

From a failed s3:// import

2013-10-08 15:27:21,810 INFO org.apache.hadoop.hbase.snapshot.ExportSnapshot: copy file input=v/campaign_digital_ad_time_dev=d9e6cff519bdd232d7a7f8eb676d92ee-6912d21c32754e7cb34ac85952a67ce1 output=hdfs://mycluster:8020/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1
2013-10-08 15:27:21,834 ERROR org.apache.hadoop.hbase.snapshot.ExportSnapshot: Unable to open source file=v/campaign_digital_ad_time_dev=d9e6cff519bdd232d7a7f8eb676d92ee-6912d21c32754e7cb34ac85952a67ce1
java.io.IOException: No such file.
	at org.apache.hadoop.fs.s3.S3FileSystem.checkFile(S3FileSystem.java:181)
	at org.apache.hadoop.fs.s3.S3FileSystem.open(S3FileSystem.java:246)
	at org.apache.hadoop.hbase.io.FileLink$FileLinkInputStream.tryOpen(FileLink.java:289)
	at org.apache.hadoop.hbase.io.FileLink$FileLinkInputStream.<init>(FileLink.java:120)
	at org.apache.hadoop.hbase.io.FileLink$FileLinkInputStream.<init>(FileLink.java:111)
	at org.apache.hadoop.hbase.io.FileLink.open(FileLink.java:390)
	at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.openSourceFile(ExportSnapshot.java:302)
	at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.copyFile(ExportSnapshot.java:175)
	at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.map(ExportSnapshot.java:146)
	at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.map(ExportSnapshot.java:95)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
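
If it helps to reproduce this outside of MapReduce, a minimal standalone
probe would look something like the sketch below (my own untested sketch; the
URI argument is whichever of the two schemes is being tested):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3OpenProbe {
    // Debugging sketch: stat and open the same object through a given scheme
    // (s3:// or s3n://) to see whether that filesystem can actually read it.
    public static void probe(String uri) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path(uri);
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        System.out.println("status: " + fs.getFileStatus(path));
        FSDataInputStream in = fs.open(path);
        try {
            System.out.println("first byte: " + in.read());
        } finally {
            in.close();
        }
    }
}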

Thank you,
Adrian


On Tue, Oct 8, 2013 at 5:56 PM, Ted Yu <yu...@gmail.com> wrote:

> bq. 13/10/08 13:32:01 INFO mapred.JobClient:     MISSING_FILES=1
>
> Are you able to provide more context from job output ?
>
> Thanks
>
>
> On Tue, Oct 8, 2013 at 6:35 AM, Adrian Sandulescu <
> sandulescu.adrian@gmail.com> wrote:
>
> > Hello everyone,
> >
> > I'm using this tool to export and "import" snapshots from S3:
> >
> >
> https://github.com/lospro7/snapshot-s3-util/blob/master/src/main/java/com/imgur/backup/SnapshotS3Util.java
> >
> > I'm using this tool because it seems like a better option than
> ExportTable,
> > considering there isn't another HDFS cluster on hand.
> >
> > It uses the following trick to make exportSnapshot "import" from S3 to
> the
> > local HDFS.
> >
> >             // Override dfs configuration to point to S3
> >             config.set("fs.default.name", s3protocol + accessKey + ":"
> > + accessSecret + "@" + bucketName);
> >             config.set("fs.defaultFS", s3protocol + accessKey + ":" +
> > accessSecret  + "@" + bucketName);
> >             config.set("fs.s3.awsAccessKeyId", accessKey);
> >             config.set("fs.s3.awsSecretAccessKey", accessSecret);
> >             config.set("hbase.tmp.dir", "/tmp/hbase-${user.name}");
> >             config.set("hbase.rootdir", s3Url);
> >
> >
> > Imports work great, but only when using the s3n:// protocol (which means
> > an HFile limit of 5GB).
> > When using the s3:// protocol, I get the following:
> > 13/10/08 13:32:01 INFO mapred.JobClient:     MISSING_FILES=1
> >
> > The author said he wasn't able to debug it and just uses s3n:// until it
> > becomes a problem.
> >
> > Has anyone encountered this when using exportSnapshot?
> > Can you please point me in the right direction?
> >
> > Adrian
> >
>

Re: exportSnapshot MISSING_FILES

Posted by Ted Yu <yu...@gmail.com>.
bq. 13/10/08 13:32:01 INFO mapred.JobClient:     MISSING_FILES=1

Are you able to provide more context from the job output?

Thanks


On Tue, Oct 8, 2013 at 6:35 AM, Adrian Sandulescu <
sandulescu.adrian@gmail.com> wrote:

> Hello everyone,
>
> I'm using this tool to export and "import" snapshots from S3:
>
> https://github.com/lospro7/snapshot-s3-util/blob/master/src/main/java/com/imgur/backup/SnapshotS3Util.java
>
> I'm using this tool because it seems like a better option than ExportTable,
> considering there isn't another HDFS cluster on hand.
>
> It uses the following trick to make exportSnapshot "import" from S3 to the
> local HDFS.
>
>             // Override dfs configuration to point to S3
>             config.set("fs.default.name", s3protocol + accessKey + ":"
> + accessSecret + "@" + bucketName);
>             config.set("fs.defaultFS", s3protocol + accessKey + ":" +
> accessSecret  + "@" + bucketName);
>             config.set("fs.s3.awsAccessKeyId", accessKey);
>             config.set("fs.s3.awsSecretAccessKey", accessSecret);
>             config.set("hbase.tmp.dir", "/tmp/hbase-${user.name}");
>             config.set("hbase.rootdir", s3Url);
>
>
> Imports work great, but only when using the s3n:// protocol (which means
> an HFile limit of 5GB).
> When using the s3:// protocol, I get the following:
> 13/10/08 13:32:01 INFO mapred.JobClient:     MISSING_FILES=1
>
> The author said he wasn't able to debug it and just uses s3n:// until it
> becomes a problem.
>
> Has anyone encountered this when using exportSnapshot?
> Can you please point me in the right direction?
>
> Adrian
>

Re: exportSnapshot MISSING_FILES

Posted by Nick Dimiduk <nd...@gmail.com>.
On Fri, Oct 11, 2013 at 5:27 AM, Adrian Sandulescu <
sandulescu.adrian@gmail.com> wrote:

> It's Apache Hadoop, but I see multi-part upload is in the works for this as
> well.
> https://issues.apache.org/jira/browse/HADOOP-9454
>

I didn't know about this ticket. That's a very good thing to have in
Apache. In the meantime, does EMR run a Hadoop version close to yours? If
it does, and you're feeling brave enough, you can try a franken-build. Grab
the appropriate Hadoop jar from an EMR deployment and deploy it on your
cluster. I don't know what else this might pull in, though.

> Another solution would be to limit the size of the HFiles to 5GB, but I
> don't know yet what effect this would have on cluster performance.
>

If that's your only option, that's your only option. Once the snapshot
is hydrated from S3, you can always compact your table to clean things up.
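
A minimal sketch of that cleanup step, assuming the plain client API (the
table name below is just a placeholder), would be something like:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CompactAfterRestore {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            // Ask the master to major-compact the restored table so the
            // imported HFiles get rewritten. The call is asynchronous; the
            // compaction itself runs in the background on the region servers.
            admin.majorCompact("campaign_digital_ad_time_dev"); // placeholder table name
        } finally {
            admin.close();
        }
    }
}

The same thing can be done from the HBase shell with major_compact.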

Good luck!
-n

Re: exportSnapshot MISSING_FILES

Posted by Adrian Sandulescu <sa...@gmail.com>.
It's Apache Hadoop, but I see multi-part upload is in the works for this as
well.
https://issues.apache.org/jira/browse/HADOOP-9454

Another solution would be to limit the size of the HFiles to 5GB, but I
don't know yet what effect this would have on cluster performance.
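
If I go that route, I assume it would come down to something like the sketch
below: capping the table's MAX_FILESIZE (hbase.hregion.max.filesize) at 5GB so
that regions, and the HFiles they produce when they major-compact, stay under
the single-object limit. Untested, and the table name is a placeholder.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class CapHFileSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        byte[] table = Bytes.toBytes("campaign_digital_ad_time_dev"); // placeholder table
        try {
            HTableDescriptor desc = admin.getTableDescriptor(table);
            // Cap the region size (and so, roughly, the HFile size after a
            // major compaction) at 5 GB.
            desc.setMaxFileSize(5L * 1024 * 1024 * 1024);
            admin.disableTable(table);
            admin.modifyTable(table, desc);
            admin.enableTable(table);
        } finally {
            admin.close();
        }
    }
}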


On Thu, Oct 10, 2013 at 7:32 PM, Nick Dimiduk <nd...@gmail.com> wrote:

> On Tue, Oct 8, 2013 at 6:35 AM, Adrian Sandulescu <
> sandulescu.adrian@gmail.com> wrote:
>
> > Imports work great, but only when using the s3n:// protocol (which means
> > an HFile limit of 5GB).
> >
>
> Are you using Apache Hadoop or an EMR build? From what I recall, EMR ships
> a customized s3n implementation that uses the multi-part upload feature of S3
> to chunk files and bypass the old 5GB limitation. You might give that a shot.
>

Re: exportSnapshot MISSING_FILES

Posted by Nick Dimiduk <nd...@gmail.com>.
On Tue, Oct 8, 2013 at 6:35 AM, Adrian Sandulescu <
sandulescu.adrian@gmail.com> wrote:

> Imports work great, but only when using the s3n:// protocol (which means
> an HFile limit of 5GB).
>

Are you using Apache Hadoop or an EMR build? From what I recall, EMR ships
a customized s3n implementation that uses the multi-part upload feature of S3
to chunk files and bypass the old 5GB limitation. You might give that a shot.