Posted to user@hbase.apache.org by Vishnu Amdiyala <vi...@gmail.com> on 2016/02/05 03:34:21 UTC

Regd: ExportSnapshot Tool (export Hfiles in parts)

Hi,

I am trying to back up snapshots of an HBase table to an S3 bucket, but each
HFile is larger than 5 GB, so the export fails due to S3's 5 GB single-upload
limit. From the ExportSnapshot source, the number of mappers is capped at the
total number of files. Is there a way to use this tool to split the files and
upload them to S3 in parts?


Thanks!
Vishnu

Re: Regd: ExportSnapshot Tool (export Hfiles in parts)

Posted by Vishnu Amdiyala <vi...@gmail.com>.
I got it to work on CDH 5.3 but not on CDH 4.6. Are there any different
approaches to make this work on CDH 4.6?

https://issues.apache.org/jira/browse/HADOOP-9454 was not fixed for 4.6,
I think.

Thanks!
Vishnu

On Fri, Feb 5, 2016 at 9:24 AM, Matteo Bertozzi <th...@gmail.com>
wrote:

> You just have to add the configuration property that enables multipart in
> the -site.xml, or you can probably just pass it as
> -Dfs.s3n.multipart.uploads.enabled=true.
> There is nothing to change in the tool; it is just a configuration option
> to enable. The s3n connector will pick up the configuration and use
> multipart, allowing you to transfer a single file > 5 GB.
>
> Matteo
>
>
> On Fri, Feb 5, 2016 at 9:16 AM, Vishnu Amdiyala <vi...@gmail.com>
> wrote:
>
> > I understand how multipart upload works when a file > 5 GB resides on
> > HDFS, but I am doing something like this:
> >
> > /usr/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot
> > table_2_2016020504 -copy-to s3n://${bucketname}
> >
> > which triggers an MR job to put the data into the bucket, and it fails
> > since each HFile from the table is > 5 GB. Do I have to rewrite the
> > snapshot tool to make use of the multipart upload API?
> >
> > Thanks!
> > Vishnu
> >
> > On Thu, Feb 4, 2016 at 7:54 PM, Matteo Bertozzi <theo.bertozzi@gmail.com>
> > wrote:
> >
> > > There is nothing to split files in ExportSnapshot because you don't
> > > need it.
> > >
> > > Take a look at
> > > http://docs.aws.amazon.com/AmazonS3/latest/dev/UploadingObjects.html
> > > "With a single PUT operation you can upload objects up to 5 GB in size"
> > > "Using the Multipart upload API you can upload large objects, up to
> > > 5 TB."
> > >
> > > You just have to configure the S3 connector to use multipart,
> > > and you'll be able to upload files > 5 GB.
> > >
> > > Matteo
> > >
> > >
> > > On Thu, Feb 4, 2016 at 7:50 PM, Vishnu Amdiyala <vishnuamdiyala@gmail.com>
> > > wrote:
> > >
> > > > Thank you guys for the quick response. My question is: how do I
> > > > generate part files out of these HFiles to upload to S3? The
> > > > ExportSnapshot tool I use doesn't allow more mappers than the number
> > > > of files [correct me if I am wrong]. So how will I be able to
> > > > generate splits out of each bulk file > 5 GB?
> > > >
> > > >
> > > > On Thu, Feb 4, 2016 at 7:14 PM, Ted Yu <yu...@gmail.com> wrote:
> > > >
> > > > > Vishnu:
> > > > > Please take a look at
> > > > > hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
> > > > > for the multipart-related config parameters (other than the one
> > > > > mentioned by Matteo):
> > > > >
> > > > > fs.s3n.multipart.uploads.block.size
> > > > > fs.s3n.multipart.copy.block.size
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Thu, Feb 4, 2016 at 7:00 PM, Matteo Bertozzi <theo.bertozzi@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > The multipart upload is handled by the S3 connector.
> > > > > > You can tune your connector to use multipart:
> > > > > > fs.s3n.multipart.uploads.enabled = true
> > > > > >
> > > > > > Matteo
> > > > > >
> > > > > >
> > > > > > On Thu, Feb 4, 2016 at 6:34 PM, Vishnu Amdiyala <vishnuamdiyala@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I am trying to back up snapshots of an HBase table to an S3
> > > > > > > bucket, but each HFile is larger than 5 GB, so the export
> > > > > > > fails due to S3's 5 GB single-upload limit. From the
> > > > > > > ExportSnapshot source, the number of mappers is capped at the
> > > > > > > total number of files. Is there a way to use this tool to
> > > > > > > split the files and upload them to S3 in parts?
> > > > > > >
> > > > > > >
> > > > > > > Thanks!
> > > > > > > Vishnu
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Regd: ExportSnapshot Tool (export Hfiles in parts)

Posted by Matteo Bertozzi <th...@gmail.com>.
You just have to add the configuration property that enables multipart in the
-site.xml, or you can probably just pass it as
-Dfs.s3n.multipart.uploads.enabled=true.
There is nothing to change in the tool; it is just a configuration option to
enable. The s3n connector will pick up the configuration and use multipart,
allowing you to transfer a single file > 5 GB.
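For example, something along these lines (an untested sketch: it only adds the
property to the ExportSnapshot command from your earlier mail, and assumes
your HBase/Hadoop version passes -D generic options through to the job):

  # -D options go before the tool arguments so they reach the job configuration
  /usr/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -Dfs.s3n.multipart.uploads.enabled=true \
    -snapshot table_2_2016020504 \
    -copy-to s3n://${bucketname}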

Matteo


On Fri, Feb 5, 2016 at 9:16 AM, Vishnu Amdiyala <vi...@gmail.com>
wrote:

> I understand how multipart upload works when a file > 5 GB resides on
> HDFS, but I am doing something like this:
>
> /usr/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot
> table_2_2016020504 -copy-to s3n://${bucketname}
>
> which triggers an MR job to put the data into the bucket, and it fails
> since each HFile from the table is > 5 GB. Do I have to rewrite the
> snapshot tool to make use of the multipart upload API?
>
> Thanks!
> Vishnu
>
> On Thu, Feb 4, 2016 at 7:54 PM, Matteo Bertozzi <th...@gmail.com>
> wrote:
>
> > There is nothing to split files in ExportSnapshot because you don't
> > need it.
> >
> > Take a look at
> > http://docs.aws.amazon.com/AmazonS3/latest/dev/UploadingObjects.html
> > "With a single PUT operation you can upload objects up to 5 GB in size"
> > "Using the Multipart upload API you can upload large objects, up to
> > 5 TB."
> >
> > You just have to configure the S3 connector to use multipart,
> > and you'll be able to upload files > 5 GB.
> >
> > Matteo
> >
> >
> > On Thu, Feb 4, 2016 at 7:50 PM, Vishnu Amdiyala <vishnuamdiyala@gmail.com>
> > wrote:
> >
> > > Thank you guys for the quick response. My question is: how do I
> > > generate part files out of these HFiles to upload to S3? The
> > > ExportSnapshot tool I use doesn't allow more mappers than the number
> > > of files [correct me if I am wrong]. So how will I be able to
> > > generate splits out of each bulk file > 5 GB?
> > >
> > >
> > > On Thu, Feb 4, 2016 at 7:14 PM, Ted Yu <yu...@gmail.com> wrote:
> > >
> > > > Vishnu:
> > > > Please take a look at
> > > > hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
> > > > for the multipart-related config parameters (other than the one
> > > > mentioned by Matteo):
> > > >
> > > > fs.s3n.multipart.uploads.block.size
> > > > fs.s3n.multipart.copy.block.size
> > > >
> > > > Cheers
> > > >
> > > > On Thu, Feb 4, 2016 at 7:00 PM, Matteo Bertozzi <theo.bertozzi@gmail.com>
> > > > wrote:
> > > >
> > > > > The multipart upload is handled by the S3 connector.
> > > > > You can tune your connector to use multipart:
> > > > > fs.s3n.multipart.uploads.enabled = true
> > > > >
> > > > > Matteo
> > > > >
> > > > >
> > > > > On Thu, Feb 4, 2016 at 6:34 PM, Vishnu Amdiyala <vishnuamdiyala@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I am trying to back up snapshots of an HBase table to an S3
> > > > > > bucket, but each HFile is larger than 5 GB, so the export
> > > > > > fails due to S3's 5 GB single-upload limit. From the
> > > > > > ExportSnapshot source, the number of mappers is capped at the
> > > > > > total number of files. Is there a way to use this tool to
> > > > > > split the files and upload them to S3 in parts?
> > > > > >
> > > > > >
> > > > > > Thanks!
> > > > > > Vishnu
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Regd: ExportSnapshot Tool (export Hfiles in parts)

Posted by Vishnu Amdiyala <vi...@gmail.com>.
I understand how multipart upload works when a file > 5 GB resides on HDFS,
but I am doing something like this:

/usr/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot
table_2_2016020504 -copy-to s3n://${bucketname}

which triggers an MR job to put the data into the bucket, and it fails since
each HFile from the table is > 5 GB. Do I have to rewrite the snapshot tool to
make use of the multipart upload API?

Thanks!
Vishnu

On Thu, Feb 4, 2016 at 7:54 PM, Matteo Bertozzi <th...@gmail.com>
wrote:

> There is nothing to split files in ExportSnapshot because you don't need
> it.
>
> Take a look at
> http://docs.aws.amazon.com/AmazonS3/latest/dev/UploadingObjects.html
> "With a single PUT operation you can upload objects up to 5 GB in size"
> "Using the Multipart upload API you can upload large objects, up to 5 TB."
>
> You just have to configure the S3 connector to use multipart,
> and you'll be able to upload files > 5 GB.
>
> Matteo
>
>
> On Thu, Feb 4, 2016 at 7:50 PM, Vishnu Amdiyala <vi...@gmail.com>
> wrote:
>
> > Thank you guys for the quick response. My question is: how do I generate
> > part files out of these HFiles to upload to S3? The ExportSnapshot tool
> > I use doesn't allow more mappers than the number of files [correct me if
> > I am wrong]. So how will I be able to generate splits out of each bulk
> > file > 5 GB?
> >
> >
> > On Thu, Feb 4, 2016 at 7:14 PM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > Vishnu:
> > > Please take a look at
> > > hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
> > > for the multipart-related config parameters (other than the one
> > > mentioned by Matteo):
> > >
> > > fs.s3n.multipart.uploads.block.size
> > > fs.s3n.multipart.copy.block.size
> > >
> > > Cheers
> > >
> > > On Thu, Feb 4, 2016 at 7:00 PM, Matteo Bertozzi <theo.bertozzi@gmail.com>
> > > wrote:
> > >
> > > > The multipart upload is handled by the S3 connector.
> > > > You can tune your connector to use multipart:
> > > > fs.s3n.multipart.uploads.enabled = true
> > > >
> > > > Matteo
> > > >
> > > >
> > > > On Thu, Feb 4, 2016 at 6:34 PM, Vishnu Amdiyala <vishnuamdiyala@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am trying to back up snapshots of an HBase table to an S3
> > > > > bucket, but each HFile is larger than 5 GB, so the export fails
> > > > > due to S3's 5 GB single-upload limit. From the ExportSnapshot
> > > > > source, the number of mappers is capped at the total number of
> > > > > files. Is there a way to use this tool to split the files and
> > > > > upload them to S3 in parts?
> > > > >
> > > > >
> > > > > Thanks!
> > > > > Vishnu
> > > > >
> > > >
> > >
> >
>

Re: Regd: ExportSnapshot Tool (export Hfiles in parts)

Posted by Matteo Bertozzi <th...@gmail.com>.
There is nothing to split files in ExportSnapshot because you don't need
it.

Take a look at
http://docs.aws.amazon.com/AmazonS3/latest/dev/UploadingObjects.html
"With a single PUT operation you can upload objects up to 5 GB in size"
"Using the Multipart upload API you can upload large objects, up to 5 TB."

You just have to configure the S3 connector to use multipart,
and you'll be able to upload files > 5 GB.

Matteo


On Thu, Feb 4, 2016 at 7:50 PM, Vishnu Amdiyala <vi...@gmail.com>
wrote:

> Thank you guys for the quick response. My question is: how do I generate
> part files out of these HFiles to upload to S3? The ExportSnapshot tool I
> use doesn't allow more mappers than the number of files [correct me if I
> am wrong]. So how will I be able to generate splits out of each bulk
> file > 5 GB?
>
>
> On Thu, Feb 4, 2016 at 7:14 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > Vishnu:
> > Please take a look at
> > hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
> > for the multipart-related config parameters (other than the one
> > mentioned by Matteo):
> >
> > fs.s3n.multipart.uploads.block.size
> > fs.s3n.multipart.copy.block.size
> >
> > Cheers
> >
> > On Thu, Feb 4, 2016 at 7:00 PM, Matteo Bertozzi <theo.bertozzi@gmail.com>
> > wrote:
> >
> > > The multipart upload is handled by the S3 connector.
> > > You can tune your connector to use multipart:
> > > fs.s3n.multipart.uploads.enabled = true
> > >
> > > Matteo
> > >
> > >
> > > On Thu, Feb 4, 2016 at 6:34 PM, Vishnu Amdiyala <vishnuamdiyala@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I am trying to back up snapshots of an HBase table to an S3 bucket,
> > > > but each HFile is larger than 5 GB, so the export fails due to S3's
> > > > 5 GB single-upload limit. From the ExportSnapshot source, the number
> > > > of mappers is capped at the total number of files. Is there a way to
> > > > use this tool to split the files and upload them to S3 in parts?
> > > >
> > > >
> > > > Thanks!
> > > > Vishnu
> > > >
> > >
> >
>

Re: Regd: ExportSnapshot Tool (export Hfiles in parts)

Posted by Vishnu Amdiyala <vi...@gmail.com>.
Thank you guys for the quick response. My question is: how do I generate
part files out of these HFiles to upload to S3? The ExportSnapshot tool I use
doesn't allow more mappers than the number of files [correct me if I am
wrong]. So how will I be able to generate splits out of each bulk
file > 5 GB?


On Thu, Feb 4, 2016 at 7:14 PM, Ted Yu <yu...@gmail.com> wrote:

> Vishnu:
> Please take a look at
> hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
> for the multipart-related config parameters (other than the one mentioned
> by Matteo):
>
> fs.s3n.multipart.uploads.block.size
> fs.s3n.multipart.copy.block.size
>
> Cheers
>
> On Thu, Feb 4, 2016 at 7:00 PM, Matteo Bertozzi <th...@gmail.com>
> wrote:
>
> > The multipart upload is handled by the S3 connector.
> > You can tune your connector to use multipart:
> > fs.s3n.multipart.uploads.enabled = true
> >
> > Matteo
> >
> >
> > On Thu, Feb 4, 2016 at 6:34 PM, Vishnu Amdiyala <vishnuamdiyala@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I am trying to back up snapshots of an HBase table to an S3 bucket,
> > > but each HFile is larger than 5 GB, so the export fails due to S3's
> > > 5 GB single-upload limit. From the ExportSnapshot source, the number
> > > of mappers is capped at the total number of files. Is there a way to
> > > use this tool to split the files and upload them to S3 in parts?
> > >
> > >
> > > Thanks!
> > > Vishnu
> > >
> >
>

Re: Regd: ExportSnapshot Tool (export Hfiles in parts)

Posted by Ted Yu <yu...@gmail.com>.
Vishnu:
Please take a look at
hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
for the multipart-related config parameters (other than the one mentioned by
Matteo):

fs.s3n.multipart.uploads.block.size
fs.s3n.multipart.copy.block.size
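For example, these should also be settable at submit time via -D alongside the
enable flag (a sketch only; the snapshot name and bucket are placeholders, and
the block-size value is purely illustrative, so check core-default.xml in your
Hadoop version for the real defaults):

  # example part size of 64 MB (67108864 bytes); not a recommendation
  /usr/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -Dfs.s3n.multipart.uploads.enabled=true \
    -Dfs.s3n.multipart.uploads.block.size=67108864 \
    -snapshot <snapshot-name> \
    -copy-to s3n://<bucket>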

Cheers

On Thu, Feb 4, 2016 at 7:00 PM, Matteo Bertozzi <th...@gmail.com>
wrote:

> The multipart upload is handled by the S3 connector.
> You can tune your connector to use multipart:
> fs.s3n.multipart.uploads.enabled = true
>
> Matteo
>
>
> On Thu, Feb 4, 2016 at 6:34 PM, Vishnu Amdiyala <vi...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I am trying to back up snapshots of an HBase table to an S3 bucket,
> > but each HFile is larger than 5 GB, so the export fails due to S3's
> > 5 GB single-upload limit. From the ExportSnapshot source, the number
> > of mappers is capped at the total number of files. Is there a way to
> > use this tool to split the files and upload them to S3 in parts?
> >
> >
> > Thanks!
> > Vishnu
> >
>

Re: Regd: ExportSnapshot Tool (export Hfiles in parts)

Posted by Matteo Bertozzi <th...@gmail.com>.
The multipart upload is handled by the S3 connector.
You can tune your connector to use multipart:
fs.s3n.multipart.uploads.enabled = true
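e.g. if you set it in config files rather than on the command line, it would
look something like this sketch, in whichever -site.xml (typically
core-site.xml) your job actually picks up:

  <!-- illustrative sketch: enables multipart upload in the s3n connector;
       requires a Hadoop release that includes HADOOP-9454 -->
  <property>
    <name>fs.s3n.multipart.uploads.enabled</name>
    <value>true</value>
  </property>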

Matteo


On Thu, Feb 4, 2016 at 6:34 PM, Vishnu Amdiyala <vi...@gmail.com>
wrote:

> Hi,
>
> I am trying to back up snapshots of an HBase table to an S3 bucket, but
> each HFile is larger than 5 GB, so the export fails due to S3's 5 GB
> single-upload limit. From the ExportSnapshot source, the number of
> mappers is capped at the total number of files. Is there a way to use
> this tool to split the files and upload them to S3 in parts?
>
>
> Thanks!
> Vishnu
>