Posted to user@hbase.apache.org by Laura Biester <la...@pinterest.com.INVALID> on 2017/05/12 18:40:05 UTC

TableMapReduceUtil.initTableSnapshotMapperJob upgrade from HBase 0.94 to HBase 1.2.1

Hi everyone,

We are currently working on upgrading from HBase 0.94 to HBase 1.2. We use
TableMapReduceUtil.initTableSnapshotMapperJob to read snapshots that are
stored on S3 in a few Hadoop jobs.
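
For context, here is roughly how one of the jobs sets things up. This is a
minimal sketch; the class, snapshot, and path names below are placeholders,
not our real ones:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class SnapshotScanJob {

      // Identity-style mapper; the real jobs do more work here.
      static class SnapshotMapper
          extends TableMapper<ImmutableBytesWritable, Result> {
        @Override
        protected void map(ImmutableBytesWritable key, Result value,
            Context context) throws IOException, InterruptedException {
          context.write(key, value);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "snapshot-scan");
        job.setJarByClass(SnapshotScanJob.class);

        Scan scan = new Scan();
        scan.setCacheBlocks(false); // recommended off for MR scans

        TableMapReduceUtil.initTableSnapshotMapperJob(
            "my_snapshot",                 // snapshot name (placeholder)
            scan,
            SnapshotMapper.class,
            ImmutableBytesWritable.class,  // mapper output key class
            Result.class,                  // mapper output value class
            job,
            true,                          // addDependencyJars
            new Path("/tmp/restore_dir")); // tmpRestoreDir, the argument
                                           // this question is about
        job.setOutputFormatClass(NullOutputFormat.class);
        job.waitForCompletion(true);
      }
    }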

I am upgrading the jobs to read the new snapshots with the HBase 1.2 jar,
and this is causing a few problems. In particular, what appears to be new
validation code complains that the restoreDir and rootDir are on different
filesystems. This is the check, from
RestoreSnapshotHelper.copySnapshotForScanner:

    if (!restoreDir.getFileSystem(conf).getUri()
        .equals(rootDir.getFileSystem(conf).getUri())) {
      throw new IllegalArgumentException("Filesystems for restore directory "
          + "and HBase root directory should be the same");
    } else if (restoreDir.toUri().getPath()
        .startsWith(rootDir.toUri().getPath())) {
      throw new IllegalArgumentException("Restore directory cannot be a sub "
          + "directory of HBase root directory. RootDir: " + rootDir
          + ", restoreDir: " + restoreDir);
    }

And this is the exception:

Exception in thread "main" java.lang.IllegalArgumentException: Filesystems
for restore directory and HBase root directory should be the same
        at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(RestoreSnapshotHelper.java:716)
        at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl.setInput(TableSnapshotInputFormatImpl.java:403)
        at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat.setInput(TableSnapshotInputFormat.java:205)
        at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableSnapshotMapperJob(TableMapReduceUtil.java:365)

Our restoreDir is on the Hadoop cluster in HDFS, and the rootDir is on S3,
so the first exception is thrown. I also tried setting the restoreDir to be
on S3, but that caused another exception:

Exception in thread "main" java.io.IOException:
java.util.concurrent.ExecutionException:
java.lang.IllegalArgumentException: Wrong FS: s3n://..., expected:
hdfs://...
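
To make the two attempts concrete, here is roughly what they look like in
code (bucket and namenode names are placeholders, not our real ones):

    Configuration conf = HBaseConfiguration.create();
    // Our snapshots live under the HBase root directory on S3:
    conf.set("hbase.rootdir", "s3n://our-bucket/hbase");
    // The cluster's default filesystem is HDFS:
    conf.set("fs.defaultFS", "hdfs://namenode:8020");

    // Attempt 1: restoreDir on HDFS -> "Filesystems for restore directory
    // and HBase root directory should be the same"
    Path restoreDir = new Path("hdfs://namenode:8020/tmp/restore_dir");

    // Attempt 2: restoreDir on S3 -> "Wrong FS: s3n://..., expected:
    // hdfs://..."
    Path restoreDirOnS3 = new Path("s3n://our-bucket/tmp/restore_dir");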

We didn't see this problem at all in the old jobs that read HBase 0.94
snapshots with the 0.94 jar, where the restoreDir was on HDFS and the rootDir
was on S3. All of the paths have remained unchanged. I noticed that the docs
for the last argument changed slightly, from
tableRootDir - The directory where the temp table will be created
<https://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.html#initTableSnapshotMapperJob(java.lang.String,%20org.apache.hadoop.hbase.client.Scan,%20java.lang.Class,%20java.lang.Class,%20java.lang.Class,%20org.apache.hadoop.mapreduce.Job,%20boolean,%20org.apache.hadoop.fs.Path)>
to
tmpRestoreDir - a temporary directory to copy the snapshot files into.
Current user should have write permissions to this directory, and this
should not be a subdirectory of rootdir. After the job is finished, restore
directory can be deleted.
<https://hbase.apache.org/1.2/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.html>
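
If I read the new javadoc together with the check above literally, the only
restoreDir that can pass both tests is one on the same filesystem as rootdir
but outside of it, e.g. (hypothetical path):

    // Same s3n filesystem as our rootdir, but not a subdirectory of it.
    Path restoreDir = new Path("s3n://our-bucket/tmp/snapshot_restore");

That is essentially our second attempt above, yet it failed with the "Wrong
FS" exception.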

What exactly has changed? The snapshots will continue to be stored in S3.
What can we do to make it so that they can be read by this method?

Laura

Re: TableMapReduceUtil.initTableSnapshotMapperJob upgrade from HBase 0.94 to HBase 1.2.1

Posted by Laura Biester <la...@pinterest.com.INVALID>.
Yes, here's the stack trace <https://pastebin.com/NkeMgpMu>. I removed some
Pinterest-specific code and paths from the stack, but it should have all of
the necessary debugging information.

On Fri, May 12, 2017 at 2:06 PM, Ted Yu <yu...@gmail.com> wrote:

> bq. java.lang.IllegalArgumentException: Wrong FS: s3n://..., expected:
> hdfs://...
>
> Can you pastebin the full stack trace?
>
> The tmpRestoreDir parameter you referenced seems to come from this JIRA:
>
> HBASE-8369 MapReduce over snapshot files
>
> On Fri, May 12, 2017 at 11:40 AM, Laura Biester <
> laurabiester@pinterest.com.invalid> wrote:
>
> > [full quote of the original message snipped]
>

Re: TableMapReduceUtil.initTableSnapshotMapperJob upgrade from HBase 0.94 to HBase 1.2.1

Posted by Ted Yu <yu...@gmail.com>.
bq. java.lang.IllegalArgumentException: Wrong FS: s3n://..., expected:
hdfs://...

Can you pastebin the full stack trace?

The tmpRestoreDir parameter you referenced seems to come from this JIRA:

HBASE-8369 MapReduce over snapshot files

On Fri, May 12, 2017 at 11:40 AM, Laura Biester <
laurabiester@pinterest.com.invalid> wrote:

> [full quote of the original message snipped]