Posted to user@hbase.apache.org by Yair Even-Zohar <ya...@revenuescience.com> on 2009/02/03 11:44:26 UTC

writing map output to local fileSystem

I'm trying to run a map and output data to local filesystem on EC2 and
run into some problems.

Prior to Hadoop/HBase 0.19 I was using:

 

    // "c" is the JobConf for the job; the file:/// path selects the
    // local filesystem rather than HDFS.
    RawLocalFileSystem rlfs = new RawLocalFileSystem();
    Path path = new Path("file:///directory");
    rlfs.setWorkingDirectory(path);
    FileOutputFormat.setOutputPath(c, rlfs.getWorkingDirectory());

 

All I'm getting now under the output directory is the _logs directory;
the actual output files are not there.

If I write to HDFS instead, all the required data is in place.
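
For comparison, a minimal sketch (untested; the class name, JobConf
variable, and output path are placeholders) that hands FileOutputFormat
a file:// Path directly instead of going through RawLocalFileSystem:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class LocalOutputSketch {
        public static void configure(JobConf conf) {
            // A file:// URI resolves against the local filesystem
            // instead of the configured default FS (usually HDFS).
            FileOutputFormat.setOutputPath(conf,
                new Path("file:///tmp/job-output"));
        }
    }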

 

Any idea?

 

Thanks

-Yair


Re: writing map output to local fileSystem

Posted by Yabo-Arber Xu <ar...@gmail.com>.
Thanks, Ryan. I will reconsider my design and explore what you suggested.

PS: sorry for mixing my message into this thread.

Best,
Arber

On Wed, Feb 4, 2009 at 5:35 AM, Ryan Rawson <ry...@gmail.com> wrote:

> You got it - you can't "update" the row key without a
> read/update/delete/insert cycle. This is because the newly inserted
> row might not live on the same region server anymore.
>
> It would probably be better to have a schema that avoids needing its
> primary key updated.
>
> One common design pattern is to use a system like protobufs or Thrift
> serialization to store structured binary data in the HBase cells, thus
> upping the complexity of what you may store in an HBase row. With some
> clever redesign you may discover you can avoid updating the primary key.
>
> Good luck!
> -ryan
>
> On Tue, Feb 3, 2009 at 6:23 AM, Yabo-Arber Xu <arber.research@gmail.com>
> wrote:
>
> > Hi there,
> >
> > I have this usage scenario on HBase, and wonder what is the most
> > efficient way of doing this:
> >
> > I use each row to represent a cluster, and the rowkey is, roughly, the
> > center of the cluster. So every time I add an element to a cluster, I
> > need to update the rowkey (along with some minor updates to certain
> > columns).
> >
> > The best way I know is to read the whole row out, remove it from HBase,
> > and insert the same row with a new rowkey, but this does not seem very
> > efficient. Any thoughts?
> >
> > Thanks for your input!
> >
> > Best,
> > Arber
> >
>



-- 
Yabo-Arber Xu, Ph.D.
VP Engineering, Summba Inc.
Web: www.yabo-x.com

Re: writing map output to local fileSystem

Posted by Ryan Rawson <ry...@gmail.com>.
You got it - you can't "update" the row key without a
read/update/delete/insert cycle. This is because the newly inserted
row might not live on the same region server anymore.
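
A rough sketch of that cycle, assuming the 0.19-era client API
(BatchUpdate / RowResult); the class name, table handle, and keys are
placeholders, not tested code:

    import java.io.IOException;
    import java.util.Map;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.io.BatchUpdate;
    import org.apache.hadoop.hbase.io.Cell;
    import org.apache.hadoop.hbase.io.RowResult;

    public class RekeyExample {
        // "Move" a row to a new key: read it, re-insert under the new
        // key, then delete the old row. Not atomic across the steps.
        public static void rekey(HTable table, byte[] oldKey, byte[] newKey)
                throws IOException {
            RowResult row = table.getRow(oldKey);         // 1. read whole row
            BatchUpdate update = new BatchUpdate(newKey); // 2. re-insert
            for (Map.Entry<byte[], Cell> e : row.entrySet()) {
                update.put(e.getKey(), e.getValue().getValue());
            }
            table.commit(update);
            table.deleteAll(oldKey);                      // 3. delete old row
        }
    }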

It would probably be better to have a schema that avoids needing its
primary key updated.

One common design pattern is to use a system like protobufs or Thrift
serialization to store structured binary data in the HBase cells, thus
upping the complexity of what you may store in an HBase row. With some
clever redesign you may discover you can avoid updating the primary key.
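
One hypothetical shape for that idea (the class name, column family,
key scheme, and encoding are invented for illustration; same 0.19-era
API as above): keep the row key stable and store the mutable cluster
center in a cell.

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.io.BatchUpdate;

    public class ClusterRowExample {
        // Row key is a stable synthetic cluster id; the center lives in
        // a cell, so adding an element never requires re-keying the row.
        public static void updateCenter(HTable table, long clusterId,
                double[] center) throws IOException {
            BatchUpdate update =
                new BatchUpdate(("cluster-" + clusterId).getBytes());
            // Serialize the center however you like (protobufs, Thrift,
            // raw bytes); naively here as a comma-separated string.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < center.length; i++) {
                if (i > 0) sb.append(',');
                sb.append(center[i]);
            }
            update.put("info:center".getBytes(), sb.toString().getBytes());
            table.commit(update);
        }
    }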

Good luck!
-ryan

On Tue, Feb 3, 2009 at 6:23 AM, Yabo-Arber Xu <ar...@gmail.com> wrote:

> Hi there,
>
> I have this usage scenario on HBase, and wonder what is the most
> efficient way of doing this:
>
> I use each row to represent a cluster, and the rowkey is, roughly, the
> center of the cluster. So every time I add an element to a cluster, I
> need to update the rowkey (along with some minor updates to certain
> columns).
>
> The best way I know is to read the whole row out, remove it from HBase,
> and insert the same row with a new rowkey, but this does not seem very
> efficient. Any thoughts?
>
> Thanks for your input!
>
> Best,
> Arber
>

Re: writing map output to local fileSystem

Posted by Yabo-Arber Xu <ar...@gmail.com>.
Hi there,

I have this usage scenario on HBase, and wonder what is the most efficient
way of doing this:

I use each row to represent a cluster, and the rowkey is, roughly, the
center of the cluster. So every time I add an element to a cluster, I need
to update the rowkey (along with some minor updates to certain columns).

The best way I know is to read the whole row out, remove it from HBase, and
insert the same row with a new rowkey, but this does not seem very
efficient. Any thoughts?

Thanks for your input!

Best,
Arber

Re: writing map output to local fileSystem

Posted by Ryan Rawson <ry...@gmail.com>.
Hi there,

This is the HBase user list; HBase is a Google Bigtable-inspired project.
You want the hadoop-user list - try Hadoop's website.

Good luck!

On Feb 3, 2009 2:42 AM, "Yair Even-Zohar" <ya...@revenuescience.com> wrote:

I'm trying to run a map and output data to local filesystem on EC2 and
run into some problems.

Prior to Hadoop/HBase 0.19 I was using:



    // "c" is the JobConf for the job; the file:/// path selects the
    // local filesystem rather than HDFS.
    RawLocalFileSystem rlfs = new RawLocalFileSystem();
    Path path = new Path("file:///directory");
    rlfs.setWorkingDirectory(path);
    FileOutputFormat.setOutputPath(c, rlfs.getWorkingDirectory());



All I'm getting now under the output directory is the _logs directory;
the actual output files are not there.

If I write to HDFS instead, all the required data is in place.



Any idea?



Thanks

-Yair