You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@hbase.apache.org by Andrew Purtell <ap...@apache.org> on 2009/06/20 21:11:04 UTC

who's doing what for 0.21?

http://tinyurl.com/m7nt72

I have an interest in these:

  https://issues.apache.org/jira/browse/HBASE-1015
  https://issues.apache.org/jira/browse/HBASE-1295
  https://issues.apache.org/jira/browse/HBASE-1556

I think for 1015 and 1295, there is interest on the part of at least myself,
dj_ryan, and jgray. dj_ryan was saying something about su executives making
1295 a priority for him. We should figure out how to divide up and assign
out the work. 

Also, probably I'll end up taking on the grunt work of 1556, because it 
needs to be done. 

Have we set a time and place for the next dev meeting? 

   - Andy

Re: who's doing what for 0.21?

Posted by Andrew Purtell <ap...@apache.org>.

I'll keep Tuesday the 30th free. 

  - Andy

________________________________
From: stack <st...@duboce.net>
To: hbase-dev@hadoop.apache.org
Sent: Monday, June 22, 2009 4:54:47 PM
Subject: Re: who's doing what for 0.21?

On Sat, Jun 20, 2009 at 12:11 PM, Andrew Purtell <ap...@apache.org>wrote:

>
> Have we set a time and place for the next dev meeting?

We've been talking about meeting next Tuesday, the 30th.  We were talking
about having a HUG rolling out 0.20.0.  Maybe should just be a dev. checkin
meeting figuring whats next?
St.Ack

Re: who's doing what for 0.21?

Posted by stack <st...@duboce.net>.

On Sat, Jun 20, 2009 at 12:11 PM, Andrew Purtell <ap...@apache.org>wrote:

>
> Have we set a time and place for the next dev meeting?

We've been talking about meeting next Tuesday, the 30th.  We were talking
about having a HUG rolling out 0.20.0.  Maybe should just be a dev. checkin
meeting figuring whats next?
St.Ack

Re: who's doing what for 0.21?

Posted by Ryan Rawson <ry...@gmail.com>.

Oh and what I forgot to say was using Thrift features (like the different
transport types) that aren't compatible in PHP is not a problem from my
perspective.  I would rcommend to all php users to use the thrift gateways
anyways, so i dont think we need a php-native library (yet?).

On Sat, Jun 20, 2009 at 4:54 PM, Ryan Rawson <ry...@gmail.com> wrote:

> One thing to consider, is that at SU, I keep on planning to use the Thrift
> gateway.  PHP processes are short lived, the deployment scenario is really
> easy (php just needs to know some IP addresses, unlike the zoo
> configuration, etc, etc), and ultimately a longer lived process caches the
> META/ROOT finding more effectively overall.
>
> -ryan
>
>
> On Sat, Jun 20, 2009 at 4:08 PM, Andrew Purtell <ap...@apache.org>wrote:
>
>> I my opinion, we should not bother to wait for Avro. I've been hearing
>> about it on and off for three months now. If it is ready the day we
>> start work on 1015 for 0.21, and it fits the bill, fine, but otherwise
>> we should just use Thrift or pbufs. Thrift may be preferable as its
>> compact binary protocol is competitive with pbufs plus it has a fully
>> implemented async rpc stack. I think this applies for both 1015 and
>> 1295. Also I'm skeptical that something to supplant RMI won't have
>> overheads related to that we don't need, e.g. transmitting class and
>> method names as strings, etc.
>>
>>   - Andy
>>
>>
>>
>>
>> ________________________________
>> From: Jim Kellerman (POWERSET) <Ji...@microsoft.com>
>> To: "hbase-dev@hadoop.apache.org" <hb...@hadoop.apache.org>
>> Sent: Saturday, June 20, 2009 3:26:58 PM
>> Subject: RE: who's doing what for 0.21?
>>
>> I am also interested in 1295 (I have quite a bit of experience
>> with cross data center replication), but more interested in
>> getting more of the master into zookeeper.
>>
>> As for 1556, I might wait a bit. At the Cloudera off-site, one
>> of the things talked about was doing something similar for
>> Hadoop which we might leverage.
>>
>> What really needs to get done around builds is when you mark
>> a Jira as patch available, we should do a patch build and test
>> like Hadoop does. Noone has had time to do it to date, but if
>> you are taking on the build, that would be a "nice to have".
>>
>> For 1015, should you wait for Avro?
>>
>> And if you missed it, here are the notes from the Cloudera off-site:
>> http://wiki.apache.org/hadoop/DeveloperOffsite20090612
>>
>> ---
>> Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
>>
>>
>> > -----Original Message-----
>> > From: Andrew Purtell [mailto:apurtell@apache.org]
>> > Sent: Saturday, June 20, 2009 12:11 PM
>> > To: hbase-dev@hadoop.apache.org
>> > Subject: who's doing what for 0.21?
>> >
>> > http://tinyurl.com/m7nt72
>> >
>> > I have an interest in these:
>> >
>> >  https://issues.apache.org/jira/browse/HBASE-1015
>> >  https://issues.apache.org/jira/browse/HBASE-1295
>> >  https://issues.apache.org/jira/browse/HBASE-1556
>> >
>> > I think for 1015 and 1295, there is interest on the part of at least
>> > myself,
>> > dj_ryan, and jgray. dj_ryan was saying something about su executives
>> > making
>> > 1295 a priority for him. We should figure out how to divide up and
>> > assign
>> > out the work.
>> >
>> > Also, probably I'll end up taking on the grunt work of 1556, because
>> > it
>> > needs to be done.
>> >
>> > Have we set a time and place for the next dev meeting?
>> >
>> >    - Andy
>> >
>>
>>
>>
>>
>
>

Re: who's doing what for 0.21?

Posted by Ryan Rawson <ry...@gmail.com>.

One thing to consider, is that at SU, I keep on planning to use the Thrift
gateway.  PHP processes are short lived, the deployment scenario is really
easy (php just needs to know some IP addresses, unlike the zoo
configuration, etc, etc), and ultimately a longer lived process caches the
META/ROOT finding more effectively overall.

-ryan

On Sat, Jun 20, 2009 at 4:08 PM, Andrew Purtell <ap...@apache.org> wrote:

> I my opinion, we should not bother to wait for Avro. I've been hearing
> about it on and off for three months now. If it is ready the day we
> start work on 1015 for 0.21, and it fits the bill, fine, but otherwise
> we should just use Thrift or pbufs. Thrift may be preferable as its
> compact binary protocol is competitive with pbufs plus it has a fully
> implemented async rpc stack. I think this applies for both 1015 and
> 1295. Also I'm skeptical that something to supplant RMI won't have
> overheads related to that we don't need, e.g. transmitting class and
> method names as strings, etc.
>
>   - Andy
>
>
>
>
> ________________________________
> From: Jim Kellerman (POWERSET) <Ji...@microsoft.com>
> To: "hbase-dev@hadoop.apache.org" <hb...@hadoop.apache.org>
> Sent: Saturday, June 20, 2009 3:26:58 PM
> Subject: RE: who's doing what for 0.21?
>
> I am also interested in 1295 (I have quite a bit of experience
> with cross data center replication), but more interested in
> getting more of the master into zookeeper.
>
> As for 1556, I might wait a bit. At the Cloudera off-site, one
> of the things talked about was doing something similar for
> Hadoop which we might leverage.
>
> What really needs to get done around builds is when you mark
> a Jira as patch available, we should do a patch build and test
> like Hadoop does. Noone has had time to do it to date, but if
> you are taking on the build, that would be a "nice to have".
>
> For 1015, should you wait for Avro?
>
> And if you missed it, here are the notes from the Cloudera off-site:
> http://wiki.apache.org/hadoop/DeveloperOffsite20090612
>
> ---
> Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
>
>
> > -----Original Message-----
> > From: Andrew Purtell [mailto:apurtell@apache.org]
> > Sent: Saturday, June 20, 2009 12:11 PM
> > To: hbase-dev@hadoop.apache.org
> > Subject: who's doing what for 0.21?
> >
> > http://tinyurl.com/m7nt72
> >
> > I have an interest in these:
> >
> >  https://issues.apache.org/jira/browse/HBASE-1015
> >  https://issues.apache.org/jira/browse/HBASE-1295
> >  https://issues.apache.org/jira/browse/HBASE-1556
> >
> > I think for 1015 and 1295, there is interest on the part of at least
> > myself,
> > dj_ryan, and jgray. dj_ryan was saying something about su executives
> > making
> > 1295 a priority for him. We should figure out how to divide up and
> > assign
> > out the work.
> >
> > Also, probably I'll end up taking on the grunt work of 1556, because
> > it
> > needs to be done.
> >
> > Have we set a time and place for the next dev meeting?
> >
> >    - Andy
> >
>
>
>
>

Re: who's doing what for 0.21?

Posted by Doug Cutting <cu...@apache.org>.

George Porter wrote:
> For now, though, it might be best for me to first write a patch to the 
> spec (with the metadata coming before the name and parameters), and a 
> patch to the implementation.  Time permitting before Wednesday, I can 
> then write a patch to support the public api for registering the plugin.
> 
> Sound good?

Yes.  The priority is to get it into the spec and have the 
implementations conform, even if they can only pass empty metadata. 
Then, subsequently, can add APIs to access the metadata.

Doug

Re: who's doing what for 0.21?

Posted by George Porter <Ge...@Sun.COM>.

I was wondering along similar lines while trying to come up with an  
API for the Responder side.

On the Requestor side, a plugin could be registered with callbacks  
that are invoked before the request is send, and right after the  
response is received.  For the Responder, callbacks associated with a  
registered plugin would be called after the request is received (but  
before the logic of the rpc is executed), and then before the response  
is sent back.  Default implementations would just do a nop, and by  
default an empty map would be sent.

For now, though, it might be best for me to first write a patch to the  
spec (with the metadata coming before the name and parameters), and a  
patch to the implementation.  Time permitting before Wednesday, I can  
then write a patch to support the public api for registering the plugin.

Sound good?

Thanks,
George


On Jun 30, 2009, at 12:04 PM, Doug Cutting wrote:

> I wonder if, rather than using ThreadLocals, what might be better is  
> to have Requestor and Responder support plugins that can manipulate  
> the call and response metadata.  Default plugins could be statically  
> configured, and per-proxy plugins could be passed to  
> ReflectRequestor#getClient().
>
> Doug
>
> Doug Cutting wrote:
>> George Porter wrote:
>>> If (as I expect) it is possible for a thread to have multiple  
>>> Requestors open (but Requestors could not be shared to more than  
>>> one thread), then the ThreadLocal could be a Map<Requestor,  
>>> metadata>.  This way you could set per-call metadata on a per- 
>>> requestor basis even if multiple requestors are open.  We could  
>>> provide some static methods as you mentioned to facilitate this.
>> Perhaps the ThreadLocal need not be static, but rather per- 
>> Requestor. The setters and getters would be implicitly per-thread  
>> and explicitly per-requestor.  Could that work?
>>> So that I understand, in this case the call format for the request  
>>> would then become:
>>>  - the message name, an avro string, followed by the message  
>>> parameters
>>>  - the per-call metadata, which is a map.  By default the map is  
>>> null
>> My intuition would be to transmit the metadata before the message  
>> name and parameters.  Also, a map would always be transmitted, but  
>> it would be empty by default.  The analogy is with HTTP headers,  
>> which are transmitted before request and response payloads.  It  
>> should be easy to implement an HTTP-based Transceiver.
>>> and the call response would become
>>>  - a one byte error flag boolean (with either a message response  
>>> or error)
>>>  - the per-call metadata, which is a map.  By default the map is  
>>> null
>> Again, I would transmit metadata before the response data and  
>> always transmit a map, empty by default.
>>> Before you had mentioned that the metadata could be sent in  
>>> Transceiver.transceive.  But couldn't it also be sent in  
>>> Requestor.request() (approximately at the writeRequset() call)?
>> Yes, if it is part of the Avro spec, it would be sent by Requestor  
>> and Responder.  If added by an application, it could be in  
>> Transceiver#transceive().
>>> What do people think of proceeding with Doug's recommendation of  
>>> using ThreadLocals, and including the map in the call request and  
>>> response?
>> +1 from me.
>> Doug

[George.Porter@Sun.com][+1.858.320.9932]
[Principal Investigator][Project BigData][Sun Labs]
[twitter.com/SunLabsBigData][blogs.sun.com/george]

Re: who's doing what for 0.21?

Posted by Doug Cutting <cu...@apache.org>.

I wonder if, rather than using ThreadLocals, what might be better is to 
have Requestor and Responder support plugins that can manipulate the 
call and response metadata.  Default plugins could be statically 
configured, and per-proxy plugins could be passed to 
ReflectRequestor#getClient().

Doug

Doug Cutting wrote:
> George Porter wrote:
>> If (as I expect) it is possible for a thread to have multiple 
>> Requestors open (but Requestors could not be shared to more than one 
>> thread), then the ThreadLocal could be a Map<Requestor, metadata>.  
>> This way you could set per-call metadata on a per-requestor basis even 
>> if multiple requestors are open.  We could provide some static methods 
>> as you mentioned to facilitate this.
> 
> Perhaps the ThreadLocal need not be static, but rather per-Requestor. 
> The setters and getters would be implicitly per-thread and explicitly 
> per-requestor.  Could that work?
> 
>> So that I understand, in this case the call format for the request 
>> would then become:
>>   - the message name, an avro string, followed by the message parameters
>>   - the per-call metadata, which is a map.  By default the map is null
> 
> My intuition would be to transmit the metadata before the message name 
> and parameters.  Also, a map would always be transmitted, but it would 
> be empty by default.  The analogy is with HTTP headers, which are 
> transmitted before request and response payloads.  It should be easy to 
> implement an HTTP-based Transceiver.
> 
>> and the call response would become
>>   - a one byte error flag boolean (with either a message response or 
>> error)
>>   - the per-call metadata, which is a map.  By default the map is null
> 
> Again, I would transmit metadata before the response data and always 
> transmit a map, empty by default.
> 
>> Before you had mentioned that the metadata could be sent in 
>> Transceiver.transceive.  But couldn't it also be sent in 
>> Requestor.request() (approximately at the writeRequset() call)?
> 
> Yes, if it is part of the Avro spec, it would be sent by Requestor and 
> Responder.  If added by an application, it could be in 
> Transceiver#transceive().
> 
>> What do people think of proceeding with Doug's recommendation of using 
>> ThreadLocals, and including the map in the call request and response?
> 
> +1 from me.
> 
> Doug

Re: who's doing what for 0.21?

Posted by Doug Cutting <cu...@apache.org>.

George Porter wrote:
> If (as I expect) it is possible for a thread to have multiple Requestors 
> open (but Requestors could not be shared to more than one thread), then 
> the ThreadLocal could be a Map<Requestor, metadata>.  This way you could 
> set per-call metadata on a per-requestor basis even if multiple 
> requestors are open.  We could provide some static methods as you 
> mentioned to facilitate this.

Perhaps the ThreadLocal need not be static, but rather per-Requestor. 
The setters and getters would be implicitly per-thread and explicitly 
per-requestor.  Could that work?

> So that I understand, in this case the call format for the request would 
> then become:
>   - the message name, an avro string, followed by the message parameters
>   - the per-call metadata, which is a map.  By default the map is null

My intuition would be to transmit the metadata before the message name 
and parameters.  Also, a map would always be transmitted, but it would 
be empty by default.  The analogy is with HTTP headers, which are 
transmitted before request and response payloads.  It should be easy to 
implement an HTTP-based Transceiver.

> and the call response would become
>   - a one byte error flag boolean (with either a message response or error)
>   - the per-call metadata, which is a map.  By default the map is null

Again, I would transmit metadata before the response data and always 
transmit a map, empty by default.

> Before you had mentioned that the metadata could be sent in 
> Transceiver.transceive.  But couldn't it also be sent in 
> Requestor.request() (approximately at the writeRequset() call)?

Yes, if it is part of the Avro spec, it would be sent by Requestor and 
Responder.  If added by an application, it could be in 
Transceiver#transceive().

> What do people think of proceeding with Doug's recommendation of using 
> ThreadLocals, and including the map in the call request and response?

+1 from me.

Doug

Re: who's doing what for 0.21?

Posted by George Porter <Ge...@Sun.COM>.

Hi Doug,

The ThreadLocal approach would be an easy way to communicate per-call  
metadata between the application and the Requstor/Responder without  
having to add explicit API.  I just started picking up the Avro code,  
and so I have a question:  is it possible for a Requestor or Responder  
to be used by more than one thread?  On the flip side, is it possible  
for a thread to have more than one Requestor?

If (as I expect) it is possible for a thread to have multiple  
Requestors open (but Requestors could not be shared to more than one  
thread), then the ThreadLocal could be a Map<Requestor, metadata>.   
This way you could set per-call metadata on a per-requestor basis even  
if multiple requestors are open.  We could provide some static methods  
as you mentioned to facilitate this.

So that I understand, in this case the call format for the request  
would then become:
   - the message name, an avro string, followed by the message  
parameters
   - the per-call metadata, which is a map.  By default the map is null

and the call response would become
   - a one byte error flag boolean (with either a message response or  
error)
   - the per-call metadata, which is a map.  By default the map is null

Before you had mentioned that the metadata could be sent in  
Transceiver.transceive.  But couldn't it also be sent in  
Requestor.request() (approximately at the writeRequset() call)?

What do people think of proceeding with Doug's recommendation of using  
ThreadLocals, and including the map in the call request and response?

Thanks,
George

On Jun 29, 2009, at 4:09 PM, Doug Cutting wrote:

> George Porter wrote:
>> If we would have to create an API for (optional) per-call metadata,  
>> then I think we should go ahead and make it part of the spec from  
>> the get-go.  We can then implement both the API and the spec change.
>
> Can we do it by Wednesday?  I was hoping to freeze the 1.0 spec on  
> Wednesday.  We could easily add it to the spec and read/write empty  
> metadata maps in the impls.  Providing a nice API might be a little  
> harder.  I guess it could be easily hacked through threadlocals:  
> Requestor and Responder could just copy things from/to threadlocal  
> maps, accessed via static methods.  Would that suffice?
>
> Doug

Re: who's doing what for 0.21?

Posted by Doug Cutting <cu...@apache.org>.

George Porter wrote:
> If we would have to create an API for (optional) per-call metadata, then 
> I think we should go ahead and make it part of the spec from the 
> get-go.  We can then implement both the API and the spec change.

Can we do it by Wednesday?  I was hoping to freeze the 1.0 spec on 
Wednesday.  We could easily add it to the spec and read/write empty 
metadata maps in the impls.  Providing a nice API might be a little 
harder.  I guess it could be easily hacked through threadlocals: 
Requestor and Responder could just copy things from/to threadlocal maps, 
accessed via static methods.  Would that suffice?

Doug

Re: who's doing what for 0.21?

Posted by George Porter <Ge...@Sun.COM>.

If we would have to create an API for (optional) per-call metadata,  
then I think we should go ahead and make it part of the spec from the  
get-go.  We can then implement both the API and the spec change.

-George

On Jun 29, 2009, at 9:40 AM, Doug Cutting wrote:

> George Porter wrote:
>> I'd be happy to look into bottom up profiling with tracer tags  
>> (path-based metadata field) supported with avro.
>
> That would be great.  I added metadata field to the RPC handshake,  
> so we have per-connection metadata, but, to support what you seek,  
> we'd also need per-call metadata.  This can be done without altering  
> the Avro spec using a Transciever#transceive() implementation, but  
> it might perhaps be better to have per-call metadata in the spec.
>
> Doug

Re: who's doing what for 0.21?

Posted by Doug Cutting <cu...@apache.org>.

George Porter wrote:
> I'd be happy to look into bottom up profiling with tracer tags 
> (path-based metadata field) supported with avro. 

That would be great.  I added metadata field to the RPC handshake, so we 
have per-connection metadata, but, to support what you seek, we'd also 
need per-call metadata.  This can be done without altering the Avro spec 
using a Transciever#transceive() implementation, but it might perhaps be 
better to have per-call metadata in the spec.

Doug

Re: who's doing what for 0.21?

Posted by George Porter <Ge...@Sun.COM>.

I'd be happy to look into bottom up profiling with tracer tags (path- 
based metadata field) supported with avro.  I implemented similar  
tracing with Thrift by including an optional parameter into each RPC  
by default which passed along metadata.  If the receiver didn't  
understand it, it would just ignore that parameter, and if the client  
didn't support it, then the receiver would just use a default value of  
null.  It seems like a similar approach could be taken with Avro, and  
I'll check that out this week.

Thanks,
George

On Jun 24, 2009, at 12:35 AM, Ryan Rawson wrote:

> Here are some thoughts:
>
> - Performance is important.
> - Unified protocol would help the use of tracer tags, which could give
> us top to bottom profiling.
> - Would like to do rolling restart of HBase, even under relatively
> major upgrades.
> - PHP (even pure php) bindings.
>
> I'll poke at the code, what is the current state?
>

Re: who's doing what for 0.21?

Posted by Amr Awadallah <aa...@cloudera.com>.

 >  I'll poke at the code, what is the current state?

First release of Avro should be rolling out on July 1st per Doug:

From: Doug Cutting
To: avro-dev
Date: June 23, 2009
Subject: Release 1.0.0: July 1st?

I'd like to make a release soon, so that folks can start using Avro. I'd 
like to call the first release 1.0.0.

I propose that the Avro version convention is that only major releases 
permit forward and/or backward data format incompatibilities.  So, by 
calling this 1.x.x, we commit to not altering the specification accept 
to potentially add new features whose presence does not affect 
compatibility with any existing features until release 2.x.x.

Minor releases would permit API incompatibilities.  So 1.1.X APIs may 
not be compatible with 1.0.1 APIs, but all 1.0.X releases should be API 
compatible bugfix releases.

Does that sound like a reasonable policy?

It would be nice to have more complete C and/or C++ support working in 
1.0.0, but that's not strictly required: we can add it in subsequent 
releases.

So I propose to make a 1.0.0 release on July 1st.  Objections?

Doug
> Here are some thoughts:
>
> - Performance is important.
> - Unified protocol would help the use of tracer tags, which could give
> us top to bottom profiling.
> - Would like to do rolling restart of HBase, even under relatively
> major upgrades.
> - PHP (even pure php) bindings.
>
> I'll poke at the code, what is the current state?
>
> On Tue, Jun 23, 2009 at 11:55 PM, Jeff Hammerbacher<ha...@cloudera.com> wrote:
>   
>> Ryan,
>>
>> The working plan is to use Avro to replace the current DataNode streaming
>> protocol, in addition to the HTTP-based protocol for fetching data during
>> the shuffle and the standard Hadoop IPC. Having a unified serialization and
>> RPC strategy should make enforcing things like security constraints and
>> versions far more straightforward. It would be great to see the HBase
>> community express their specific needs to the Avro folks so that we could
>> have unified serialization and RPC throughout the stack (not you, Michael).
>>
>> Later,
>> Jeff
>>
>> On Mon, Jun 22, 2009 at 9:36 PM, Ryan Rawson <ry...@gmail.com> wrote:
>>
>>     
>>> HBase has performance requirements on par with the datanode streaming
>>> protocol - we want to provide the data in our systems as fast as we can
>>> read
>>> and stream them.
>>>
>>> -ryan
>>>
>>> On Mon, Jun 22, 2009 at 9:19 PM, Amr Awadallah <aa...@cloudera.com> wrote:
>>>
>>>       
>>>> CCing avro-dev@ to comment, but my understanding is that it is currently
>>>> functioning and way superior to the alternatives you mention below :)
>>>>
>>>> -- amr
>>>>
>>>>
>>>> Andrew Purtell wrote:
>>>>
>>>>         
>>>>> I my opinion, we should not bother to wait for Avro. I've been hearing
>>>>> about it on and off for three months now. If it is ready the day we
>>>>> start work on 1015 for 0.21, and it fits the bill, fine, but otherwise
>>>>> we should just use Thrift or pbufs. Thrift may be preferable as its
>>>>> compact binary protocol is competitive with pbufs plus it has a fully
>>>>> implemented async rpc stack. I think this applies for both 1015 and
>>>>> 1295. Also I'm skeptical that something to supplant RMI won't have
>>>>> overheads related to that we don't need, e.g. transmitting class and
>>>>> method names as strings, etc.
>>>>>   - Andy
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: Jim Kellerman (POWERSET) <Ji...@microsoft.com>
>>>>> To: "hbase-dev@hadoop.apache.org" <hb...@hadoop.apache.org>
>>>>> Sent: Saturday, June 20, 2009 3:26:58 PM
>>>>> Subject: RE: who's doing what for 0.21?
>>>>>
>>>>> I am also interested in 1295 (I have quite a bit of experience
>>>>> with cross data center replication), but more interested in
>>>>> getting more of the master into zookeeper.
>>>>>
>>>>> As for 1556, I might wait a bit. At the Cloudera off-site, one
>>>>> of the things talked about was doing something similar for
>>>>> Hadoop which we might leverage.
>>>>>
>>>>> What really needs to get done around builds is when you mark
>>>>> a Jira as patch available, we should do a patch build and test
>>>>> like Hadoop does. Noone has had time to do it to date, but if
>>>>> you are taking on the build, that would be a "nice to have".
>>>>>
>>>>> For 1015, should you wait for Avro?
>>>>>
>>>>> And if you missed it, here are the notes from the Cloudera off-site:
>>>>> http://wiki.apache.org/hadoop/DeveloperOffsite20090612
>>>>>
>>>>> ---
>>>>> Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>> -----Original Message-----
>>>>>> From: Andrew Purtell [mailto:apurtell@apache.org]
>>>>>> Sent: Saturday, June 20, 2009 12:11 PM
>>>>>> To: hbase-dev@hadoop.apache.org
>>>>>> Subject: who's doing what for 0.21?
>>>>>>
>>>>>> http://tinyurl.com/m7nt72
>>>>>>
>>>>>> I have an interest in these:
>>>>>>
>>>>>>  https://issues.apache.org/jira/browse/HBASE-1015
>>>>>>  https://issues.apache.org/jira/browse/HBASE-1295
>>>>>>  https://issues.apache.org/jira/browse/HBASE-1556
>>>>>>
>>>>>> I think for 1015 and 1295, there is interest on the part of at least
>>>>>> myself,
>>>>>> dj_ryan, and jgray. dj_ryan was saying something about su executives
>>>>>> making
>>>>>> 1295 a priority for him. We should figure out how to divide up and
>>>>>> assign
>>>>>> out the work.
>>>>>>
>>>>>> Also, probably I'll end up taking on the grunt work of 1556, because
>>>>>> it
>>>>>> needs to be done.
>>>>>>
>>>>>> Have we set a time and place for the next dev meeting?
>>>>>>
>>>>>>   - Andy
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>>
>>>>>
>>>>>

Re: who's doing what for 0.21?

Posted by Amr Awadallah <aa...@cloudera.com>.

 >  I'll poke at the code, what is the current state?

First release of Avro should be rolling out on July 1st per Doug:

From: Doug Cutting
To: avro-dev
Date: June 23, 2009
Subject: Release 1.0.0: July 1st?

I'd like to make a release soon, so that folks can start using Avro. I'd 
like to call the first release 1.0.0.

I propose that the Avro version convention is that only major releases 
permit forward and/or backward data format incompatibilities.  So, by 
calling this 1.x.x, we commit to not altering the specification accept 
to potentially add new features whose presence does not affect 
compatibility with any existing features until release 2.x.x.

Minor releases would permit API incompatibilities.  So 1.1.X APIs may 
not be compatible with 1.0.1 APIs, but all 1.0.X releases should be API 
compatible bugfix releases.

Does that sound like a reasonable policy?

It would be nice to have more complete C and/or C++ support working in 
1.0.0, but that's not strictly required: we can add it in subsequent 
releases.

So I propose to make a 1.0.0 release on July 1st.  Objections?

Doug
> Here are some thoughts:
>
> - Performance is important.
> - Unified protocol would help the use of tracer tags, which could give
> us top to bottom profiling.
> - Would like to do rolling restart of HBase, even under relatively
> major upgrades.
> - PHP (even pure php) bindings.
>
> I'll poke at the code, what is the current state?
>
> On Tue, Jun 23, 2009 at 11:55 PM, Jeff Hammerbacher<ha...@cloudera.com> wrote:
>   
>> Ryan,
>>
>> The working plan is to use Avro to replace the current DataNode streaming
>> protocol, in addition to the HTTP-based protocol for fetching data during
>> the shuffle and the standard Hadoop IPC. Having a unified serialization and
>> RPC strategy should make enforcing things like security constraints and
>> versions far more straightforward. It would be great to see the HBase
>> community express their specific needs to the Avro folks so that we could
>> have unified serialization and RPC throughout the stack (not you, Michael).
>>
>> Later,
>> Jeff
>>
>> On Mon, Jun 22, 2009 at 9:36 PM, Ryan Rawson <ry...@gmail.com> wrote:
>>
>>     
>>> HBase has performance requirements on par with the datanode streaming
>>> protocol - we want to provide the data in our systems as fast as we can
>>> read
>>> and stream them.
>>>
>>> -ryan
>>>
>>> On Mon, Jun 22, 2009 at 9:19 PM, Amr Awadallah <aa...@cloudera.com> wrote:
>>>
>>>       
>>>> CCing avro-dev@ to comment, but my understanding is that it is currently
>>>> functioning and way superior to the alternatives you mention below :)
>>>>
>>>> -- amr
>>>>
>>>>
>>>> Andrew Purtell wrote:
>>>>
>>>>         
>>>>> I my opinion, we should not bother to wait for Avro. I've been hearing
>>>>> about it on and off for three months now. If it is ready the day we
>>>>> start work on 1015 for 0.21, and it fits the bill, fine, but otherwise
>>>>> we should just use Thrift or pbufs. Thrift may be preferable as its
>>>>> compact binary protocol is competitive with pbufs plus it has a fully
>>>>> implemented async rpc stack. I think this applies for both 1015 and
>>>>> 1295. Also I'm skeptical that something to supplant RMI won't have
>>>>> overheads related to that we don't need, e.g. transmitting class and
>>>>> method names as strings, etc.
>>>>>   - Andy
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: Jim Kellerman (POWERSET) <Ji...@microsoft.com>
>>>>> To: "hbase-dev@hadoop.apache.org" <hb...@hadoop.apache.org>
>>>>> Sent: Saturday, June 20, 2009 3:26:58 PM
>>>>> Subject: RE: who's doing what for 0.21?
>>>>>
>>>>> I am also interested in 1295 (I have quite a bit of experience
>>>>> with cross data center replication), but more interested in
>>>>> getting more of the master into zookeeper.
>>>>>
>>>>> As for 1556, I might wait a bit. At the Cloudera off-site, one
>>>>> of the things talked about was doing something similar for
>>>>> Hadoop which we might leverage.
>>>>>
>>>>> What really needs to get done around builds is when you mark
>>>>> a Jira as patch available, we should do a patch build and test
>>>>> like Hadoop does. Noone has had time to do it to date, but if
>>>>> you are taking on the build, that would be a "nice to have".
>>>>>
>>>>> For 1015, should you wait for Avro?
>>>>>
>>>>> And if you missed it, here are the notes from the Cloudera off-site:
>>>>> http://wiki.apache.org/hadoop/DeveloperOffsite20090612
>>>>>
>>>>> ---
>>>>> Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>> -----Original Message-----
>>>>>> From: Andrew Purtell [mailto:apurtell@apache.org]
>>>>>> Sent: Saturday, June 20, 2009 12:11 PM
>>>>>> To: hbase-dev@hadoop.apache.org
>>>>>> Subject: who's doing what for 0.21?
>>>>>>
>>>>>> http://tinyurl.com/m7nt72
>>>>>>
>>>>>> I have an interest in these:
>>>>>>
>>>>>>  https://issues.apache.org/jira/browse/HBASE-1015
>>>>>>  https://issues.apache.org/jira/browse/HBASE-1295
>>>>>>  https://issues.apache.org/jira/browse/HBASE-1556
>>>>>>
>>>>>> I think for 1015 and 1295, there is interest on the part of at least
>>>>>> myself,
>>>>>> dj_ryan, and jgray. dj_ryan was saying something about su executives
>>>>>> making
>>>>>> 1295 a priority for him. We should figure out how to divide up and
>>>>>> assign
>>>>>> out the work.
>>>>>>
>>>>>> Also, probably I'll end up taking on the grunt work of 1556, because
>>>>>> it
>>>>>> needs to be done.
>>>>>>
>>>>>> Have we set a time and place for the next dev meeting?
>>>>>>
>>>>>>   - Andy
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>>
>>>>>
>>>>>

Re: who's doing what for 0.21?

Posted by George Porter <Ge...@Sun.COM>.

I'd be happy to look into bottom up profiling with tracer tags (path- 
based metadata field) supported with avro.  I implemented similar  
tracing with Thrift by including an optional parameter into each RPC  
by default which passed along metadata.  If the receiver didn't  
understand it, it would just ignore that parameter, and if the client  
didn't support it, then the receiver would just use a default value of  
null.  It seems like a similar approach could be taken with Avro, and  
I'll check that out this week.

Thanks,
George

On Jun 24, 2009, at 12:35 AM, Ryan Rawson wrote:

> Here are some thoughts:
>
> - Performance is important.
> - Unified protocol would help the use of tracer tags, which could give
> us top to bottom profiling.
> - Would like to do rolling restart of HBase, even under relatively
> major upgrades.
> - PHP (even pure php) bindings.
>
> I'll poke at the code, what is the current state?
>

Re: who's doing what for 0.21?

Posted by Ryan Rawson <ry...@gmail.com>.

Here are some thoughts:

- Performance is important.
- Unified protocol would help the use of tracer tags, which could give
us top to bottom profiling.
- Would like to do rolling restart of HBase, even under relatively
major upgrades.
- PHP (even pure php) bindings.

I'll poke at the code, what is the current state?

On Tue, Jun 23, 2009 at 11:55 PM, Jeff Hammerbacher<ha...@cloudera.com> wrote:
> Ryan,
>
> The working plan is to use Avro to replace the current DataNode streaming
> protocol, in addition to the HTTP-based protocol for fetching data during
> the shuffle and the standard Hadoop IPC. Having a unified serialization and
> RPC strategy should make enforcing things like security constraints and
> versions far more straightforward. It would be great to see the HBase
> community express their specific needs to the Avro folks so that we could
> have unified serialization and RPC throughout the stack (not you, Michael).
>
> Later,
> Jeff
>
> On Mon, Jun 22, 2009 at 9:36 PM, Ryan Rawson <ry...@gmail.com> wrote:
>
>> HBase has performance requirements on par with the datanode streaming
>> protocol - we want to provide the data in our systems as fast as we can
>> read
>> and stream them.
>>
>> -ryan
>>
>> On Mon, Jun 22, 2009 at 9:19 PM, Amr Awadallah <aa...@cloudera.com> wrote:
>>
>> > CCing avro-dev@ to comment, but my understanding is that it is currently
>> > functioning and way superior to the alternatives you mention below :)
>> >
>> > -- amr
>> >
>> >
>> > Andrew Purtell wrote:
>> >
>> >> I my opinion, we should not bother to wait for Avro. I've been hearing
>> >> about it on and off for three months now. If it is ready the day we
>> >> start work on 1015 for 0.21, and it fits the bill, fine, but otherwise
>> >> we should just use Thrift or pbufs. Thrift may be preferable as its
>> >> compact binary protocol is competitive with pbufs plus it has a fully
>> >> implemented async rpc stack. I think this applies for both 1015 and
>> >> 1295. Also I'm skeptical that something to supplant RMI won't have
>> >> overheads related to that we don't need, e.g. transmitting class and
>> >> method names as strings, etc.
>> >>   - Andy
>> >>
>> >>
>> >>
>> >>
>> >> ________________________________
>> >> From: Jim Kellerman (POWERSET) <Ji...@microsoft.com>
>> >> To: "hbase-dev@hadoop.apache.org" <hb...@hadoop.apache.org>
>> >> Sent: Saturday, June 20, 2009 3:26:58 PM
>> >> Subject: RE: who's doing what for 0.21?
>> >>
>> >> I am also interested in 1295 (I have quite a bit of experience
>> >> with cross data center replication), but more interested in
>> >> getting more of the master into zookeeper.
>> >>
>> >> As for 1556, I might wait a bit. At the Cloudera off-site, one
>> >> of the things talked about was doing something similar for
>> >> Hadoop which we might leverage.
>> >>
>> >> What really needs to get done around builds is when you mark
>> >> a Jira as patch available, we should do a patch build and test
>> >> like Hadoop does. Noone has had time to do it to date, but if
>> >> you are taking on the build, that would be a "nice to have".
>> >>
>> >> For 1015, should you wait for Avro?
>> >>
>> >> And if you missed it, here are the notes from the Cloudera off-site:
>> >> http://wiki.apache.org/hadoop/DeveloperOffsite20090612
>> >>
>> >> ---
>> >> Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
>> >>
>> >>
>> >>
>> >>
>> >>> -----Original Message-----
>> >>> From: Andrew Purtell [mailto:apurtell@apache.org]
>> >>> Sent: Saturday, June 20, 2009 12:11 PM
>> >>> To: hbase-dev@hadoop.apache.org
>> >>> Subject: who's doing what for 0.21?
>> >>>
>> >>> http://tinyurl.com/m7nt72
>> >>>
>> >>> I have an interest in these:
>> >>>
>> >>>  https://issues.apache.org/jira/browse/HBASE-1015
>> >>>  https://issues.apache.org/jira/browse/HBASE-1295
>> >>>  https://issues.apache.org/jira/browse/HBASE-1556
>> >>>
>> >>> I think for 1015 and 1295, there is interest on the part of at least
>> >>> myself,
>> >>> dj_ryan, and jgray. dj_ryan was saying something about su executives
>> >>> making
>> >>> 1295 a priority for him. We should figure out how to divide up and
>> >>> assign
>> >>> out the work.
>> >>>
>> >>> Also, probably I'll end up taking on the grunt work of 1556, because
>> >>> it
>> >>> needs to be done.
>> >>>
>> >>> Have we set a time and place for the next dev meeting?
>> >>>
>> >>>   - Andy
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >>
>> >
>>
>

Re: who's doing what for 0.21?

Posted by Ryan Rawson <ry...@gmail.com>.

Here are some thoughts:

- Performance is important.
- Unified protocol would help the use of tracer tags, which could give
us top to bottom profiling.
- Would like to do rolling restart of HBase, even under relatively
major upgrades.
- PHP (even pure php) bindings.

I'll poke at the code, what is the current state?

On Tue, Jun 23, 2009 at 11:55 PM, Jeff Hammerbacher<ha...@cloudera.com> wrote:
> Ryan,
>
> The working plan is to use Avro to replace the current DataNode streaming
> protocol, in addition to the HTTP-based protocol for fetching data during
> the shuffle and the standard Hadoop IPC. Having a unified serialization and
> RPC strategy should make enforcing things like security constraints and
> versions far more straightforward. It would be great to see the HBase
> community express their specific needs to the Avro folks so that we could
> have unified serialization and RPC throughout the stack (not you, Michael).
>
> Later,
> Jeff
>
> On Mon, Jun 22, 2009 at 9:36 PM, Ryan Rawson <ry...@gmail.com> wrote:
>
>> HBase has performance requirements on par with the datanode streaming
>> protocol - we want to provide the data in our systems as fast as we can
>> read
>> and stream them.
>>
>> -ryan
>>
>> On Mon, Jun 22, 2009 at 9:19 PM, Amr Awadallah <aa...@cloudera.com> wrote:
>>
>> > CCing avro-dev@ to comment, but my understanding is that it is currently
>> > functioning and way superior to the alternatives you mention below :)
>> >
>> > -- amr
>> >
>> >
>> > Andrew Purtell wrote:
>> >
>> >> I my opinion, we should not bother to wait for Avro. I've been hearing
>> >> about it on and off for three months now. If it is ready the day we
>> >> start work on 1015 for 0.21, and it fits the bill, fine, but otherwise
>> >> we should just use Thrift or pbufs. Thrift may be preferable as its
>> >> compact binary protocol is competitive with pbufs plus it has a fully
>> >> implemented async rpc stack. I think this applies for both 1015 and
>> >> 1295. Also I'm skeptical that something to supplant RMI won't have
>> >> overheads related to that we don't need, e.g. transmitting class and
>> >> method names as strings, etc.
>> >>   - Andy
>> >>
>> >>
>> >>
>> >>
>> >> ________________________________
>> >> From: Jim Kellerman (POWERSET) <Ji...@microsoft.com>
>> >> To: "hbase-dev@hadoop.apache.org" <hb...@hadoop.apache.org>
>> >> Sent: Saturday, June 20, 2009 3:26:58 PM
>> >> Subject: RE: who's doing what for 0.21?
>> >>
>> >> I am also interested in 1295 (I have quite a bit of experience
>> >> with cross data center replication), but more interested in
>> >> getting more of the master into zookeeper.
>> >>
>> >> As for 1556, I might wait a bit. At the Cloudera off-site, one
>> >> of the things talked about was doing something similar for
>> >> Hadoop which we might leverage.
>> >>
>> >> What really needs to get done around builds is when you mark
>> >> a Jira as patch available, we should do a patch build and test
>> >> like Hadoop does. Noone has had time to do it to date, but if
>> >> you are taking on the build, that would be a "nice to have".
>> >>
>> >> For 1015, should you wait for Avro?
>> >>
>> >> And if you missed it, here are the notes from the Cloudera off-site:
>> >> http://wiki.apache.org/hadoop/DeveloperOffsite20090612
>> >>
>> >> ---
>> >> Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
>> >>
>> >>
>> >>
>> >>
>> >>> -----Original Message-----
>> >>> From: Andrew Purtell [mailto:apurtell@apache.org]
>> >>> Sent: Saturday, June 20, 2009 12:11 PM
>> >>> To: hbase-dev@hadoop.apache.org
>> >>> Subject: who's doing what for 0.21?
>> >>>
>> >>> http://tinyurl.com/m7nt72
>> >>>
>> >>> I have an interest in these:
>> >>>
>> >>>  https://issues.apache.org/jira/browse/HBASE-1015
>> >>>  https://issues.apache.org/jira/browse/HBASE-1295
>> >>>  https://issues.apache.org/jira/browse/HBASE-1556
>> >>>
>> >>> I think for 1015 and 1295, there is interest on the part of at least
>> >>> myself,
>> >>> dj_ryan, and jgray. dj_ryan was saying something about su executives
>> >>> making
>> >>> 1295 a priority for him. We should figure out how to divide up and
>> >>> assign
>> >>> out the work.
>> >>>
>> >>> Also, probably I'll end up taking on the grunt work of 1556, because
>> >>> it
>> >>> needs to be done.
>> >>>
>> >>> Have we set a time and place for the next dev meeting?
>> >>>
>> >>>   - Andy
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >>
>> >
>>
>

Re: who's doing what for 0.21?

Posted by Jeff Hammerbacher <ha...@cloudera.com>.

Ryan,

The working plan is to use Avro to replace the current DataNode streaming
protocol, in addition to the HTTP-based protocol for fetching data during
the shuffle and the standard Hadoop IPC. Having a unified serialization and
RPC strategy should make enforcing things like security constraints and
versions far more straightforward. It would be great to see the HBase
community express their specific needs to the Avro folks so that we could
have unified serialization and RPC throughout the stack (not you, Michael).

Later,
Jeff

On Mon, Jun 22, 2009 at 9:36 PM, Ryan Rawson <ry...@gmail.com> wrote:

> HBase has performance requirements on par with the datanode streaming
> protocol - we want to provide the data in our systems as fast as we can
> read
> and stream them.
>
> -ryan
>
> On Mon, Jun 22, 2009 at 9:19 PM, Amr Awadallah <aa...@cloudera.com> wrote:
>
> > CCing avro-dev@ to comment, but my understanding is that it is currently
> > functioning and way superior to the alternatives you mention below :)
> >
> > -- amr
> >
> >
> > Andrew Purtell wrote:
> >
> >> I my opinion, we should not bother to wait for Avro. I've been hearing
> >> about it on and off for three months now. If it is ready the day we
> >> start work on 1015 for 0.21, and it fits the bill, fine, but otherwise
> >> we should just use Thrift or pbufs. Thrift may be preferable as its
> >> compact binary protocol is competitive with pbufs plus it has a fully
> >> implemented async rpc stack. I think this applies for both 1015 and
> >> 1295. Also I'm skeptical that something to supplant RMI won't have
> >> overheads related to that we don't need, e.g. transmitting class and
> >> method names as strings, etc.
> >>   - Andy
> >>
> >>
> >>
> >>
> >> ________________________________
> >> From: Jim Kellerman (POWERSET) <Ji...@microsoft.com>
> >> To: "hbase-dev@hadoop.apache.org" <hb...@hadoop.apache.org>
> >> Sent: Saturday, June 20, 2009 3:26:58 PM
> >> Subject: RE: who's doing what for 0.21?
> >>
> >> I am also interested in 1295 (I have quite a bit of experience
> >> with cross data center replication), but more interested in
> >> getting more of the master into zookeeper.
> >>
> >> As for 1556, I might wait a bit. At the Cloudera off-site, one
> >> of the things talked about was doing something similar for
> >> Hadoop which we might leverage.
> >>
> >> What really needs to get done around builds is when you mark
> >> a Jira as patch available, we should do a patch build and test
> >> like Hadoop does. Noone has had time to do it to date, but if
> >> you are taking on the build, that would be a "nice to have".
> >>
> >> For 1015, should you wait for Avro?
> >>
> >> And if you missed it, here are the notes from the Cloudera off-site:
> >> http://wiki.apache.org/hadoop/DeveloperOffsite20090612
> >>
> >> ---
> >> Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
> >>
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: Andrew Purtell [mailto:apurtell@apache.org]
> >>> Sent: Saturday, June 20, 2009 12:11 PM
> >>> To: hbase-dev@hadoop.apache.org
> >>> Subject: who's doing what for 0.21?
> >>>
> >>> http://tinyurl.com/m7nt72
> >>>
> >>> I have an interest in these:
> >>>
> >>>  https://issues.apache.org/jira/browse/HBASE-1015
> >>>  https://issues.apache.org/jira/browse/HBASE-1295
> >>>  https://issues.apache.org/jira/browse/HBASE-1556
> >>>
> >>> I think for 1015 and 1295, there is interest on the part of at least
> >>> myself,
> >>> dj_ryan, and jgray. dj_ryan was saying something about su executives
> >>> making
> >>> 1295 a priority for him. We should figure out how to divide up and
> >>> assign
> >>> out the work.
> >>>
> >>> Also, probably I'll end up taking on the grunt work of 1556, because
> >>> it
> >>> needs to be done.
> >>>
> >>> Have we set a time and place for the next dev meeting?
> >>>
> >>>   - Andy
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >>
> >
>

Re: who's doing what for 0.21?

Posted by Jeff Hammerbacher <ha...@cloudera.com>.

Ryan,

The working plan is to use Avro to replace the current DataNode streaming
protocol, in addition to the HTTP-based protocol for fetching data during
the shuffle and the standard Hadoop IPC. Having a unified serialization and
RPC strategy should make enforcing things like security constraints and
versions far more straightforward. It would be great to see the HBase
community express their specific needs to the Avro folks so that we could
have unified serialization and RPC throughout the stack (not you, Michael).

Later,
Jeff

On Mon, Jun 22, 2009 at 9:36 PM, Ryan Rawson <ry...@gmail.com> wrote:

> HBase has performance requirements on par with the datanode streaming
> protocol - we want to provide the data in our systems as fast as we can
> read
> and stream them.
>
> -ryan
>
> On Mon, Jun 22, 2009 at 9:19 PM, Amr Awadallah <aa...@cloudera.com> wrote:
>
> > CCing avro-dev@ to comment, but my understanding is that it is currently
> > functioning and way superior to the alternatives you mention below :)
> >
> > -- amr
> >
> >
> > Andrew Purtell wrote:
> >
> >> I my opinion, we should not bother to wait for Avro. I've been hearing
> >> about it on and off for three months now. If it is ready the day we
> >> start work on 1015 for 0.21, and it fits the bill, fine, but otherwise
> >> we should just use Thrift or pbufs. Thrift may be preferable as its
> >> compact binary protocol is competitive with pbufs plus it has a fully
> >> implemented async rpc stack. I think this applies for both 1015 and
> >> 1295. Also I'm skeptical that something to supplant RMI won't have
> >> overheads related to that we don't need, e.g. transmitting class and
> >> method names as strings, etc.
> >>   - Andy
> >>
> >>
> >>
> >>
> >> ________________________________
> >> From: Jim Kellerman (POWERSET) <Ji...@microsoft.com>
> >> To: "hbase-dev@hadoop.apache.org" <hb...@hadoop.apache.org>
> >> Sent: Saturday, June 20, 2009 3:26:58 PM
> >> Subject: RE: who's doing what for 0.21?
> >>
> >> I am also interested in 1295 (I have quite a bit of experience
> >> with cross data center replication), but more interested in
> >> getting more of the master into zookeeper.
> >>
> >> As for 1556, I might wait a bit. At the Cloudera off-site, one
> >> of the things talked about was doing something similar for
> >> Hadoop which we might leverage.
> >>
> >> What really needs to get done around builds is when you mark
> >> a Jira as patch available, we should do a patch build and test
> >> like Hadoop does. Noone has had time to do it to date, but if
> >> you are taking on the build, that would be a "nice to have".
> >>
> >> For 1015, should you wait for Avro?
> >>
> >> And if you missed it, here are the notes from the Cloudera off-site:
> >> http://wiki.apache.org/hadoop/DeveloperOffsite20090612
> >>
> >> ---
> >> Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
> >>
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: Andrew Purtell [mailto:apurtell@apache.org]
> >>> Sent: Saturday, June 20, 2009 12:11 PM
> >>> To: hbase-dev@hadoop.apache.org
> >>> Subject: who's doing what for 0.21?
> >>>
> >>> http://tinyurl.com/m7nt72
> >>>
> >>> I have an interest in these:
> >>>
> >>>  https://issues.apache.org/jira/browse/HBASE-1015
> >>>  https://issues.apache.org/jira/browse/HBASE-1295
> >>>  https://issues.apache.org/jira/browse/HBASE-1556
> >>>
> >>> I think for 1015 and 1295, there is interest on the part of at least
> >>> myself,
> >>> dj_ryan, and jgray. dj_ryan was saying something about su executives
> >>> making
> >>> 1295 a priority for him. We should figure out how to divide up and
> >>> assign
> >>> out the work.
> >>>
> >>> Also, probably I'll end up taking on the grunt work of 1556, because
> >>> it
> >>> needs to be done.
> >>>
> >>> Have we set a time and place for the next dev meeting?
> >>>
> >>>   - Andy
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >>
> >
>

Re: who's doing what for 0.21?

Posted by Ryan Rawson <ry...@gmail.com>.

HBase has performance requirements on par with the datanode streaming
protocol - we want to provide the data in our systems as fast as we can read
and stream them.

-ryan

On Mon, Jun 22, 2009 at 9:19 PM, Amr Awadallah <aa...@cloudera.com> wrote:

> CCing avro-dev@ to comment, but my understanding is that it is currently
> functioning and way superior to the alternatives you mention below :)
>
> -- amr
>
>
> Andrew Purtell wrote:
>
>> I my opinion, we should not bother to wait for Avro. I've been hearing
>> about it on and off for three months now. If it is ready the day we
>> start work on 1015 for 0.21, and it fits the bill, fine, but otherwise
>> we should just use Thrift or pbufs. Thrift may be preferable as its
>> compact binary protocol is competitive with pbufs plus it has a fully
>> implemented async rpc stack. I think this applies for both 1015 and
>> 1295. Also I'm skeptical that something to supplant RMI won't have
>> overheads related to that we don't need, e.g. transmitting class and
>> method names as strings, etc.
>>   - Andy
>>
>>
>>
>>
>> ________________________________
>> From: Jim Kellerman (POWERSET) <Ji...@microsoft.com>
>> To: "hbase-dev@hadoop.apache.org" <hb...@hadoop.apache.org>
>> Sent: Saturday, June 20, 2009 3:26:58 PM
>> Subject: RE: who's doing what for 0.21?
>>
>> I am also interested in 1295 (I have quite a bit of experience
>> with cross data center replication), but more interested in
>> getting more of the master into zookeeper.
>>
>> As for 1556, I might wait a bit. At the Cloudera off-site, one
>> of the things talked about was doing something similar for
>> Hadoop which we might leverage.
>>
>> What really needs to get done around builds is when you mark
>> a Jira as patch available, we should do a patch build and test
>> like Hadoop does. Noone has had time to do it to date, but if
>> you are taking on the build, that would be a "nice to have".
>>
>> For 1015, should you wait for Avro?
>>
>> And if you missed it, here are the notes from the Cloudera off-site:
>> http://wiki.apache.org/hadoop/DeveloperOffsite20090612
>>
>> ---
>> Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
>>
>>
>>
>>
>>> -----Original Message-----
>>> From: Andrew Purtell [mailto:apurtell@apache.org]
>>> Sent: Saturday, June 20, 2009 12:11 PM
>>> To: hbase-dev@hadoop.apache.org
>>> Subject: who's doing what for 0.21?
>>>
>>> http://tinyurl.com/m7nt72
>>>
>>> I have an interest in these:
>>>
>>>  https://issues.apache.org/jira/browse/HBASE-1015
>>>  https://issues.apache.org/jira/browse/HBASE-1295
>>>  https://issues.apache.org/jira/browse/HBASE-1556
>>>
>>> I think for 1015 and 1295, there is interest on the part of at least
>>> myself,
>>> dj_ryan, and jgray. dj_ryan was saying something about su executives
>>> making
>>> 1295 a priority for him. We should figure out how to divide up and
>>> assign
>>> out the work.
>>>
>>> Also, probably I'll end up taking on the grunt work of 1556, because
>>> it
>>> needs to be done.
>>>
>>> Have we set a time and place for the next dev meeting?
>>>
>>>   - Andy
>>>
>>>
>>>
>>
>>
>>
>>
>

Re: who's doing what for 0.21?

Posted by Ryan Rawson <ry...@gmail.com>.

HBase has performance requirements on par with the datanode streaming
protocol - we want to provide the data in our systems as fast as we can read
and stream them.

-ryan

On Mon, Jun 22, 2009 at 9:19 PM, Amr Awadallah <aa...@cloudera.com> wrote:

> CCing avro-dev@ to comment, but my understanding is that it is currently
> functioning and way superior to the alternatives you mention below :)
>
> -- amr
>
>
> Andrew Purtell wrote:
>
>> I my opinion, we should not bother to wait for Avro. I've been hearing
>> about it on and off for three months now. If it is ready the day we
>> start work on 1015 for 0.21, and it fits the bill, fine, but otherwise
>> we should just use Thrift or pbufs. Thrift may be preferable as its
>> compact binary protocol is competitive with pbufs plus it has a fully
>> implemented async rpc stack. I think this applies for both 1015 and
>> 1295. Also I'm skeptical that something to supplant RMI won't have
>> overheads related to that we don't need, e.g. transmitting class and
>> method names as strings, etc.
>>   - Andy
>>
>>
>>
>>
>> ________________________________
>> From: Jim Kellerman (POWERSET) <Ji...@microsoft.com>
>> To: "hbase-dev@hadoop.apache.org" <hb...@hadoop.apache.org>
>> Sent: Saturday, June 20, 2009 3:26:58 PM
>> Subject: RE: who's doing what for 0.21?
>>
>> I am also interested in 1295 (I have quite a bit of experience
>> with cross data center replication), but more interested in
>> getting more of the master into zookeeper.
>>
>> As for 1556, I might wait a bit. At the Cloudera off-site, one
>> of the things talked about was doing something similar for
>> Hadoop which we might leverage.
>>
>> What really needs to get done around builds is when you mark
>> a Jira as patch available, we should do a patch build and test
>> like Hadoop does. Noone has had time to do it to date, but if
>> you are taking on the build, that would be a "nice to have".
>>
>> For 1015, should you wait for Avro?
>>
>> And if you missed it, here are the notes from the Cloudera off-site:
>> http://wiki.apache.org/hadoop/DeveloperOffsite20090612
>>
>> ---
>> Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
>>
>>
>>
>>
>>> -----Original Message-----
>>> From: Andrew Purtell [mailto:apurtell@apache.org]
>>> Sent: Saturday, June 20, 2009 12:11 PM
>>> To: hbase-dev@hadoop.apache.org
>>> Subject: who's doing what for 0.21?
>>>
>>> http://tinyurl.com/m7nt72
>>>
>>> I have an interest in these:
>>>
>>>  https://issues.apache.org/jira/browse/HBASE-1015
>>>  https://issues.apache.org/jira/browse/HBASE-1295
>>>  https://issues.apache.org/jira/browse/HBASE-1556
>>>
>>> I think for 1015 and 1295, there is interest on the part of at least
>>> myself,
>>> dj_ryan, and jgray. dj_ryan was saying something about su executives
>>> making
>>> 1295 a priority for him. We should figure out how to divide up and
>>> assign
>>> out the work.
>>>
>>> Also, probably I'll end up taking on the grunt work of 1556, because
>>> it
>>> needs to be done.
>>>
>>> Have we set a time and place for the next dev meeting?
>>>
>>>   - Andy
>>>
>>>
>>>
>>
>>
>>
>>
>

Re: who's doing what for 0.21?

Posted by Amr Awadallah <aa...@cloudera.com>.

CCing avro-dev@ to comment, but my understanding is that it is currently 
functioning and way superior to the alternatives you mention below :)

-- amr

Andrew Purtell wrote:
> I my opinion, we should not bother to wait for Avro. I've been hearing
> about it on and off for three months now. If it is ready the day we
> start work on 1015 for 0.21, and it fits the bill, fine, but otherwise
> we should just use Thrift or pbufs. Thrift may be preferable as its
> compact binary protocol is competitive with pbufs plus it has a fully
> implemented async rpc stack. I think this applies for both 1015 and
> 1295. Also I'm skeptical that something to supplant RMI won't have
> overheads related to that we don't need, e.g. transmitting class and
> method names as strings, etc. 
>
>    - Andy
>
>
>
>
> ________________________________
> From: Jim Kellerman (POWERSET) <Ji...@microsoft.com>
> To: "hbase-dev@hadoop.apache.org" <hb...@hadoop.apache.org>
> Sent: Saturday, June 20, 2009 3:26:58 PM
> Subject: RE: who's doing what for 0.21?
>
> I am also interested in 1295 (I have quite a bit of experience
> with cross data center replication), but more interested in
> getting more of the master into zookeeper.
>
> As for 1556, I might wait a bit. At the Cloudera off-site, one
> of the things talked about was doing something similar for
> Hadoop which we might leverage.
>
> What really needs to get done around builds is when you mark
> a Jira as patch available, we should do a patch build and test
> like Hadoop does. Noone has had time to do it to date, but if
> you are taking on the build, that would be a "nice to have".
>
> For 1015, should you wait for Avro?
>
> And if you missed it, here are the notes from the Cloudera off-site:
> http://wiki.apache.org/hadoop/DeveloperOffsite20090612
>
> ---
> Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
>
>
>   
>> -----Original Message-----
>> From: Andrew Purtell [mailto:apurtell@apache.org]
>> Sent: Saturday, June 20, 2009 12:11 PM
>> To: hbase-dev@hadoop.apache.org
>> Subject: who's doing what for 0.21?
>>
>> http://tinyurl.com/m7nt72
>>
>> I have an interest in these:
>>
>>  https://issues.apache.org/jira/browse/HBASE-1015
>>  https://issues.apache.org/jira/browse/HBASE-1295
>>  https://issues.apache.org/jira/browse/HBASE-1556
>>
>> I think for 1015 and 1295, there is interest on the part of at least
>> myself,
>> dj_ryan, and jgray. dj_ryan was saying something about su executives
>> making
>> 1295 a priority for him. We should figure out how to divide up and
>> assign
>> out the work.
>>
>> Also, probably I'll end up taking on the grunt work of 1556, because
>> it
>> needs to be done.
>>
>> Have we set a time and place for the next dev meeting?
>>
>>    - Andy
>>
>>     
>
>
>       
>

Re: who's doing what for 0.21?

Posted by Amr Awadallah <aa...@cloudera.com>.

CCing avro-dev@ to comment, but my understanding is that it is currently 
functioning and way superior to the alternatives you mention below :)

-- amr

Andrew Purtell wrote:
> I my opinion, we should not bother to wait for Avro. I've been hearing
> about it on and off for three months now. If it is ready the day we
> start work on 1015 for 0.21, and it fits the bill, fine, but otherwise
> we should just use Thrift or pbufs. Thrift may be preferable as its
> compact binary protocol is competitive with pbufs plus it has a fully
> implemented async rpc stack. I think this applies for both 1015 and
> 1295. Also I'm skeptical that something to supplant RMI won't have
> overheads related to that we don't need, e.g. transmitting class and
> method names as strings, etc. 
>
>    - Andy
>
>
>
>
> ________________________________
> From: Jim Kellerman (POWERSET) <Ji...@microsoft.com>
> To: "hbase-dev@hadoop.apache.org" <hb...@hadoop.apache.org>
> Sent: Saturday, June 20, 2009 3:26:58 PM
> Subject: RE: who's doing what for 0.21?
>
> I am also interested in 1295 (I have quite a bit of experience
> with cross data center replication), but more interested in
> getting more of the master into zookeeper.
>
> As for 1556, I might wait a bit. At the Cloudera off-site, one
> of the things talked about was doing something similar for
> Hadoop which we might leverage.
>
> What really needs to get done around builds is when you mark
> a Jira as patch available, we should do a patch build and test
> like Hadoop does. Noone has had time to do it to date, but if
> you are taking on the build, that would be a "nice to have".
>
> For 1015, should you wait for Avro?
>
> And if you missed it, here are the notes from the Cloudera off-site:
> http://wiki.apache.org/hadoop/DeveloperOffsite20090612
>
> ---
> Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
>
>
>   
>> -----Original Message-----
>> From: Andrew Purtell [mailto:apurtell@apache.org]
>> Sent: Saturday, June 20, 2009 12:11 PM
>> To: hbase-dev@hadoop.apache.org
>> Subject: who's doing what for 0.21?
>>
>> http://tinyurl.com/m7nt72
>>
>> I have an interest in these:
>>
>>  https://issues.apache.org/jira/browse/HBASE-1015
>>  https://issues.apache.org/jira/browse/HBASE-1295
>>  https://issues.apache.org/jira/browse/HBASE-1556
>>
>> I think for 1015 and 1295, there is interest on the part of at least
>> myself,
>> dj_ryan, and jgray. dj_ryan was saying something about su executives
>> making
>> 1295 a priority for him. We should figure out how to divide up and
>> assign
>> out the work.
>>
>> Also, probably I'll end up taking on the grunt work of 1556, because
>> it
>> needs to be done.
>>
>> Have we set a time and place for the next dev meeting?
>>
>>    - Andy
>>
>>     
>
>
>       
>

Re: who's doing what for 0.21?

Posted by Andrew Purtell <ap...@apache.org>.

I my opinion, we should not bother to wait for Avro. I've been hearing
about it on and off for three months now. If it is ready the day we
start work on 1015 for 0.21, and it fits the bill, fine, but otherwise
we should just use Thrift or pbufs. Thrift may be preferable as its
compact binary protocol is competitive with pbufs plus it has a fully
implemented async rpc stack. I think this applies for both 1015 and
1295. Also I'm skeptical that something to supplant RMI won't have
overheads related to that we don't need, e.g. transmitting class and
method names as strings, etc. 

   - Andy




________________________________
From: Jim Kellerman (POWERSET) <Ji...@microsoft.com>
To: "hbase-dev@hadoop.apache.org" <hb...@hadoop.apache.org>
Sent: Saturday, June 20, 2009 3:26:58 PM
Subject: RE: who's doing what for 0.21?

I am also interested in 1295 (I have quite a bit of experience
with cross data center replication), but more interested in
getting more of the master into zookeeper.

As for 1556, I might wait a bit. At the Cloudera off-site, one
of the things talked about was doing something similar for
Hadoop which we might leverage.

What really needs to get done around builds is when you mark
a Jira as patch available, we should do a patch build and test
like Hadoop does. Noone has had time to do it to date, but if
you are taking on the build, that would be a "nice to have".

For 1015, should you wait for Avro?

And if you missed it, here are the notes from the Cloudera off-site:
http://wiki.apache.org/hadoop/DeveloperOffsite20090612

---
Jim Kellerman, Powerset (Live Search, Microsoft Corporation)


> -----Original Message-----
> From: Andrew Purtell [mailto:apurtell@apache.org]
> Sent: Saturday, June 20, 2009 12:11 PM
> To: hbase-dev@hadoop.apache.org
> Subject: who's doing what for 0.21?
>
> http://tinyurl.com/m7nt72
>
> I have an interest in these:
>
>  https://issues.apache.org/jira/browse/HBASE-1015
>  https://issues.apache.org/jira/browse/HBASE-1295
>  https://issues.apache.org/jira/browse/HBASE-1556
>
> I think for 1015 and 1295, there is interest on the part of at least
> myself,
> dj_ryan, and jgray. dj_ryan was saying something about su executives
> making
> 1295 a priority for him. We should figure out how to divide up and
> assign
> out the work.
>
> Also, probably I'll end up taking on the grunt work of 1556, because
> it
> needs to be done.
>
> Have we set a time and place for the next dev meeting?
>
>    - Andy
>

Re: RE: RE: who's doing what for 0.21?

Posted by stack <st...@duboce.net>.

The numbers published so far have avro better than the competition:
http://mail-archives.apache.org/mod_mbox/hadoop-avro-dev/200905.mbox/browser

Fellas close to us are talking about digging in on these tests to elicit how
realistic they are.

Avro -- http://people.apache.org/~cutting/avro/ -- is attractive for its
non-versioning goal, its more compact binary representation and because its
close-in to hadoop.  If it has the performance to-boot then its definetly
worth consideration.

St.Ack




On Sun, Jun 21, 2009 at 10:48 AM, Andrew Purtell <ap...@apache.org>wrote:

> > From: Ryan Rawson <ry...@gmail.com>
> > I note none of those targets have performance as an explicit goal.
>
> Exactly. If using Avro instead of an alternative such as Thrift or pbufs
> will be a performance hit, we should not consider it.
>
>   - Andy
>
>
>
>
> ________________________________
> From: Ryan Rawson <ry...@gmail.com>
> To: hbase-dev@hadoop.apache.org
> Sent: Saturday, June 20, 2009 6:15:16 PM
> Subject: Re: RE: RE: who's doing what for 0.21?
>
> I note none of those targets have performance as an explicit goal.
>
> Right now hadoop rpc is too low performing they end up using the xciever
> protocol...so will they replace that as well?  Using a json defined self
> describing auto negotiating no code gen system?
>
> If we want to consider using it, we need a 1st class seat at the table and
> have to push hard for what we need.
>
> Does anyone know the people involved personally?
>
> On Jun 20, 2009 6:04 PM, "Jim Kellerman (POWERSET)" <
> Jim.Kellerman@microsoft.com> wrote:
>
> +1 on performance testing
>
> but on a side note, when Avro is integrated into Hadoop, they
> are looking at replacing the RPC
> - to make it language agnostic
> - and to introduce non-blocking soket I/O
> - and *finally* to multiplex sockets, reducing the number of open
> file handles.
>
> --- Jim Kellerman, Powerset (Live Search, Microsoft Corporation) >
> -----Original Message-----
>
> > From: Ryan Rawson [mailto:ryanobjc@gmail.com] > Sent: Saturday, June 20,
> 2009 3:48 PM > To: hbase-...
>
> > Subject: Re: RE: who's doing what for 0.21? > > Man I can't believe my
> email client let me send an...
>
>
>
>
>

Re: RE: RE: who's doing what for 0.21?

Posted by Andrew Purtell <ap...@apache.org>.

> From: Ryan Rawson <ry...@gmail.com>
> I note none of those targets have performance as an explicit goal.

Exactly. If using Avro instead of an alternative such as Thrift or pbufs
will be a performance hit, we should not consider it.

   - Andy

________________________________
From: Ryan Rawson <ry...@gmail.com>
To: hbase-dev@hadoop.apache.org
Sent: Saturday, June 20, 2009 6:15:16 PM
Subject: Re: RE: RE: who's doing what for 0.21?

I note none of those targets have performance as an explicit goal.

Right now hadoop rpc is too low performing they end up using the xciever
protocol...so will they replace that as well?  Using a json defined self
describing auto negotiating no code gen system?

If we want to consider using it, we need a 1st class seat at the table and
have to push hard for what we need.

Does anyone know the people involved personally?

On Jun 20, 2009 6:04 PM, "Jim Kellerman (POWERSET)" <
Jim.Kellerman@microsoft.com> wrote:

+1 on performance testing

but on a side note, when Avro is integrated into Hadoop, they
are looking at replacing the RPC
- to make it language agnostic
- and to introduce non-blocking soket I/O
- and *finally* to multiplex sockets, reducing the number of open
file handles.

--- Jim Kellerman, Powerset (Live Search, Microsoft Corporation) >
-----Original Message-----

> From: Ryan Rawson [mailto:ryanobjc@gmail.com] > Sent: Saturday, June 20,
2009 3:48 PM > To: hbase-...

> Subject: Re: RE: who's doing what for 0.21? > > Man I can't believe my
email client let me send an...

RE: RE: RE: who's doing what for 0.21?

Posted by "Jim Kellerman (POWERSET)" <Ji...@microsoft.com>.

> -----Original Message-----
> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
> Sent: Saturday, June 20, 2009 6:15 PM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: RE: RE: who's doing what for 0.21?
>
> I note none of those targets have performance as an explicit goal.
>
> Right now hadoop rpc is too low performing they end up using the
> xciever
> protocol...so will they replace that as well?  Using a json defined
> self
> describing auto negotiating no code gen system?

That is my understanding.

> If we want to consider using it, we need a 1st class seat at the
> table and
> have to push hard for what we need.
>
> Does anyone know the people involved personally?

Doug Cutting is leading the project. I don't know who else is
involved.

I think they recognize that the current RPC is slow and a
resource hog (xceivers, open file handles).

They also want to get rid of the raw socket protocol used
between clients and data nodes.

The goal overall however is language agnostic wire protocol
without the need to compile anything.

Finally, I agree that if it is not performant then we should
not use it.

> On Jun 20, 2009 6:04 PM, "Jim Kellerman (POWERSET)" <
> Jim.Kellerman@microsoft.com> wrote:
>
> +1 on performance testing
>
> but on a side note, when Avro is integrated into Hadoop, they
> are looking at replacing the RPC
> - to make it language agnostic
> - and to introduce non-blocking soket I/O
> - and *finally* to multiplex sockets, reducing the number of open
>  file handles.
>
> --- Jim Kellerman, Powerset (Live Search, Microsoft Corporation) >
> -----Original Message-----
>
> > From: Ryan Rawson [mailto:ryanobjc@gmail.com] > Sent: Saturday,
> June 20,
> 2009 3:48 PM > To: hbase-...
>
> > Subject: Re: RE: who's doing what for 0.21? > > Man I can't
> believe my
> email client let me send an...

Re: RE: RE: who's doing what for 0.21?

Posted by Ryan Rawson <ry...@gmail.com>.

I note none of those targets have performance as an explicit goal.

Right now hadoop rpc is too low performing they end up using the xciever
protocol...so will they replace that as well?  Using a json defined self
describing auto negotiating no code gen system?

If we want to consider using it, we need a 1st class seat at the table and
have to push hard for what we need.

Does anyone know the people involved personally?

On Jun 20, 2009 6:04 PM, "Jim Kellerman (POWERSET)" <
Jim.Kellerman@microsoft.com> wrote:

+1 on performance testing

but on a side note, when Avro is integrated into Hadoop, they
are looking at replacing the RPC
- to make it language agnostic
- and to introduce non-blocking soket I/O
- and *finally* to multiplex sockets, reducing the number of open
 file handles.

--- Jim Kellerman, Powerset (Live Search, Microsoft Corporation) >
-----Original Message-----

> From: Ryan Rawson [mailto:ryanobjc@gmail.com] > Sent: Saturday, June 20,
2009 3:48 PM > To: hbase-...

> Subject: Re: RE: who's doing what for 0.21? > > Man I can't believe my
email client let me send an...

RE: RE: who's doing what for 0.21?

Posted by "Jim Kellerman (POWERSET)" <Ji...@microsoft.com>.

+1 on performance testing

but on a side note, when Avro is integrated into Hadoop, they
are looking at replacing the RPC
- to make it language agnostic
- and to introduce non-blocking soket I/O
- and *finally* to multiplex sockets, reducing the number of open
  file handles.

---
Jim Kellerman, Powerset (Live Search, Microsoft Corporation)


> -----Original Message-----
> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
> Sent: Saturday, June 20, 2009 3:48 PM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: RE: who's doing what for 0.21?
>
> Man I can't believe my email client let me send an empty message!
>
> I had a look at avro, it looked snazzy, with dynamic schema
> negotiation and
> run time this and that and no code generation.
>
> I have to ask though, will it help us achieve our performance goals?
> We
> should test carefully.
>
> On Jun 20, 2009 3:40 PM, "Ryan Rawson" <ry...@gmail.com> wrote:
>
> > On Jun 20, 2009 3:27 PM, "Jim Kellerman (POWERSET)" <
> Jim.Kellerman@microsoft.com> wrote: > > I am ...
>
> > -----Original Message----- > From: Andrew Purtell [mailto:
> apurtell@apache.org] > Sent: Saturday,...

Re: RE: who's doing what for 0.21?

Posted by Ryan Rawson <ry...@gmail.com>.

Man I can't believe my email client let me send an empty message!

I had a look at avro, it looked snazzy, with dynamic schema negotiation and
run time this and that and no code generation.

I have to ask though, will it help us achieve our performance goals? We
should test carefully.

On Jun 20, 2009 3:40 PM, "Ryan Rawson" <ry...@gmail.com> wrote:

> On Jun 20, 2009 3:27 PM, "Jim Kellerman (POWERSET)" <
Jim.Kellerman@microsoft.com> wrote: > > I am ...

> -----Original Message----- > From: Andrew Purtell [mailto:
apurtell@apache.org] > Sent: Saturday,...

Re: RE: who's doing what for 0.21?

Posted by Ryan Rawson <ry...@gmail.com>.

On Jun 20, 2009 3:27 PM, "Jim Kellerman (POWERSET)" <
Jim.Kellerman@microsoft.com> wrote:

I am also interested in 1295 (I have quite a bit of experience
with cross data center replication), but more interested in
getting more of the master into zookeeper.

As for 1556, I might wait a bit. At the Cloudera off-site, one
of the things talked about was doing something similar for
Hadoop which we might leverage.

What really needs to get done around builds is when you mark
a Jira as patch available, we should do a patch build and test
like Hadoop does. Noone has had time to do it to date, but if
you are taking on the build, that would be a "nice to have".

For 1015, should you wait for Avro?

And if you missed it, here are the notes from the Cloudera off-site:
http://wiki.apache.org/hadoop/DeveloperOffsite20090612

---
Jim Kellerman, Powerset (Live Search, Microsoft Corporation)

> -----Original Message----- > From: Andrew Purtell [mailto:
apurtell@apache.org] > Sent: Saturday,...

RE: who's doing what for 0.21?

Posted by "Jim Kellerman (POWERSET)" <Ji...@microsoft.com>.

I am also interested in 1295 (I have quite a bit of experience
with cross data center replication), but more interested in
getting more of the master into zookeeper.

As for 1556, I might wait a bit. At the Cloudera off-site, one
of the things talked about was doing something similar for
Hadoop which we might leverage.

What really needs to get done around builds is when you mark
a Jira as patch available, we should do a patch build and test
like Hadoop does. Noone has had time to do it to date, but if
you are taking on the build, that would be a "nice to have".

For 1015, should you wait for Avro?

And if you missed it, here are the notes from the Cloudera off-site:
http://wiki.apache.org/hadoop/DeveloperOffsite20090612

---
Jim Kellerman, Powerset (Live Search, Microsoft Corporation)


> -----Original Message-----
> From: Andrew Purtell [mailto:apurtell@apache.org]
> Sent: Saturday, June 20, 2009 12:11 PM
> To: hbase-dev@hadoop.apache.org
> Subject: who's doing what for 0.21?
>
> http://tinyurl.com/m7nt72
>
> I have an interest in these:
>
>   https://issues.apache.org/jira/browse/HBASE-1015
>   https://issues.apache.org/jira/browse/HBASE-1295
>   https://issues.apache.org/jira/browse/HBASE-1556
>
> I think for 1015 and 1295, there is interest on the part of at least
> myself,
> dj_ryan, and jgray. dj_ryan was saying something about su executives
> making
> 1295 a priority for him. We should figure out how to divide up and
> assign
> out the work.
>
> Also, probably I'll end up taking on the grunt work of 1556, because
> it
> needs to be done.
>
> Have we set a time and place for the next dev meeting?
>
>    - Andy
>