Posted to common-dev@hadoop.apache.org by Xiaoqiao He <xq...@gmail.com> on 2019/04/11 05:30:09 UTC

[VOTE]: Support for RBF data locality Solution

Hi folks,

The current implementation of RBF is not aware of data locality, because
the NameNode cannot get the real client hostname by invoking
Server#getRemoteAddress when an RPC request is forwarded by the Router to
the NameNode; it sees the Router's address instead.
This leads to several problems, for instance:

   - a. The client may have to do a remote read instead of a local read,
   so Short-Circuit reads cannot be used in most cases.
   - b. The block placement policy cannot run as expected based on the
   defined rack awareness, so writes to the local node are lost.

Several solutions to the data locality issue came out of the discussion.
Some of them change the RPC protocol, so we look forward to further
suggestions and votes. HDFS-13248 is tracking the issue.

   - Approach A: Change the IPC/RPC layer protocol (IpcConnectionContextProto
   or RpcHeader#RpcRequestHeaderProto) and add an extra field for the client
   hostname. The new field is optional; in general it is set only by the
   Router and parsed only by the NameNode. This approach is compatible, and
   clients need no changes.
   - Approach B: Change ClientProtocol and add extra
   create/append/getBlockLocations interfaces with an additional parameter
   for the client hostname. As in Approach A, it is set by the Router and
   parsed by the NameNode, and it is also compatible.
   - Approach C: Solve write and read locality separately on top of the
   current interfaces, with no protocol changes. For writes, pass the client
   hostname as one of the favored nodes for addBlock; for reads, reorder the
   targets at the Router after the NameNode returns the result to the Router.
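
To make the read half of Approach C concrete, here is a minimal sketch of
reordering replica targets at the Router. It is an illustration only, not
Hadoop code: the class and method names are hypothetical, hosts are plain
strings, and locality is reduced to an exact hostname match (a real
implementation would presumably also consider rack distance).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the read half of Approach C; not part of Hadoop. */
final class LocalityReorderSketch {
    private LocalityReorderSketch() {}

    /**
     * Returns a copy of replicaHosts in which any replica located on
     * clientHost comes first. The relative order of the remaining
     * replicas, as returned by the NameNode, is preserved.
     */
    static List<String> preferLocal(List<String> replicaHosts, String clientHost) {
        List<String> local = new ArrayList<>();
        List<String> remote = new ArrayList<>();
        for (String host : replicaHosts) {
            if (host.equals(clientHost)) {
                local.add(host);   // replica on the client's own node
            } else {
                remote.add(host);  // keep the NameNode's ordering
            }
        }
        local.addAll(remote);
        return local;
    }
}
```

The Router knows the real client address, so it can apply this kind of
reordering without any protocol change; the write path still needs the
separate favored-nodes workaround, which is one reason Approach C is split
in two.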

As discussed and evaluated in HDFS-13248, we prefer changing the IPC/RPC
layer protocol to support data locality. We welcome more suggestions,
votes, or any other feedback to push this feature forward. Thanks.

Best Regards,
Hexiaoqiao

References
[1] https://issues.apache.org/jira/browse/HDFS-13248
[2] https://issues.apache.org/jira/browse/HDFS-10467
[3] https://issues.apache.org/jira/browse/HDFS-12615

Re: [VOTE]: Support for RBF data locality Solution

Posted by Xiaoqiao He <xq...@gmail.com>.
Thanks everyone for discussing and voting on the issue.
In total there are 6 +1s for Approach A (including my own +1).
I would like to summarize the voted solution:

   - Add an extra optional field for the client hostname in
   RpcHeader#RpcRequestHeaderProto.
   - The Router sets RpcRequestHeader#clientHostname if necessary.
   - The NameNode uses clientHostname when invoking #getRemoteAddress if
   RpcRequestHeader#clientHostname is set; otherwise it keeps the current
   logic.
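
As a rough illustration of the fallback described in the last bullet, the
sketch below resolves the client host from an optional header value and
otherwise uses the connection's peer address. It is not the actual patch:
the class, the RequestHeader stand-in, and the method names are all
hypothetical; only the clientHostname field name comes from the summary
above.

```java
import java.util.Optional;

/** Hypothetical sketch of the Approach A fallback; not the Hadoop IPC API. */
final class CallerAddressSketch {
    private CallerAddressSketch() {}

    /** Stand-in for the RPC request header carrying the optional field. */
    static final class RequestHeader {
        private final String clientHostname; // null when the field is unset

        RequestHeader(String clientHostname) {
            this.clientHostname = clientHostname;
        }

        /** Empty when the sender (a plain client) did not set the field. */
        Optional<String> clientHostname() {
            return Optional.ofNullable(clientHostname).filter(h -> !h.isEmpty());
        }
    }

    /**
     * Returns the host to use for locality decisions: the hostname set by
     * the Router if present, otherwise the peer address of the connection,
     * which is the current behavior.
     */
    static String effectiveClientHost(RequestHeader header, String connectionPeerHost) {
        return header.clientHostname().orElse(connectionPeerHost);
    }
}
```

Because the field is optional, a request that comes directly from a client
leaves it unset and takes the second branch, so existing clients behave
exactly as before.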

I will create a new issue to push this feature forward.
Thanks again, all.

On Fri, Apr 12, 2019 at 7:31 PM Vinayakumar B <vi...@apache.org>
wrote:

> +1 for approach A.
>
> On Thu, 11 Apr 2019, 12:23 pm Akira Ajisaka, <aa...@apache.org> wrote:
>
>> The Approach A looks good to me.
>>
>> Thanks,
>> Akira
>>

Re: [VOTE]: Support for RBF data locality Solution

Posted by Vinayakumar B <vi...@apache.org>.
+1 for approach A.

On Thu, 11 Apr 2019, 12:23 pm Akira Ajisaka, <aa...@apache.org> wrote:

> The Approach A looks good to me.
>
> Thanks,
> Akira
>

Re: [VOTE]: Support for RBF data locality Solution

Posted by Ayush Saxena <ay...@gmail.com>.
Thanks Hexiaoqiao for putting this up.
As already discussed on the JIRA, Approach A sounds best to me.

-Ayush

> On 11-Apr-2019, at 11:50 PM, Giovanni Matteo Fumarola <gi...@gmail.com> wrote:
> 
> +1 on Approach A.
> 
>> On Thu, Apr 11, 2019 at 10:30 AM Iñigo Goiri <el...@gmail.com> wrote:
>> 
>> Thanks Hexiaoqiao for starting the vote.
>> As I said in the JIRA, I prefer Approach A.
>> 
>> I wanted to bring a broader audience as this has changes in RBF, HDFS and
>> Commons.
>> I think adding a new optional field to the RPC header should be lightweight
>> enough.
>> The idea of passing a proxied client is already available in places like
>> UGI but not to this level.
>> I haven't been able to figure other uses but maybe other applications could
>> take advantage of this new field.
>> 
>> Please, raise any concerns regarding any of the 3 approaches proposed.
>> 
>> On Wed, Apr 10, 2019 at 11:53 PM Akira Ajisaka <aa...@apache.org>
>> wrote:
>> 
>>> The Approach A looks good to me.
>>> 
>>> Thanks,
>>> Akira
>>> 

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org


Re: [VOTE]: Support for RBF data locality Solution

Posted by Giovanni Matteo Fumarola <gi...@gmail.com>.
+1 on Approach A.

On Thu, Apr 11, 2019 at 10:30 AM Iñigo Goiri <el...@gmail.com> wrote:

> Thanks Hexiaoqiao for starting the vote.
> As I said in the JIRA, I prefer Approach A.
>
> I wanted to bring a broader audience as this has changes in RBF, HDFS and
> Commons.
> I think adding a new optional field to the RPC header should be lightweight
> enough.
> The idea of passing a proxied client is already available in places like
> UGI but not to this level.
> I haven't been able to figure other uses but maybe other applications could
> take advantage of this new field.
>
> Please, raise any concerns regarding any of the 3 approaches proposed.
>
> On Wed, Apr 10, 2019 at 11:53 PM Akira Ajisaka <aa...@apache.org>
> wrote:
>
> > The Approach A looks good to me.
> >
> > Thanks,
> > Akira
> >

Re: [VOTE]: Support for RBF data locality Solution

Posted by Iñigo Goiri <el...@gmail.com>.
Thanks Hexiaoqiao for starting the vote.
As I said in the JIRA, I prefer Approach A.

I wanted to bring in a broader audience, as this involves changes in RBF,
HDFS and Commons.
I think adding a new optional field to the RPC header should be lightweight
enough.
The idea of passing a proxied client is already available in places like
UGI but not to this level.
I haven't been able to figure out other uses, but maybe other applications
could take advantage of this new field.

Please raise any concerns regarding any of the 3 approaches proposed.

On Wed, Apr 10, 2019 at 11:53 PM Akira Ajisaka <aa...@apache.org> wrote:

> The Approach A looks good to me.
>
> Thanks,
> Akira
>

Re: [VOTE]: Support for RBF data locality Solution

Posted by Iñigo Goiri <el...@gmail.com>.
Thanks Hexiaoqiao for starting the vote.
As I said in the JIRA, I prefer Approach A.

I wanted to bring a broader audience as this has changes in RBF, HDFS and
Commons.
I think adding a new optional field to the RPC header should be lightweight
enough.
The idea of passing a proxied client is already available in places like
UGI but not to this level.
I haven't been able to figure other uses but maybe other applications could
take advantage of this new field.

Please, raise any concerns regarding any of the 3 approaches proposed.

On Wed, Apr 10, 2019 at 11:53 PM Akira Ajisaka <aa...@apache.org> wrote:

> The Approach A looks good to me.
>
> Thanks,
> Akira
>
> On Thu, Apr 11, 2019 at 2:30 PM Xiaoqiao He <xq...@gmail.com> wrote:
> >
> > Hi forks,
> >
> > The current implementation of RBF is not sensitive about data locality,
> > since NameNode could not get real client hostname by invoke
> > Server#getRemoteAddress when RPC request forward by Router to NameNode.
> > Therefore, it will lead to several challenges, for instance,
> >
> >    - a. Client could have to go for remote read instead of local read,
> >    Short-Circuit could not be used in most cases.
> >    - b. Block placement policy could not run as except based on defined
> >    rack aware. Thus it will loss local node write.
> >
> > There are some different solutions to solve data locality issue after
> > discussion, some of them will change RPC protocol, so we look forward to
> > furthermore suggestions and votes. HDFS-13248 is tracking the issue.
> >
> >    - Approach A: Changing IPC/RPC layer protocol
> (IpcConnectionContextProto
> >    or RpcHeader#RpcRequestHeaderProto) and add extra field about client
> >    hostname. Of course the new field is optional, only input by Router
> and
> >    parse by Namenode in generally. This approach is compatibility and
> Client
> >    should do nothing after changing.
> >    - Approach B: Changing ClientProtocol and add extra interface
> >    create/append/getBlockLocations with additional parameter about client
> >    hostname. As approach A, it is input by Router and parse by Namenode,
> and
> >    also is compatibility.
> >    - Approach C: Solve write and read locality separately based on
> current
> >    interface and no changes, for write, hack client hostname as one of
> favor
> >    nodes for addBlocks, for read, reorder targets at Router after
> Namenode
> >    returns result to Router.
> >
> > Based on the discussion and evaluation in HDFS-13248, we prefer changing
> > the IPC/RPC layer protocol to support RPC data locality. Further
> > suggestions, votes, or any other feedback to push this feature forward
> > are welcome. Thanks.
> >
> > Best Regards,
> > Hexiaoqiao
> >
> > References
> > [1] https://issues.apache.org/jira/browse/HDFS-13248
> > [2] https://issues.apache.org/jira/browse/HDFS-10467
> > [3] https://issues.apache.org/jira/browse/HDFS-12615

Re: [VOTE]: Support for RBF data locality Solution

Posted by Vinayakumar B <vi...@apache.org>.
+1 for approach A.

On Thu, 11 Apr 2019, 12:23 pm Akira Ajisaka, <aa...@apache.org> wrote:

> The Approach A looks good to me.
>
> Thanks,
> Akira

Re: [VOTE]: Support for RBF data locality Solution

Posted by Akira Ajisaka <aa...@apache.org>.
The Approach A looks good to me.

Thanks,
Akira

On Thu, Apr 11, 2019 at 2:30 PM Xiaoqiao He <xq...@gmail.com> wrote:
>
> Hi forks,
>
> The current implementation of RBF is not sensitive about data locality,
> since NameNode could not get real client hostname by invoke
> Server#getRemoteAddress when RPC request forward by Router to NameNode.
> Therefore, it will lead to several challenges, for instance,
>
>    - a. Client could have to go for remote read instead of local read,
>    Short-Circuit could not be used in most cases.
>    - b. Block placement policy could not run as except based on defined
>    rack aware. Thus it will loss local node write.
>
> There are some different solutions to solve data locality issue after
> discussion, some of them will change RPC protocol, so we look forward to
> furthermore suggestions and votes. HDFS-13248 is tracking the issue.
>
>    - Approach A: Changing IPC/RPC layer protocol (IpcConnectionContextProto
>    or RpcHeader#RpcRequestHeaderProto) and add extra field about client
>    hostname. Of course the new field is optional, only input by Router and
>    parse by Namenode in generally. This approach is compatibility and Client
>    should do nothing after changing.
>    - Approach B: Changing ClientProtocol and add extra interface
>    create/append/getBlockLocations with additional parameter about client
>    hostname. As approach A, it is input by Router and parse by Namenode, and
>    also is compatibility.
>    - Approach C: Solve write and read locality separately based on current
>    interface and no changes, for write, hack client hostname as one of favor
>    nodes for addBlocks, for read, reorder targets at Router after Namenode
>    returns result to Router.
>
> As discussion and evaluation in HDFS-13248, we prefer to change IPC/RPC
> layer protocol to support RPC data locality. We welcome more suggestions,
> votes or just give us feedback to push forward this feature. Thanks.
>
> Best Regards,
> Hexiaoqiao
>
> reference
> [1] https://issues.apache.org/jira/browse/HDFS-13248
> [2] https://issues.apache.org/jira/browse/HDFS-10467
>
> [3] https://issues.apache.org/jira/browse/HDFS-12615

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-dev-help@hadoop.apache.org

