Posted to dev@knox.apache.org by Maksim Kononenko <mk...@hortonworks.com> on 2013/10/25 14:55:34 UTC

Knox HA with Apache HTTP Server + mod_proxy + mod_proxy_balancer

Hi guys,

I was researching/testing Knox HA with Apache HTTP Server +  mod_proxy +
mod_proxy_balancer.
Here is what I found.
I.   Three load balancer scheduler algorithms are available for use: Request
Counting, Weighted Traffic Counting and Pending Request Counting. (
http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html#scheduler)
II.  Load balancer stickiness. (
http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html#stickyness)
     I configured and tested stickiness; it worked as expected. (A sample
balancer configuration covering the scheduler, stickiness and failover
settings is sketched at the end of this message.)
III. Failover. (
http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass)
     1. I ran the following use cases:
        a) Knox instance is down before client request comes in.
            Steps:
                - Configure Apache HTTP Server to proxy two Knox instances;
                - Shoot down Knox instance A;
                - Execute client request;
                - Verify that Knox instance A is marked as unavailable and
the client's request is redirected to Knox instance B;
                - Verify that all subsequent requests within the same
client session are passed only to Knox instance B;
                - Verify that client requests within a new session are
again attempted against Knox instance A.
                  This is required because Knox instance A could come back
up before a new client session starts.
            This use case works fine.
        b) Knox instance goes down while it is processing a client's PUT
request.
            Steps:
                - Start a PUT of a medium-sized file (200 MB) to HDFS;
                - After some time, shoot down the Knox instance processing
this request;
                - Verify that the client gets a 500 status code and no
failover takes place.
            This use case works as described. Apache HTTP Server is not
able to fail over in this case.
        c) Knox instance goes down while it is processing a client's GET
request.
            Steps:
                - Start a GET of a medium-sized file (200 MB) from HDFS;
                - After some time, shoot down the Knox instance processing
this request;
                - Verify that the client gets a 200 status code, a
'Content-Length' header whose value equals the file size, and some bytes
in the body.
                  To execute this test I used the following clients:
                    1) HttpClient - it doesn't produce any error when the
stream is closed.
                    2) cURL - it doesn't produce any error when the stream
is closed.
                    3) Firefox - it doesn't produce any error when the
stream is closed.
                  All clients simply download whatever bytes are available
before the stream is closed, so the client has to manually compare the
'Content-Length' header value with the number of bytes received.
                - No failover takes place.
            This use case works as described. Apache HTTP Server is not
able to fail over in this case.
     2. Additional use cases.
        What new cases could you advise?
IV. What functionality did I miss?
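
For reference, a minimal httpd configuration sketch of the setup described
above. The host names, ports and the balancer/cookie names are hypothetical;
the directives themselves are the ones documented at the
mod_proxy/mod_proxy_balancer links in I-III:

    ProxyRequests Off
    SSLProxyEngine On

    # Let the balancer set a route cookie so that a client session sticks
    # to one Knox instance (stickiness).
    Header add Set-Cookie "ROUTEID=.%{BALANCER_WORKER_ROUTE}e; path=/" env=BALANCER_ROUTE_CHANGED

    <Proxy balancer://knoxcluster>
        # A member that fails is marked in error state and skipped (failover).
        BalancerMember https://knox1.example.com:8443 route=knox1
        BalancerMember https://knox2.example.com:8443 route=knox2
        # lbmethod selects the scheduler: byrequests (Request Counting),
        # bytraffic (Weighted Traffic Counting), bybusyness (Pending Request
        # Counting).
        ProxySet lbmethod=byrequests stickysession=ROUTEID
    </Proxy>

    ProxyPass        /gateway balancer://knoxcluster/gateway
    ProxyPassReverse /gateway balancer://knoxcluster/gateway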

Maksim.


Re: Knox HA with Apache HTTP Server + mod_proxy + mod_proxy_balancer

Posted by Dilli Arumugam <da...@hortonworks.com>.
Agreed.
Dilli


On Fri, Oct 25, 2013 at 9:23 AM, Kevin Minder
<ke...@hortonworks.com> wrote:

> I believe that since the Content-Length is a header that is written before
> the body is rewritten that the best we can do is avoid removing the
> Content-Length header when we know that we will not be rewriting the body.
>
>
> On 10/25/13 12:09 PM, Dilli Arumugam wrote:
>
>> Kevin,
>>
>> I should have done some tests and detected Content-Length is not reaching
>> the client.
>> Good,  Maksim detected it.
>>
>> As far your comment (2), I believe if Knox is rewriting the content, it
>> should  rewrite the Content-Length ideally. But, it is not going to be
>> practical. Needs some research on how to fix the problem right.
>>
>> Thanks
>> Dilli
>>
>>
>> On Fri, Oct 25, 2013 at 9:01 AM, Kevin Minder
>> <ke...@hortonworks.com> wrote:
>>
>>  I was afraid that Knox might actually be removing the Content-Length
>>> header.  Dilli is going to yell at me about that BTW!
>>>
>>> So there are two things that need to be done.
>>>
>>> 1) Determine the client (e.g. curl) behavior when Content-Length is
>>> specified.
>>>
>>> 2) Make changes in Knox so that the Content-Length response header is
>>> only
>>> removed if the body is being rewritten.
>>>
>>> Please file a jira for #2.  I've already given this some thought so I can
>>> add detail.
>>>
>>>
>>> On 10/25/13 11:55 AM, Maksim Kononenko wrote:
>>>
>>>  On Fri, Oct 25, 2013 at 4:42 PM, Kevin Minder
>>>> <ke...@hortonworks.com> wrote:
>>>>
>>>>   Maksim,
>>>>
>>>>> Great work!
>>>>> Discussion inline below.
>>>>> Recommended next steps.
>>>>> 1) Add the setup steps required to get all of this working to the
>>>>> user's
>>>>> guide.  File a jira.
>>>>> 2) Figure out a way to automate these tests.  Might be hard on Apache
>>>>> infra.
>>>>> Kevin.
>>>>>
>>>>>
>>>>> On 10/25/13 8:55 AM, Maksim Kononenko wrote:
>>>>>
>>>>>   Hi guys,
>>>>>
>>>>>> I was researching/testing Knox HA with Apache HTTP Server +
>>>>>>  mod_proxy +
>>>>>> mod_proxy_balancer.
>>>>>> Here is what I found.
>>>>>> I.   3 load balancer scheduler algorithms available for use: Request
>>>>>> Counting, Weighted Traffic Counting and Pending Request Counting. (
>>>>>> http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html#scheduler
>>>>>> )
>>>>>> II.  Load balancer stickyness. (
>>>>>> http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html#stickyness
>>>>>> )
>>>>>>         I configured and tested stickyness. Worked as it had to be.
>>>>>> III. Failover. (
>>>>>> http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass
>>>>>> )
>>>>>>         1. I ran foolowing use cases:
>>>>>>            a) Knox instance is down before client request comes in.
>>>>>>                Steps:
>>>>>>                    - Configure Apache HTTP Server to proxy two Knox
>>>>>> instances;
>>>>>>                    - Shoot down Knox instance A;
>>>>>>                    - Execute client request;
>>>>>>                    - Verify that Knox instance A is marked as
>>>>>> unavailable
>>>>>> and
>>>>>> client's request is redirected to Knox instance B;
>>>>>>                    - Verify that all subsequent requests in scope of
>>>>>> the
>>>>>> same
>>>>>> client's session are passed just to Knox instance B;
>>>>>>                    - Verify that client's requests in scope of new
>>>>>> session
>>>>>> are
>>>>>> tried to be passed to Knox instance A.
>>>>>>                      It is required because Knox instance A could be
>>>>>> started
>>>>>> before new client's session.
>>>>>>
>>>>>>   This seems a little sub-optimal to me but there may be nothing we
>>>>>> can
>>>>>>
>>>>> do
>>>>> about it.
>>>>> The issue that I have is that I don't think Apache should be trying
>>>>> instance-A first every time in this case.
>>>>> So the question is how is Apache distributing load over instance-A and
>>>>> instance-B?
>>>>> Does it always try instance-A first or does it sometimes try instance-B
>>>>> first?
>>>>> In addition if it gets a failure for instance-A ideally it would take
>>>>> it
>>>>> out of the "pool" for some (ideally configurable) period of time.
>>>>>
>>>>>  It depends on the  load balancer scheduler algorithm. For my tests I
>>>> used
>>>> Request Counting.
>>>> I'll look for any configuration related to take out of the "pool" time.
>>>>
>>>>                  This use case works fine.
>>>>
>>>>>            b) Knox instance goes down when it processes client's PUT
>>>>>> request.
>>>>>>                Steps:
>>>>>>                    - Start executing PUT file to HDFS with medium size
>>>>>> (200Mb);
>>>>>>                    - After some time shoot down Knox instance which
>>>>>> processes
>>>>>> this request;
>>>>>>                    - Verify that client gets 500 status code and no
>>>>>> failover
>>>>>> takes place.
>>>>>>                This use case works as it is described. Apache HTTP
>>>>>> Server is
>>>>>> not able to do failover in this case.
>>>>>>            c) Knox instance goes down when it processes client's GET
>>>>>> request.
>>>>>>                Steps:
>>>>>>                    - Start executing GET file from HDFS with medium
>>>>>> size
>>>>>> (200Mb);
>>>>>>                    - After some time shoot down Knox instance which
>>>>>> processes
>>>>>> this request;
>>>>>>                    - Verify that client gets 200 status code,
>>>>>> 'Content-Length'
>>>>>> header with value equals to file size and some bytes in the body.
>>>>>>                      To execute this test I used as a client:
>>>>>>                        1) HttpClient - it doesn't produce any error
>>>>>> when
>>>>>> stream is closed.
>>>>>>                        2) CURL - it doesn't produce any error when
>>>>>> stream is
>>>>>> closed.
>>>>>>                        3) Firefox browser - it doesn't produce any
>>>>>> error
>>>>>> when
>>>>>> stream is closed.
>>>>>>                      All clients just download available bytes before
>>>>>> stream
>>>>>> is closed, so client has to manually compare 'Content-Length' header
>>>>>> value
>>>>>> and received bytes length.
>>>>>>                    - No failover takes place.
>>>>>>                This use case works as it is described. Apache HTTP
>>>>>> Server is
>>>>>> not able to do failover in this case.
>>>>>>
>>>>>>   This is unexpected and unfortunate.
>>>>>>
>>>>> I would have hoped that HttpClient and cURL at least would provide some
>>>>> indication that the stream was incomplete according to the
>>>>> Content-Length
>>>>> header.
>>>>> The only thing I would recommend you trying is taking Knox out of the
>>>>> picture, use cURL to GET the same file directly from HDFS, kill the
>>>>> DataNode halfway through the stream and ensure that you see the same
>>>>> behavior on the client side.
>>>>>
>>>>>  I just rechecked all headers/data and found that I was wrong about
>>>> Content-Length header. Knox received this header from DN but it didn't
>>>> send
>>>> it to client. I misunderstood a little bit logs on the Knox side.
>>>> I ran tests against DN usign CURL and it wrote "curl: (18) transfer
>>>> closed
>>>> with 107092406 bytes remaining to read" when I stopped DN.
>>>>
>>>>           2. Additional use cases.
>>>>
>>>>>            What new cases could you advise?
>>>>>>
>>>>>>   I just want to confirm that you have tested a scenario for HDFS
>>>>>> where
>>>>>>
>>>>> the
>>>>> call to the NameNode goes to instance-A and the subsequent call to the
>>>>> DataNode goes to instance-B and this works.
>>>>>
>>>>>    IV. What functionality did I miss?
>>>>> Other than the note above I don't see anything missing.
>>>>>
>>>>>   Maksim.
>>>>>
>>>>>>
>>>>>>   --
>>>>>>

Re: Knox HA with Apache HTTP Server + mod_proxy + mod_proxy_balancer

Posted by Kevin Minder <ke...@hortonworks.com>.
I believe that, since Content-Length is a header that is written before
the body is rewritten, the best we can do is to avoid removing the
Content-Length header when we know that we will not be rewriting the body.

On 10/25/13 12:09 PM, Dilli Arumugam wrote:
> Kevin,
>
> I should have done some tests and detected Content-Length is not reaching
> the client.
> Good,  Maksim detected it.
>
> As far your comment (2), I believe if Knox is rewriting the content, it
> should  rewrite the Content-Length ideally. But, it is not going to be
> practical. Needs some research on how to fix the problem right.
>
> Thanks
> Dilli
>
>
> On Fri, Oct 25, 2013 at 9:01 AM, Kevin Minder
> <ke...@hortonworks.com> wrote:
>
>> I was afraid that Knox might actually be removing the Content-Length
>> header.  Dilli is going to yell at me about that BTW!
>>
>> So there are two things that need to be done.
>>
>> 1) Determine the client (e.g. curl) behavior when Content-Length is
>> specified.
>>
>> 2) Make changes in Knox so that the Content-Length response header is only
>> removed if the body is being rewritten.
>>
>> Please file a jira for #2.  I've already given this some thought so I can
>> add detail.
>>
>>
>> On 10/25/13 11:55 AM, Maksim Kononenko wrote:
>>
>>> On Fri, Oct 25, 2013 at 4:42 PM, Kevin Minder
>>> <ke...@hortonworks.com> wrote:
>>>
>>>   Maksim,
>>>> Great work!
>>>> Discussion inline below.
>>>> Recommended next steps.
>>>> 1) Add the setup steps required to get all of this working to the user's
>>>> guide.  File a jira.
>>>> 2) Figure out a way to automate these tests.  Might be hard on Apache
>>>> infra.
>>>> Kevin.
>>>>
>>>>
>>>> On 10/25/13 8:55 AM, Maksim Kononenko wrote:
>>>>
>>>>   Hi guys,
>>>>> I was researching/testing Knox HA with Apache HTTP Server +  mod_proxy +
>>>>> mod_proxy_balancer.
>>>>> Here is what I found.
>>>>> I.   3 load balancer scheduler algorithms available for use: Request
>>>>> Counting, Weighted Traffic Counting and Pending Request Counting. (
>>>>> http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html#scheduler
>>>>> )
>>>>> II.  Load balancer stickyness. (
>>>>> http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html#stickyness
>>>>> )
>>>>>         I configured and tested stickyness. Worked as it had to be.
>>>>> III. Failover. (
>>>>> http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass
>>>>> )
>>>>>         1. I ran foolowing use cases:
>>>>>            a) Knox instance is down before client request comes in.
>>>>>                Steps:
>>>>>                    - Configure Apache HTTP Server to proxy two Knox
>>>>> instances;
>>>>>                    - Shoot down Knox instance A;
>>>>>                    - Execute client request;
>>>>>                    - Verify that Knox instance A is marked as unavailable
>>>>> and
>>>>> client's request is redirected to Knox instance B;
>>>>>                    - Verify that all subsequent requests in scope of the
>>>>> same
>>>>> client's session are passed just to Knox instance B;
>>>>>                    - Verify that client's requests in scope of new
>>>>> session
>>>>> are
>>>>> tried to be passed to Knox instance A.
>>>>>                      It is required because Knox instance A could be
>>>>> started
>>>>> before new client's session.
>>>>>
>>>>>   This seems a little sub-optimal to me but there may be nothing we can
>>>> do
>>>> about it.
>>>> The issue that I have is that I don't think Apache should be trying
>>>> instance-A first every time in this case.
>>>> So the question is how is Apache distributing load over instance-A and
>>>> instance-B?
>>>> Does it always try instance-A first or does it sometimes try instance-B
>>>> first?
>>>> In addition if it gets a failure for instance-A ideally it would take it
>>>> out of the "pool" for some (ideally configurable) period of time.
>>>>
>>> It depends on the  load balancer scheduler algorithm. For my tests I used
>>> Request Counting.
>>> I'll look for any configuration related to take out of the "pool" time.
>>>
>>>                  This use case works fine.
>>>>>            b) Knox instance goes down when it processes client's PUT
>>>>> request.
>>>>>                Steps:
>>>>>                    - Start executing PUT file to HDFS with medium size
>>>>> (200Mb);
>>>>>                    - After some time shoot down Knox instance which
>>>>> processes
>>>>> this request;
>>>>>                    - Verify that client gets 500 status code and no
>>>>> failover
>>>>> takes place.
>>>>>                This use case works as it is described. Apache HTTP
>>>>> Server is
>>>>> not able to do failover in this case.
>>>>>            c) Knox instance goes down when it processes client's GET
>>>>> request.
>>>>>                Steps:
>>>>>                    - Start executing GET file from HDFS with medium size
>>>>> (200Mb);
>>>>>                    - After some time shoot down Knox instance which
>>>>> processes
>>>>> this request;
>>>>>                    - Verify that client gets 200 status code,
>>>>> 'Content-Length'
>>>>> header with value equals to file size and some bytes in the body.
>>>>>                      To execute this test I used as a client:
>>>>>                        1) HttpClient - it doesn't produce any error when
>>>>> stream is closed.
>>>>>                        2) CURL - it doesn't produce any error when
>>>>> stream is
>>>>> closed.
>>>>>                        3) Firefox browser - it doesn't produce any error
>>>>> when
>>>>> stream is closed.
>>>>>                      All clients just download available bytes before
>>>>> stream
>>>>> is closed, so client has to manually compare 'Content-Length' header
>>>>> value
>>>>> and received bytes length.
>>>>>                    - No failover takes place.
>>>>>                This use case works as it is described. Apache HTTP
>>>>> Server is
>>>>> not able to do failover in this case.
>>>>>
>>>>>   This is unexpected and unfortunate.
>>>> I would have hoped that HttpClient and cURL at least would provide some
>>>> indication that the stream was incomplete according to the Content-Length
>>>> header.
>>>> The only thing I would recommend you trying is taking Knox out of the
>>>> picture, use cURL to GET the same file directly from HDFS, kill the
>>>> DataNode halfway through the stream and ensure that you see the same
>>>> behavior on the client side.
>>>>
>>> I just rechecked all headers/data and found that I was wrong about
>>> Content-Length header. Knox received this header from DN but it didn't
>>> send
>>> it to client. I misunderstood a little bit logs on the Knox side.
>>> I ran tests against DN usign CURL and it wrote "curl: (18) transfer closed
>>> with 107092406 bytes remaining to read" when I stopped DN.
>>>
>>>           2. Additional use cases.
>>>>>            What new cases could you advise?
>>>>>
>>>>>   I just want to confirm that you have tested a scenario for HDFS where
>>>> the
>>>> call to the NameNode goes to instance-A and the subsequent call to the
>>>> DataNode goes to instance-B and this works.
>>>>
>>>>    IV. What functionality did I miss?
>>>> Other than the note above I don't see anything missing.
>>>>
>>>>   Maksim.
>>>>>
>>>>>   --

Re: Knox HA with Apache HTTP Server + mod_proxy + mod_proxy_balancer

Posted by Dilli Arumugam <da...@hortonworks.com>.
Kevin,

I should have run some tests myself and detected that Content-Length is
not reaching the client.
Good that Maksim detected it.

As far as your comment (2): I believe that if Knox is rewriting the content,
it should ideally rewrite the Content-Length as well. But that is not going
to be practical. It needs some research to figure out how to fix the problem
properly.

Thanks
Dilli


On Fri, Oct 25, 2013 at 9:01 AM, Kevin Minder
<ke...@hortonworks.com> wrote:

> I was afraid that Knox might actually be removing the Content-Length
> header.  Dilli is going to yell at me about that BTW!
>
> So there are two things that need to be done.
>
> 1) Determine the client (e.g. curl) behavior when Content-Length is
> specified.
>
> 2) Make changes in Knox so that the Content-Length response header is only
> removed if the body is being rewritten.
>
> Please file a jira for #2.  I've already given this some thought so I can
> add detail.
>
>
> On 10/25/13 11:55 AM, Maksim Kononenko wrote:
>
>> On Fri, Oct 25, 2013 at 4:42 PM, Kevin Minder
>> <ke...@hortonworks.com> wrote:
>>
>>  Maksim,
>>> Great work!
>>> Discussion inline below.
>>> Recommended next steps.
>>> 1) Add the setup steps required to get all of this working to the user's
>>> guide.  File a jira.
>>> 2) Figure out a way to automate these tests.  Might be hard on Apache
>>> infra.
>>> Kevin.
>>>
>>>
>>> On 10/25/13 8:55 AM, Maksim Kononenko wrote:
>>>
>>>  Hi guys,
>>>>
>>>> I was researching/testing Knox HA with Apache HTTP Server +  mod_proxy +
>>>> mod_proxy_balancer.
>>>> Here is what I found.
>>>> I.   3 load balancer scheduler algorithms available for use: Request
>>>> Counting, Weighted Traffic Counting and Pending Request Counting. (
>>>> http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html#scheduler
>>>> )
>>>> II.  Load balancer stickyness. (
>>>> http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html#stickyness
>>>> )
>>>>        I configured and tested stickyness. Worked as it had to be.
>>>> III. Failover. (
>>>> http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass
>>>> )
>>>>        1. I ran foolowing use cases:
>>>>           a) Knox instance is down before client request comes in.
>>>>               Steps:
>>>>                   - Configure Apache HTTP Server to proxy two Knox
>>>> instances;
>>>>                   - Shoot down Knox instance A;
>>>>                   - Execute client request;
>>>>                   - Verify that Knox instance A is marked as unavailable
>>>> and
>>>> client's request is redirected to Knox instance B;
>>>>                   - Verify that all subsequent requests in scope of the
>>>> same
>>>> client's session are passed just to Knox instance B;
>>>>                   - Verify that client's requests in scope of new
>>>> session
>>>> are
>>>> tried to be passed to Knox instance A.
>>>>                     It is required because Knox instance A could be
>>>> started
>>>> before new client's session.
>>>>
>>>>  This seems a little sub-optimal to me but there may be nothing we can
>>> do
>>> about it.
>>> The issue that I have is that I don't think Apache should be trying
>>> instance-A first every time in this case.
>>> So the question is how is Apache distributing load over instance-A and
>>> instance-B?
>>> Does it always try instance-A first or does it sometimes try instance-B
>>> first?
>>> In addition if it gets a failure for instance-A ideally it would take it
>>> out of the "pool" for some (ideally configurable) period of time.
>>>
>> It depends on the  load balancer scheduler algorithm. For my tests I used
>> Request Counting.
>> I'll look for any configuration related to take out of the "pool" time.
>>
>>                 This use case works fine.
>>>
>>>>           b) Knox instance goes down when it processes client's PUT
>>>> request.
>>>>               Steps:
>>>>                   - Start executing PUT file to HDFS with medium size
>>>> (200Mb);
>>>>                   - After some time shoot down Knox instance which
>>>> processes
>>>> this request;
>>>>                   - Verify that client gets 500 status code and no
>>>> failover
>>>> takes place.
>>>>               This use case works as it is described. Apache HTTP
>>>> Server is
>>>> not able to do failover in this case.
>>>>           c) Knox instance goes down when it processes client's GET
>>>> request.
>>>>               Steps:
>>>>                   - Start executing GET file from HDFS with medium size
>>>> (200Mb);
>>>>                   - After some time shoot down Knox instance which
>>>> processes
>>>> this request;
>>>>                   - Verify that client gets 200 status code,
>>>> 'Content-Length'
>>>> header with value equals to file size and some bytes in the body.
>>>>                     To execute this test I used as a client:
>>>>                       1) HttpClient - it doesn't produce any error when
>>>> stream is closed.
>>>>                       2) CURL - it doesn't produce any error when
>>>> stream is
>>>> closed.
>>>>                       3) Firefox browser - it doesn't produce any error
>>>> when
>>>> stream is closed.
>>>>                     All clients just download available bytes before
>>>> stream
>>>> is closed, so client has to manually compare 'Content-Length' header
>>>> value
>>>> and received bytes length.
>>>>                   - No failover takes place.
>>>>               This use case works as it is described. Apache HTTP
>>>> Server is
>>>> not able to do failover in this case.
>>>>
>>>>  This is unexpected and unfortunate.
>>> I would have hoped that HttpClient and cURL at least would provide some
>>> indication that the stream was incomplete according to the Content-Length
>>> header.
>>> The only thing I would recommend you trying is taking Knox out of the
>>> picture, use cURL to GET the same file directly from HDFS, kill the
>>> DataNode halfway through the stream and ensure that you see the same
>>> behavior on the client side.
>>>
>> I just rechecked all headers/data and found that I was wrong about
>> Content-Length header. Knox received this header from DN but it didn't
>> send
>> it to client. I misunderstood a little bit logs on the Knox side.
>> I ran tests against DN usign CURL and it wrote "curl: (18) transfer closed
>> with 107092406 bytes remaining to read" when I stopped DN.
>>
>>          2. Additional use cases.
>>>
>>>>           What new cases could you advise?
>>>>
>>>>  I just want to confirm that you have tested a scenario for HDFS where
>>> the
>>> call to the NameNode goes to instance-A and the subsequent call to the
>>> DataNode goes to instance-B and this works.
>>>
>>>   IV. What functionality did I miss?
>>> Other than the note above I don't see anything missing.
>>>
>>>  Maksim.
>>>>
>>>>
>>>>  --

Re: Knox HA with Apache HTTP Server + mod_proxy + mod_proxy_balancer

Posted by Kevin Minder <ke...@hortonworks.com>.
I was afraid that Knox might actually be removing the Content-Length 
header.  Dilli is going to yell at me about that BTW!

So there are two things that need to be done.

1) Determine the client (e.g. curl) behavior when Content-Length is 
specified.

2) Make changes in Knox so that the Content-Length response header is 
only removed if the body is being rewritten.

Please file a jira for #2.  I've already given this some thought so I 
can add detail.

On 10/25/13 11:55 AM, Maksim Kononenko wrote:
> On Fri, Oct 25, 2013 at 4:42 PM, Kevin Minder
> <ke...@hortonworks.com> wrote:
>
>> Maksim,
>> Great work!
>> Discussion inline below.
>> Recommended next steps.
>> 1) Add the setup steps required to get all of this working to the user's
>> guide.  File a jira.
>> 2) Figure out a way to automate these tests.  Might be hard on Apache
>> infra.
>> Kevin.
>>
>>
>> On 10/25/13 8:55 AM, Maksim Kononenko wrote:
>>
>>> Hi guys,
>>>
>>> I was researching/testing Knox HA with Apache HTTP Server +  mod_proxy +
>>> mod_proxy_balancer.
>>> Here is what I found.
>>> I.   3 load balancer scheduler algorithms available for use: Request
>>> Counting, Weighted Traffic Counting and Pending Request Counting. (
>>> http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html#scheduler
>>> )
>>> II.  Load balancer stickyness. (
>>> http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html#stickyness
>>> )
>>>        I configured and tested stickyness. Worked as it had to be.
>>> III. Failover. (
>>> http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass
>>> )
>>>        1. I ran foolowing use cases:
>>>           a) Knox instance is down before client request comes in.
>>>               Steps:
>>>                   - Configure Apache HTTP Server to proxy two Knox
>>> instances;
>>>                   - Shoot down Knox instance A;
>>>                   - Execute client request;
>>>                   - Verify that Knox instance A is marked as unavailable
>>> and
>>> client's request is redirected to Knox instance B;
>>>                   - Verify that all subsequent requests in scope of the
>>> same
>>> client's session are passed just to Knox instance B;
>>>                   - Verify that client's requests in scope of new session
>>> are
>>> tried to be passed to Knox instance A.
>>>                     It is required because Knox instance A could be started
>>> before new client's session.
>>>
>> This seems a little sub-optimal to me but there may be nothing we can do
>> about it.
>> The issue that I have is that I don't think Apache should be trying
>> instance-A first every time in this case.
>> So the question is how is Apache distributing load over instance-A and
>> instance-B?
>> Does it always try instance-A first or does it sometimes try instance-B
>> first?
>> In addition if it gets a failure for instance-A ideally it would take it
>> out of the "pool" for some (ideally configurable) period of time.
> It depends on the  load balancer scheduler algorithm. For my tests I used
> Request Counting.
> I'll look for any configuration related to take out of the "pool" time.
>
>>                This use case works fine.
>>>           b) Knox instance goes down when it processes client's PUT
>>> request.
>>>               Steps:
>>>                   - Start executing PUT file to HDFS with medium size
>>> (200Mb);
>>>                   - After some time shoot down Knox instance which
>>> processes
>>> this request;
>>>                   - Verify that client gets 500 status code and no failover
>>> takes place.
>>>               This use case works as it is described. Apache HTTP Server is
>>> not able to do failover in this case.
>>>           c) Knox instance goes down when it processes client's GET
>>> request.
>>>               Steps:
>>>                   - Start executing GET file from HDFS with medium size
>>> (200Mb);
>>>                   - After some time shoot down Knox instance which
>>> processes
>>> this request;
>>>                   - Verify that client gets 200 status code,
>>> 'Content-Length'
>>> header with value equals to file size and some bytes in the body.
>>>                     To execute this test I used as a client:
>>>                       1) HttpClient - it doesn't produce any error when
>>> stream is closed.
>>>                       2) CURL - it doesn't produce any error when stream is
>>> closed.
>>>                       3) Firefox browser - it doesn't produce any error
>>> when
>>> stream is closed.
>>>                     All clients just download available bytes before stream
>>> is closed, so client has to manually compare 'Content-Length' header value
>>> and received bytes length.
>>>                   - No failover takes place.
>>>               This use case works as it is described. Apache HTTP Server is
>>> not able to do failover in this case.
>>>
>> This is unexpected and unfortunate.
>> I would have hoped that HttpClient and cURL at least would provide some
>> indication that the stream was incomplete according to the Content-Length
>> header.
>> The only thing I would recommend you trying is taking Knox out of the
>> picture, use cURL to GET the same file directly from HDFS, kill the
>> DataNode halfway through the stream and ensure that you see the same
>> behavior on the client side.
> I just rechecked all headers/data and found that I was wrong about
> Content-Length header. Knox received this header from DN but it didn't send
> it to client. I misunderstood a little bit logs on the Knox side.
> I ran tests against DN usign CURL and it wrote "curl: (18) transfer closed
> with 107092406 bytes remaining to read" when I stopped DN.
>
>>         2. Additional use cases.
>>>           What new cases could you advise?
>>>
>> I just want to confirm that you have tested a scenario for HDFS where the
>> call to the NameNode goes to instance-A and the subsequent call to the
>> DataNode goes to instance-B and this works.
>>
>>   IV. What functionality did I miss?
>> Other than the note above I don't see anything missing.
>>
>>> Maksim.
>>>
>>>
>> --

Re: Knox HA with Apache HTTP Server + mod_proxy + mod_proxy_balancer

Posted by Maksim Kononenko <mk...@hortonworks.com>.
On Fri, Oct 25, 2013 at 4:42 PM, Kevin Minder
<ke...@hortonworks.com> wrote:

> Maksim,
> Great work!
> Discussion inline below.
> Recommended next steps.
> 1) Add the setup steps required to get all of this working to the user's
> guide.  File a jira.
> 2) Figure out a way to automate these tests.  Might be hard on Apache
> infra.
> Kevin.
>
>
> On 10/25/13 8:55 AM, Maksim Kononenko wrote:
>
>> Hi guys,
>>
>> I was researching/testing Knox HA with Apache HTTP Server +  mod_proxy +
>> mod_proxy_balancer.
>> Here is what I found.
>> I.   3 load balancer scheduler algorithms available for use: Request
>> Counting, Weighted Traffic Counting and Pending Request Counting. (
>> http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html#scheduler
>> )
>> II.  Load balancer stickyness. (
>> http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html#stickyness
>> )
>>       I configured and tested stickyness. Worked as it had to be.
>> III. Failover. (
>> http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass
>> )
>>       1. I ran foolowing use cases:
>>          a) Knox instance is down before client request comes in.
>>              Steps:
>>                  - Configure Apache HTTP Server to proxy two Knox
>> instances;
>>                  - Shoot down Knox instance A;
>>                  - Execute client request;
>>                  - Verify that Knox instance A is marked as unavailable
>> and
>> client's request is redirected to Knox instance B;
>>                  - Verify that all subsequent requests in scope of the
>> same
>> client's session are passed just to Knox instance B;
>>                  - Verify that client's requests in scope of new session
>> are
>> tried to be passed to Knox instance A.
>>                    It is required because Knox instance A could be started
>> before new client's session.
>>
> This seems a little sub-optimal to me but there may be nothing we can do
> about it.
> The issue that I have is that I don't think Apache should be trying
> instance-A first every time in this case.
> So the question is how is Apache distributing load over instance-A and
> instance-B?
> Does it always try instance-A first or does it sometimes try instance-B
> first?
> In addition if it gets a failure for instance-A ideally it would take it
> out of the "pool" for some (ideally configurable) period of time.

It depends on the load balancer scheduler algorithm. For my tests I used
Request Counting.
I'll look for any configuration related to how long a failed instance is
taken out of the "pool".
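
One candidate is the retry parameter on each BalancerMember: a worker that
has gone into the error state is skipped for that many seconds before httpd
tries it again. A minimal sketch, with hypothetical hosts and an arbitrary
30-second value:

    <Proxy balancer://knoxcluster>
        # Skip a failed member for 30 seconds before retrying it.
        BalancerMember https://knox1.example.com:8443 retry=30
        BalancerMember https://knox2.example.com:8443 retry=30
    </Proxy>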

>
>               This use case works fine.
>>          b) Knox instance goes down when it processes client's PUT
>> request.
>>              Steps:
>>                  - Start executing PUT file to HDFS with medium size
>> (200Mb);
>>                  - After some time shoot down Knox instance which
>> processes
>> this request;
>>                  - Verify that client gets 500 status code and no failover
>> takes place.
>>              This use case works as it is described. Apache HTTP Server is
>> not able to do failover in this case.
>>          c) Knox instance goes down when it processes client's GET
>> request.
>>              Steps:
>>                  - Start executing GET file from HDFS with medium size
>> (200Mb);
>>                  - After some time shoot down Knox instance which
>> processes
>> this request;
>>                  - Verify that client gets 200 status code,
>> 'Content-Length'
>> header with value equals to file size and some bytes in the body.
>>                    To execute this test I used as a client:
>>                      1) HttpClient - it doesn't produce any error when
>> stream is closed.
>>                      2) CURL - it doesn't produce any error when stream is
>> closed.
>>                      3) Firefox browser - it doesn't produce any error
>> when
>> stream is closed.
>>                    All clients just download available bytes before stream
>> is closed, so client has to manually compare 'Content-Length' header value
>> and received bytes length.
>>                  - No failover takes place.
>>              This use case works as it is described. Apache HTTP Server is
>> not able to do failover in this case.
>>
> This is unexpected and unfortunate.
> I would have hoped that HttpClient and cURL at least would provide some
> indication that the stream was incomplete according to the Content-Length
> header.
> The only thing I would recommend you trying is taking Knox out of the
> picture, use cURL to GET the same file directly from HDFS, kill the
> DataNode halfway through the stream and ensure that you see the same
> behavior on the client side.

I just rechecked all headers/data and found that I was wrong about the
Content-Length header. Knox received this header from the DN but did not
send it on to the client. I slightly misread the logs on the Knox side.
I ran tests against the DN directly using cURL, and it reported "curl: (18)
transfer closed with 107092406 bytes remaining to read" when I stopped the
DN.
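
Once Knox passes the Content-Length header through to the client, the
truncation can at least be detected on the client side by comparing the
header value with the bytes actually read. A minimal sketch using plain
java.net.HttpURLConnection (the gateway URL is hypothetical; authentication
and TLS trust setup are omitted):

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class TruncationCheck {
        public static void main(String[] args) throws Exception {
            // Hypothetical Knox WebHDFS URL; replace with a real gateway address.
            URL url = new URL("https://gateway.example.com:8443/gateway/sandbox"
                    + "/webhdfs/v1/tmp/big.bin?op=OPEN");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();

            long expected = conn.getContentLengthLong(); // -1 if the header is missing
            long received = 0;
            byte[] buf = new byte[8192];
            try (InputStream in = conn.getInputStream()) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    received += n;
                }
            }

            if (expected >= 0 && received != expected) {
                System.err.println("Truncated: got " + received + " of "
                        + expected + " bytes");
            } else {
                System.out.println("Received " + received + " bytes");
            }
        }
    }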

>
>        2. Additional use cases.
>>          What new cases could you advise?
>>
> I just want to confirm that you have tested a scenario for HDFS where the
> call to the NameNode goes to instance-A and the subsequent call to the
> DataNode goes to instance-B and this works.
>
>  IV. What functionality did I miss?
>>
> Other than the note above I don't see anything missing.
>
>>
>> Maksim.
>>
>>
>
> --

Re: Knox HA with Apache HTTP Server + mod_proxy + mod_proxy_balancer

Posted by Dilli Arumugam <da...@hortonworks.com>.
Yes Maksim, excellent.

I was hoping cURL would detect the incomplete content and report an error.
We could detect and report the error in the Knox DSL.
That would be an illustrative example.

Dilli


On Fri, Oct 25, 2013 at 6:42 AM, Kevin Minder
<ke...@hortonworks.com> wrote:

> Maksim,
> Great work!
> Discussion inline below.
> Recommended next steps.
> 1) Add the setup steps required to get all of this working to the user's
> guide.  File a jira.
> 2) Figure out a way to automate these tests.  Might be hard on Apache
> infra.
> Kevin.
>
>
> On 10/25/13 8:55 AM, Maksim Kononenko wrote:
>
>> Hi guys,
>>
>> I was researching/testing Knox HA with Apache HTTP Server +  mod_proxy +
>> mod_proxy_balancer.
>> Here is what I found.
>> I.   3 load balancer scheduler algorithms available for use: Request
>> Counting, Weighted Traffic Counting and Pending Request Counting. (
>> http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html#scheduler
>> )
>> II.  Load balancer stickyness. (
>> http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html#stickyness
>> )
>>       I configured and tested stickyness. Worked as it had to be.
>> III. Failover. (
>> http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass
>> )
>>       1. I ran foolowing use cases:
>>          a) Knox instance is down before client request comes in.
>>              Steps:
>>                  - Configure Apache HTTP Server to proxy two Knox
>> instances;
>>                  - Shoot down Knox instance A;
>>                  - Execute client request;
>>                  - Verify that Knox instance A is marked as unavailable
>> and
>> client's request is redirected to Knox instance B;
>>                  - Verify that all subsequent requests in scope of the
>> same
>> client's session are passed just to Knox instance B;
>>                  - Verify that client's requests in scope of new session
>> are
>> tried to be passed to Knox instance A.
>>                    It is required because Knox instance A could be started
>> before new client's session.
>>
> This seems a little sub-optimal to me but there may be nothing we can do
> about it.
> The issue that I have is that I don't think Apache should be trying
> instance-A first every time in this case.
> So the question is how is Apache distributing load over instance-A and
> instance-B?
> Does it always try instance-A first or does it sometimes try instance-B
> first?
> In addition if it gets a failure for instance-A ideally it would take it
> out of the "pool" for some (ideally configurable) period of time.
>
>               This use case works fine.
>>          b) Knox instance goes down when it processes client's PUT
>> request.
>>              Steps:
>>                  - Start executing PUT file to HDFS with medium size
>> (200Mb);
>>                  - After some time shoot down Knox instance which
>> processes
>> this request;
>>                  - Verify that client gets 500 status code and no failover
>> takes place.
>>              This use case works as it is described. Apache HTTP Server is
>> not able to do failover in this case.
>>          c) Knox instance goes down when it processes client's GET
>> request.
>>              Steps:
>>                  - Start executing GET file from HDFS with medium size
>> (200Mb);
>>                  - After some time shoot down Knox instance which
>> processes
>> this request;
>>                  - Verify that client gets 200 status code,
>> 'Content-Length'
>> header with value equals to file size and some bytes in the body.
>>                    To execute this test I used as a client:
>>                      1) HttpClient - it doesn't produce any error when
>> stream is closed.
>>                      2) CURL - it doesn't produce any error when stream is
>> closed.
>>                      3) Firefox browser - it doesn't produce any error
>> when
>> stream is closed.
>>                    All clients just download available bytes before stream
>> is closed, so client has to manually compare 'Content-Length' header value
>> and received bytes length.
>>                  - No failover takes place.
>>              This use case works as it is described. Apache HTTP Server is
>> not able to do failover in this case.
>>
> This is unexpected and unfortunate.
> I would have hoped that HttpClient and cURL at least would provide some
> indication that the stream was incomplete according to the Content-Length
> header.
> The only thing I would recommend you trying is taking Knox out of the
> picture, use cURL to GET the same file directly from HDFS, kill the
> DataNode halfway through the stream and ensure that you see the same
> behavior on the client side.
>
>        2. Additional use cases.
>>          What new cases could you advise?
>>
> I just want to confirm that you have tested a scenario for HDFS where the
> call to the NameNode goes to instance-A and the subsequent call to the
> DataNode goes to instance-B and this works.
>
>  IV. What functionality did I miss?
>>
> Other than the note above I don't see anything missing.
>
>
>> Maksim.
>>
>>
>
> --

Re: Knox HA with Apache HTTP Server + mod_proxy + mod_proxy_balancer

Posted by Kevin Minder <ke...@hortonworks.com>.
Maksim,
Great work!
Discussion inline below.
Recommended next steps.
1) Add the setup steps required to get all of this working to the user's 
guide.  File a jira.
2) Figure out a way to automate these tests.  Might be hard on Apache infra.
Kevin.

On 10/25/13 8:55 AM, Maksim Kononenko wrote:
> Hi guys,
>
> I was researching/testing Knox HA with Apache HTTP Server +  mod_proxy +
> mod_proxy_balancer.
> Here is what I found.
> I.   3 load balancer scheduler algorithms available for use: Request
> Counting, Weighted Traffic Counting and Pending Request Counting. (
> http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html#scheduler)
> II.  Load balancer stickyness. (
> http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html#stickyness)
>       I configured and tested stickyness. Worked as it had to be.
> III. Failover. (
> http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass)
>       1. I ran foolowing use cases:
>          a) Knox instance is down before client request comes in.
>              Steps:
>                  - Configure Apache HTTP Server to proxy two Knox instances;
>                  - Shoot down Knox instance A;
>                  - Execute client request;
>                  - Verify that Knox instance A is marked as unavailable and
> client's request is redirected to Knox instance B;
>                  - Verify that all subsequent requests in scope of the same
> client's session are passed just to Knox instance B;
>                  - Verify that client's requests in scope of new session are
> tried to be passed to Knox instance A.
>                    It is required because Knox instance A could be started
> before new client's session.
This seems a little sub-optimal to me but there may be nothing we can do 
about it.
The issue that I have is that I don't think Apache should be trying 
instance-A first every time in this case.
So the question is how is Apache distributing load over instance-A and 
instance-B?
Does it always try instance-A first or does it sometimes try instance-B 
first?
In addition if it gets a failure for instance-A ideally it would take it 
out of the "pool" for some (ideally configurable) period of time.
>              This use case works fine.
>          b) Knox instance goes down when it processes client's PUT request.
>              Steps:
>                  - Start executing PUT file to HDFS with medium size (200Mb);
>                  - After some time shoot down Knox instance which processes
> this request;
>                  - Verify that client gets 500 status code and no failover
> takes place.
>              This use case works as it is described. Apache HTTP Server is
> not able to do failover in this case.
>          c) Knox instance goes down when it processes client's GET request.
>              Steps:
>                  - Start executing GET file from HDFS with medium size
> (200Mb);
>                  - After some time shoot down Knox instance which processes
> this request;
>                  - Verify that client gets 200 status code, 'Content-Length'
> header with value equals to file size and some bytes in the body.
>                    To execute this test I used as a client:
>                      1) HttpClient - it doesn't produce any error when
> stream is closed.
>                      2) CURL - it doesn't produce any error when stream is
> closed.
>                      3) Firefox browser - it doesn't produce any error when
> stream is closed.
>                    All clients just download available bytes before stream
> is closed, so client has to manually compare 'Content-Length' header value
> and received bytes length.
>                  - No failover takes place.
>              This use case works as it is described. Apache HTTP Server is
> not able to do failover in this case.
This is unexpected and unfortunate.
I would have hoped that HttpClient and cURL at least would provide some 
indication that the stream was incomplete according to the 
Content-Length header.
The only thing I would recommend you trying is taking Knox out of the 
picture, use cURL to GET the same file directly from HDFS, kill the 
DataNode halfway through the stream and ensure that you see the same 
behavior on the client side.
>       2. Additional use cases.
>          What new cases could you advise?
I just want to confirm that you have tested a scenario for HDFS where 
the call to the NameNode goes to instance-A and the subsequent call to 
the DataNode goes to instance-B and this works.
> IV. What functionality did I miss?
Other than the note above I don't see anything missing.
>
> Maksim.
>

