Posted to user@accumulo.apache.org by David O'Gwynn <do...@acm.org> on 2014/04/14 05:58:10 UTC

Optimal # proxy servers

Hi community,

I was reading a thread "Error stressing with pyaccumulo app" from
February, and the topic of optimal number of proxy servers for a
cluster of a given size came up. Does anyone have any insight into
that question? Is there a thread in the archive that addresses this
question directly?

My gut tells me that you should have a number proportional to the
number of tablet servers, but I'm afraid I don't really understand
what the proxy server is doing.

Re: Optimal # proxy servers

Posted by Eric Newton <er...@gmail.com>.
It will work fine... and you can run more than one in your cluster if needed.

If you observe a performance problem, please post a ticket to jira.



Re: Optimal # proxy servers

Posted by Josh Elser <jo...@gmail.com>.
Hrm. 10x may have been overstating too. 5x is probably more accurate. 
YMMV :)


Re: Optimal # proxy servers

Posted by Josh Elser <jo...@gmail.com>.
If you care about maximizing your throughput, ingest is probably not
desirable through the proxy (you can probably get ~10x faster using the 
Java BatchWriter API).

I wouldn't avoid the proxy server purely because of using batch_scans 
though. If you look at the Java impl of the BatchScanner, it essentially 
keeps a queue which many servers are concurrently throwing results onto 
and providing a Java Iterator to that queue to the client. With this in 
mind, this is very similar to what the proxy server is doing for you.
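
A minimal Python sketch of the pattern described above, for illustration only: several producers push results onto one shared queue, and the consumer sees a single iterator. This is not Accumulo or proxy code. The name fan_in, the max_buffered parameter, and the use of plain lists to stand in for tablet servers are all assumptions made for the example.

```python
import queue
import threading

# Sentinel a producer puts on the queue when its source is exhausted.
_DONE = object()


def fan_in(sources, max_buffered=100):
    """Yield items from several iterables that are drained concurrently.

    Each source stands in for one server returning results. The bounded
    queue stands in for the buffer that a batch scanner (or a proxy acting
    on a client's behalf) holds while the consumer catches up.
    """
    buffer = queue.Queue(maxsize=max_buffered)

    def produce(source):
        try:
            for item in source:
                buffer.put(item)  # blocks while the buffer is full
        finally:
            buffer.put(_DONE)  # always signal completion, even on error

    threads = [
        threading.Thread(target=produce, args=(source,), daemon=True)
        for source in sources
    ]
    for thread in threads:
        thread.start()

    # Keep reading until every producer has reported that it is finished.
    remaining = len(threads)
    while remaining:
        item = buffer.get()
        if item is _DONE:
            remaining -= 1
        else:
            yield item
```

Calling list(fan_in([server_a_results, server_b_results])) returns every item from both sources, interleaved in whatever order the producer threads happen to run. The buffer is where the memory goes: whoever owns the queue pays for it, which is the client in the Java case and the proxy process when a proxy is in between.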


Re: Optimal # proxy servers

Posted by David O'Gwynn <do...@acm.org>.
Ah, thanks Eric, that answers my question. It sounds like using the
proxy server for batch_scans and ingest is a bit beyond its scope. Are
there plans for beefing up the proxy to handle a wider range of
purposes from multiple clients?

Thanks,
David


Re: Optimal # proxy servers

Posted by Eric Newton <er...@gmail.com>.
High ingest and batch scans use resources within the proxy for queuing
data.  If I were using a proxy for these activities, I would want to
have a proxy for each client.  Administrative requests and even basic
single-range scans are simple pass-throughs with a much lower chance
of overloading the proxy.
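
As a rough illustration of that rule of thumb, here is a small Python sketch that counts proxies: one per client doing heavy ingest or batch scans, plus shared proxies for pass-through clients. The function name and the light_clients_per_proxy parameter are hypothetical. The thread gives no sharing ratio for light clients, so that number has to come from your own measurements.

```python
import math


def proxies_needed(heavy_clients, light_clients, light_clients_per_proxy):
    """Estimate a proxy count from the per-client rule of thumb.

    heavy_clients: clients doing high ingest or batch scans (one proxy each).
    light_clients: clients doing admin requests or single-range scans.
    light_clients_per_proxy: assumed sharing ratio for light clients; this
        is a placeholder knob, not a figure from the Accumulo project.
    """
    if heavy_clients < 0 or light_clients < 0:
        raise ValueError("client counts must be non-negative")
    if light_clients_per_proxy < 1:
        raise ValueError("light_clients_per_proxy must be at least 1")

    dedicated = heavy_clients
    shared = math.ceil(light_clients / light_clients_per_proxy)
    return dedicated + shared
```

For example, proxies_needed(3, 10, 5) gives 3 dedicated proxies plus 2 shared ones. Treat the result as a starting point to test against, not a sizing guarantee.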



Re: Optimal # proxy servers

Posted by David Medinets <da...@gmail.com>.
"number of proxy servers should be proportional to the number of clients" -
I hate to be pedantic, but this is a very general statement. Can you be more
specific? Should the proportion be 1:1 or 5:1? What factors affect the ratio?



Re: Optimal # proxy servers

Posted by Eric Newton <er...@gmail.com>.
The number of proxy servers should be proportional to the number of clients.

The proxy can talk to all the tablet servers, but the client of the
proxy only has the proxy to make requests on its behalf.

As always, it's going to depend on what you want to do, what your
schema looks like, and the total number of servers you have.

-Eric
