Posted to user@hbase.apache.org by Adrien Mogenet <ad...@gmail.com> on 2013/01/13 01:06:09 UTC

Coprocessor / threading model

Hi there,

I'm experiencing some issues with coprocessors (CPs). I'm trying to implement
an indexing solution (inspired by Anoop's slides). In pre-put, I trigger
another Put() on an external table (to build the secondary index). It works
perfectly for one client, but when I insert data from 2 separate clients, I
run into issues with the HTable object (the one used in pre-Put()), because
it's not thread-safe. I decided to move to HTablePool and that fixed my issue.

But if I increase the write load (and concurrency), HBase throws an OOM
exception because it can't create new native threads. Looking at the HBase
"threads count" metric, I see that roughly 3500 threads are created.

I'm looking for documentation about how CPs work with threads: what should
I protect against concurrency issues, and when? How can I solve my
issue?

Help is welcome :-)

-- 
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me

Re: Coprocessor / threading model

Posted by Michel Segel <mi...@hotmail.com>.

There are a couple of different designs that you can use to perform the write to the secondary index.

I wouldn't call this an anti-pattern... (AP's comment)

Using HTablePool wouldn't be my first choice, unless you are writing to a durable queue first, which then uses the pool to write to the table. This could work as part of a broader solution that handles indexing at a more general level. But that is a longer discussion.

Sent from a remote device. Please excuse any typos...

Mike Segel

Re: Coprocessor / threading model

Posted by anil gupta <an...@gmail.com>.

I also ran into a similar problem with one of my secondary index
implementations, but I could not dig into it because I had to shift
focus to other work. I am also interested in knowing how this kind of
problem is resolved in coprocessors.

-- 
Thanks & Regards,
Anil Gupta

Re: Coprocessor / threading model

Posted by Ted <yu...@gmail.com>.

Please take a look at HBASE-6651, which improves the thread safety of HTablePool.

Are you using HBase 0.94?

Thanks

Re: Coprocessor / threading model

Posted by Varun Sharma <va...@pinterest.com>.

You should look at the jstack output. I think HTablePool is the reason for the
large number of threads. Note that HTablePool is a reusable pool of HTable(s),
and each HTable has an ExecutorService containing 1 thread by
default. Are you closing the HTable you obtain from HTablePool? If you are
not closing it, your thread count will keep increasing.
Also, on 64-bit machines, I think each thread is allocated 256K or 512K of
stack by default.
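
To make the leak concrete, here is a minimal sketch in plain Java. It does not use the HBase client at all: `Handle` is a hypothetical stand-in for a pooled table handle that owns its own single-thread executor, which is the ownership pattern described above. The names `PooledHandleDemo`, `Handle` and `liveWorkersAfter` are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Models "each handle owns one worker thread". NOT the HBase HTable class.
class PooledHandleDemo {

    // Hypothetical stand-in for a table handle with its own executor.
    static final class Handle implements AutoCloseable {
        private final ExecutorService worker = Executors.newSingleThreadExecutor();

        // Runs the write on the handle's worker thread and waits for it.
        void put(Runnable write) throws Exception {
            worker.submit(write).get();
        }

        // Releases the worker thread owned by this handle.
        @Override
        public void close() throws InterruptedException {
            worker.shutdown();
            worker.awaitTermination(5, TimeUnit.SECONDS);
        }

        boolean isReleased() {
            return worker.isTerminated();
        }
    }

    // Uses 'count' handles and closes each one only if closeAfterUse is true.
    // Returns how many handles still hold a live worker thread afterwards.
    static int liveWorkersAfter(int count, boolean closeAfterUse) throws Exception {
        List<Handle> used = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Handle handle = new Handle();
            used.add(handle);
            try {
                handle.put(() -> { });
            } finally {
                if (closeAfterUse) {
                    handle.close();
                }
            }
        }
        int live = 0;
        for (Handle handle : used) {
            if (!handle.isReleased()) {
                live++;
            }
        }
        // Clean up so the demo JVM can exit.
        for (Handle handle : used) {
            handle.close();
        }
        return live;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("live workers without close: " + liveWorkersAfter(20, false));
        System.out.println("live workers with close:    " + liveWorkersAfter(20, true));
    }
}
```

Without the close in the finally block, every handle that was ever used keeps its thread alive, so the count grows with the number of requests rather than with the number of concurrent requests.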

Varun

RE: Coprocessor / threading model

Posted by Anoop Sam John <an...@huawei.com>.

Thanks, Andrew. A detailed and useful reply. Nothing more is needed to explain the anti-pattern. :)

-Anoop-

Re: Coprocessor / threading model

Posted by Wei Tan <wt...@us.ibm.com>.

Thanks Andrew for your detailed clarification.
Now I understand that, in general, the system is subject to the CAP theorem.
If you want good consistency AND latency, then partition tolerance needs to
be sacrificed: this is the "local index" approach, i.e., colocate index
and data and avoid RPC.

Otherwise, if you can tolerate weaker consistency but not added latency, you
put RPCs in a queue and process them in the background. This way you can have
a "global" index with some lag.

Best Regards,
Wei

Wei Tan 
Research Staff Member 
IBM T. J. Watson Research Center
Yorktown Heights, NY 10598
wtan@us.ibm.com; 914-945-4386

Re: Coprocessor / threading model

Posted by Andrew Purtell <ap...@apache.org>.

HTable is a blocking interface. When a client issues a put, for example, we
do not want to return until we can confirm the store has been durably
persisted. For client convenience many additional details of remote region
invocation are hidden, for example META table lookups for relocated
regions, reconnection, retries. Just about all coprocessor upcalls for the
Observer interface happen within the RPC handler context. RPC handlers are
drawn from a fixed pool of threads. Your CP code is tying up one of a fixed
set of resources for as long as it has control. And in there you are running
the complex HTable machinery. For many reasons your method call on HTable may
block (potentially for a long time) and therefore the RPC handler your
invocation is executing within will also block. An accidental cycle can
cause a deadlock once there are no free handlers somewhere, which will
happen as part of normal operation when the cluster is loaded, and the
higher the load the more likely.
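
The handler exhaustion described above can be reproduced with nothing but the JDK. The following is a simulation under stated assumptions, not HBase code: a fixed thread pool plays the role of the RPC handler pool, and each "upcall" makes a nested blocking call that itself needs a handler from the same pool. The class and method names are hypothetical, and a 500 ms timeout stands in for what would otherwise be an indefinite wait.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class HandlerStarvationDemo {

    // Runs 'upcalls' concurrent upcalls on a pool of 'handlers' threads.
    // Each upcall blocks on a nested call that needs another handler from
    // the same pool. Returns how many nested calls timed out.
    static int timedOutNestedCalls(int handlers, int upcalls) throws Exception {
        if (upcalls > handlers) {
            throw new IllegalArgumentException("demo needs upcalls <= handlers");
        }
        ExecutorService handlerPool = Executors.newFixedThreadPool(handlers);
        CountDownLatch allRunning = new CountDownLatch(upcalls);
        CountDownLatch allDecided = new CountDownLatch(upcalls);
        List<Future<Boolean>> results = new ArrayList<>();
        try {
            for (int i = 0; i < upcalls; i++) {
                results.add(handlerPool.submit(() -> {
                    // Wait until every upcall occupies a handler.
                    allRunning.countDown();
                    allRunning.await();
                    // Nested blocking call served by the same pool.
                    Future<String> nested = handlerPool.submit(() -> "index write done");
                    boolean timedOut;
                    try {
                        nested.get(500, TimeUnit.MILLISECONDS);
                        timedOut = false;
                    } catch (TimeoutException e) {
                        nested.cancel(true);
                        timedOut = true;
                    }
                    // Hold the handler until every upcall has its verdict,
                    // so no handler is freed early and the result is stable.
                    allDecided.countDown();
                    allDecided.await();
                    return timedOut;
                }));
            }
            int count = 0;
            for (Future<Boolean> result : results) {
                if (result.get()) {
                    count++;
                }
            }
            return count;
        } finally {
            handlerPool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("2 handlers, 2 upcalls: "
                + timedOutNestedCalls(2, 2) + " nested calls timed out");
        System.out.println("4 handlers, 2 upcalls: "
                + timedOutNestedCalls(4, 2) + " nested calls timed out");
    }
}
```

When every handler is occupied by an upcall that is waiting on another handler, none of the nested calls can start. With spare handlers the same code completes normally, which is why the problem tends to show up only under load.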

Instead you can do what Anoop has described in this thread and install a CP
into the master that ensures index regions are assigned to the same
regionserver as the primary table, and then call from a region of the
primary table into a colocated region of the index table, or vice versa,
bypassing HTable and the RPC stack. This is just making an in-process
method call on one object from another.

Or, you could allocate a small executor pool for cross region RPC. When the
upcall into your CP happens, dispatch work to the executor and return
immediately to release the RPC worker thread back to the pool. This would
avoid the possibility of deadlock but this may not give you the semantics
you want because that background work could lag unpredictably.
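
A rough sketch of the executor option, again in plain Java with no HBase types. `AsyncIndexDispatcher` is a hypothetical name, and the choices of a bounded queue and of rejecting work when it is full are assumptions made for this illustration; the thread only asks for a small pool and an immediate return.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Small fixed pool with a bounded queue for background index writes.
class AsyncIndexDispatcher implements AutoCloseable {
    private final ThreadPoolExecutor executor;

    AsyncIndexDispatcher(int threads, int queueCapacity) {
        this.executor = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
    }

    // Hands the write to the pool and returns at once. Returns false when
    // the pool and queue are full, so the caller never blocks here.
    boolean dispatch(Runnable indexWrite) {
        try {
            executor.execute(indexWrite);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    @Override
    public void close() throws InterruptedException {
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
    }

    // Shows that dispatch() returns while the write is still pending.
    static boolean returnsBeforeWriteCompletes() throws Exception {
        CountDownLatch release = new CountDownLatch(1);
        AtomicBoolean done = new AtomicBoolean(false);
        try (AsyncIndexDispatcher dispatcher = new AsyncIndexDispatcher(2, 16)) {
            boolean accepted = dispatcher.dispatch(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                done.set(true);
            });
            boolean stillPending = !done.get(); // the write is gated by 'release'
            release.countDown();
            return accepted && stillPending;
        }
    }

    // Shows that a saturated dispatcher refuses work instead of blocking.
    static boolean rejectsWhenSaturated() throws Exception {
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocked = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try (AsyncIndexDispatcher dispatcher = new AsyncIndexDispatcher(1, 1)) {
            boolean first = dispatcher.dispatch(blocked);  // occupies the only thread
            boolean second = dispatcher.dispatch(blocked); // fills the queue
            boolean third = dispatcher.dispatch(blocked);  // no room left
            release.countDown();
            return first && second && !third;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("returns before write completes: " + returnsBeforeWriteCompletes());
        System.out.println("rejects when saturated: " + rejectsWhenSaturated());
    }
}
```

The bounded queue keeps the lag mentioned above visible: when the background work falls too far behind, `dispatch` returns false and the caller has to decide whether to drop, retry, or fail the original write. An unbounded queue would hide that lag until memory ran out.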

On Tue, Jan 15, 2013 at 10:44 AM, Wei Tan <wt...@us.ibm.com> wrote:

> Andrew, could you explain more, why doing cross-table operation is an
> anti-pattern of using CP?
> Durability might be an issue, as far as I understand. Thanks,
>
>
> Best Regards,
> Wei
>

-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: Coprocessor / threading model

Posted by Wei Tan <wt...@us.ibm.com>.

Andrew, could you explain further why doing a cross-table operation is an
anti-pattern for CPs?
Durability might be an issue, as far as I understand. Thanks,

Best Regards,
Wei

Re: Coprocessor / threading model

Posted by Anoop John <an...@gmail.com>.

In your CP methods you will get an ObserverContext object, from which you can
get the HRS object:
ObserverContext.getEnvironment().getRegionServerServices()
From this HRS you can get hold of any of the regions served by that RS.
Then directly call methods on HRegion to insert data. :)
Good luck..
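
The shape of that call path, as a toy model in plain Java. None of these types are HBase classes: `RegionServer`, `Region` and `prePut` are hypothetical stand-ins, and the index key layout (`value/row`) is only an assumption for the example. The point is that the hook looks up a co-located region in the same process and calls it directly, with no client and no RPC.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

class ColocatedIndexSketch {

    // Stand-in for a region: a sorted, thread-safe row store.
    static final class Region {
        final ConcurrentSkipListMap<String, String> rows = new ConcurrentSkipListMap<>();

        void put(String row, String value) {
            rows.put(row, value);
        }
    }

    // Stand-in for the region server: the regions it currently serves, by name.
    static final class RegionServer {
        private final Map<String, Region> online = new ConcurrentHashMap<>();

        Region open(String name) {
            return online.computeIfAbsent(name, n -> new Region());
        }

        Region getOnlineRegion(String name) {
            return online.get(name);
        }
    }

    // Stand-in for the pre-put hook on the primary region: writes the index
    // entry through a direct in-process call on the co-located index region.
    static void prePut(RegionServer server, String indexRegionName,
                       String row, String value) {
        Region index = server.getOnlineRegion(indexRegionName);
        if (index == null) {
            throw new IllegalStateException(
                    "index region is not served here: " + indexRegionName);
        }
        index.put(value + "/" + row, row);
    }

    public static void main(String[] args) {
        RegionServer server = new RegionServer();
        server.open("index-region-1");
        prePut(server, "index-region-1", "row42", "blue");
        System.out.println(server.getOnlineRegion("index-region-1").rows);
    }
}
```

The lookup failing is the case to plan for: this approach only works while the index region really is served by the same region server as the primary region.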


-Anoop-

Re: Coprocessor / threading model

Posted by Adrien Mogenet <ad...@gmail.com>.

Thanks for pointing out the Jira; that's useful for my understanding.
I'm using HBase 0.94.3, and the regions of the main and index tables are
co-located on the same RS, as in Anoop's design. I'll browse the API tomorrow
to find out how to use inter-CP communication instead of HTable.


-- 
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me

Re: Coprocessor / threading model

Posted by ramkrishna vasudevan <ra...@gmail.com>.

In Anoop's solution, the put basically happens directly on the index region
rather than going through HTable.

Regards
Ram

Re: Coprocessor / threading model

Posted by Andrew Purtell <an...@gmail.com>.

Yes, especially if the cross region communication is in process.

On Jan 12, 2013, at 6:48 PM, Ted Yu <yu...@gmail.com> wrote:

> bq. Coprocessors are meant to operate on the region to which they are
> associated.
> 
> For Anoop's case, the secondary table(s) have their regions aligned with
> the corresponding region from primary table. Meaning, related regions are
> served by the same region server.
> Would writes to such regions of secondary table(s) be acceptable ?
> 
> Thanks
> 

Re: Coprocessor / threading model

Posted by Ted Yu <yu...@gmail.com>.

bq. Coprocessors are meant to operate on the region to which they are
associated.

For Anoop's case, the secondary table(s) have their regions aligned with
the corresponding regions of the primary table, meaning related regions are
served by the same region server.
Would writes to such regions of the secondary table(s) be acceptable?

Thanks

Re: Coprocessor / threading model

Posted by Andrew Purtell <ap...@apache.org>.

> In pre-put, I trigger another Put() in an external table (to build the
> secondary index).

We should probably call this a Coprocessor anti-pattern.

Coprocessors are meant to operate on the region with which they are
associated. They are a way to extend HBase functionality while it operates
in a region, on that region's data. Think of them as loadable kernel modules.
They are not a general-purpose server-side platform for programming as if
you were building an HBase client (with HTable, etc.). Just because you can
do this doesn't mean you should.

-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)