Posted to user@hbase.apache.org by Adrien Mogenet <ad...@gmail.com> on 2013/01/13 01:06:09 UTC
Coprocessor / threading model
Hi there,
I'm experiencing some issues with coprocessors (CPs). I'm trying to implement an indexing
solution (inspired by Anoop's slides). In prePut, I trigger another Put()
on an external table (to build the secondary index). It works perfectly with
one client, but when I insert data from 2 separate clients, I run into
issues with the HTable object (the one used in prePut()), because it's not
thread-safe. I switched to HTablePool and that fixed the issue.
But if I increase the write load (and concurrency), HBase throws an OOM
error because it can't create new native threads. The HBase "threads count"
metric shows that roughly 3500 threads are created.
I'm looking for documentation about how CPs work with threads:
what should I protect against concurrency issues, and when? How can I solve
this issue?
Help is welcome :-)
--
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me
Re: Coprocessor / threading model
Posted by Michel Segel <mi...@hotmail.com>.
There are a couple of different designs you can use to perform the write to the secondary index.
I wouldn't call this an anti-pattern, as AP (Andrew Purtell) did.
Using HTablePool wouldn't be my first choice, unless you first write to a durable queue, which then uses the pool to write to the table. That could work as part of a more general indexing solution. But that is a longer discussion.
Sent from a remote device. Please excuse any typos...
Mike Segel
Re: Coprocessor / threading model
Posted by anil gupta <an...@gmail.com>.
I also ran into a similar problem with one of my secondary index
implementations, but I could not dig into it because I had to shift
focus to other work. I am also interested in knowing how this kind of
problem is resolved in coprocessors.
On Sat, Jan 12, 2013 at 5:38 PM, Ted <yu...@gmail.com> wrote:
> Please take a look at hbase-6651 which improves thread safety of table
> pool.
>
> Are you using hbase 0.94 ?
>
> Thanks
--
Thanks & Regards,
Anil Gupta
Re: Coprocessor / threading model
Posted by Ted <yu...@gmail.com>.
Please take a look at HBASE-6651, which improves the thread safety of HTablePool.
Are you using HBase 0.94?
Thanks
Re: Coprocessor / threading model
Posted by Varun Sharma <va...@pinterest.com>.
You should look at a jstack dump; I think HTablePool is the reason for the
large number of threads. Note that HTablePool is a reusable pool of HTables,
and each HTable contains an ExecutorService with 1 thread by
default. Are you closing the HTable you obtain from HTablePool? If you are
not, your thread count will keep increasing.
Also, on 64-bit machines, I think each thread is allocated 256K or 512K of
stack by default.
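A minimal, HBase-free sketch of the leak described above, assuming only what is stated there: each table handle owns a private executor thread, and that thread is released only when the handle is closed. `TableHandle` is a hypothetical stand-in, not the HTable or HTablePool API; the point is the try-with-resources shape that guarantees `close()` runs on every path.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class TableHandleLeakDemo {

    // Hypothetical stand-in for a pooled table handle that owns its own
    // executor thread, as described for HTable above.
    static final class TableHandle implements AutoCloseable {
        final ExecutorService executor = Executors.newFixedThreadPool(1);

        // Runs the "write" on the handle's private thread and waits for it.
        int put(String row) throws Exception {
            Future<Integer> result = executor.submit(() -> row.length());
            return result.get();
        }

        // Stops the private thread. A handle that is never closed keeps its
        // (non-daemon) thread alive until the JVM exits.
        @Override
        public void close() throws InterruptedException {
            executor.shutdown();
            executor.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        // try-with-resources guarantees close() even if put() throws.
        try (TableHandle handle = new TableHandle()) {
            System.out.println(handle.put("row-1"));
        }
    }
}
```

Every checkout that skips `close()` leaves one more thread running, which is one way a busy server can end up with thousands of live threads.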
Varun
On Tue, Jan 15, 2013 at 10:44 AM, Wei Tan <wt...@us.ibm.com> wrote:
> Andrew, could you explain more, why doing cross-table operation is an
> anti-pattern of using CP?
> Durability might be an issue, as far as I understand. Thanks,
RE: Coprocessor / threading model
Posted by Anoop Sam John <an...@huawei.com>.
Thanks, Andrew. A detailed and useful reply. Nothing more is needed to explain the anti-pattern. :)
-Anoop-
Re: Coprocessor / threading model
Posted by Wei Tan <wt...@us.ibm.com>.
Thanks, Andrew, for your detailed clarification.
Now I understand that, in general, the system is subject to the CAP theorem.
If you want good consistency AND latency, then partition tolerance needs to
be sacrificed: this is the "local index" approach, i.e., colocating index
and data to avoid RPC.
Otherwise, if you can tolerate weaker consistency but not higher latency, you
put the RPCs in a queue and process them in the background. This way you can
have a "global" index with some lag.
Best Regards,
Wei
Wei Tan
Research Staff Member
IBM T. J. Watson Research Center
Yorktown Heights, NY 10598
wtan@us.ibm.com; 914-945-4386
From: Andrew Purtell <ap...@apache.org>
To: "user@hbase.apache.org" <us...@hbase.apache.org>,
Date: 01/15/2013 02:20 PM
Subject: Re: Coprocessor / threading model
Re: Coprocessor / threading model
Posted by Andrew Purtell <ap...@apache.org>.
HTable is a blocking interface. When a client issues a put, for example, we
do not want to return until we can confirm the store has been durably
persisted. For client convenience, many additional details of remote region
invocation are hidden, for example META table lookups for relocated
regions, reconnection, and retries. Just about all coprocessor upcalls for the
Observer interface happen within the RPC handler context. RPC handlers are
drawn from a fixed pool of threads, so your CP code ties up one unit of a
fixed resource for as long as it has control. And within that context you are
running the complex HTable machinery. For many reasons your method call on
HTable may block (potentially for a long time), and therefore the RPC handler
your invocation is executing within will also block. An accidental cycle can
cause a deadlock once there are no free handlers somewhere, which will
happen as part of normal operation when the cluster is loaded; the
higher the load, the more likely it is.
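The handler-starvation cycle described above can be reproduced without HBase. In this sketch (all names hypothetical), a fixed thread pool stands in for the RPC handlers, and each "upcall" makes a blocking nested call that needs a handler from the same pool, the way a synchronous HTable put issued from prePut may need a handler on an already loaded server. A timeout replaces the indefinite block so the demo terminates, and two latches make the result deterministic by keeping every handler busy until all waits have expired.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HandlerStarvationDemo {

    /** Returns how many upcalls timed out waiting for their nested call. */
    static int starvedUpcalls(int handlerCount, long waitMillis) throws Exception {
        // Stand-in for the region server's fixed pool of RPC handlers.
        ExecutorService handlers = Executors.newFixedThreadPool(handlerCount);
        // Released once every handler is running an upcall (a loaded server).
        CountDownLatch allBusy = new CountDownLatch(handlerCount);
        // Released once every upcall has finished waiting, so no handler
        // frees up early and serves another upcall's nested call.
        CountDownLatch allWaited = new CountDownLatch(handlerCount);
        List<Future<Boolean>> upcalls = new ArrayList<>();
        try {
            for (int i = 0; i < handlerCount; i++) {
                upcalls.add(handlers.submit(() -> {
                    allBusy.countDown();
                    allBusy.await();
                    // Blocking nested call that needs a handler from the same pool.
                    Future<String> nested = handlers.submit(() -> "index row written");
                    boolean timedOut;
                    try {
                        nested.get(waitMillis, TimeUnit.MILLISECONDS);
                        timedOut = false;
                    } catch (TimeoutException e) {
                        // No free handler; an untimed blocking call would hang here.
                        timedOut = true;
                    }
                    allWaited.countDown();
                    allWaited.await();
                    return timedOut;
                }));
            }
            int count = 0;
            for (Future<Boolean> upcall : upcalls) {
                if (upcall.get()) {
                    count++;
                }
            }
            return count;
        } finally {
            handlers.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Every handler is tied up, so every nested call starves.
        System.out.println(starvedUpcalls(4, 200));
    }
}
```

Both alternatives discussed in this reply break the cycle: an in-process call to a colocated region needs no second handler, and handing the work to a separate executor frees the handler instead of blocking it.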
Instead, you can do what Anoop has described in this thread: install a CP
into the master that ensures index regions are assigned to the same
regionserver as the primary table, and then call from a region of the
primary table into a colocated region of the index table, or vice versa,
bypassing HTable and the RPC stack. This is just an in-process
method call from one object to another.
Or, you could allocate a small executor pool for cross-region RPC. When the
upcall into your CP happens, dispatch the work to the executor and return
immediately to release the RPC worker thread back to the pool. This would
avoid the possibility of deadlock, but it may not give you the semantics
you want, because that background work could lag unpredictably.
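A sketch of the executor alternative above, using only `java.util.concurrent`. `AsyncIndexer` and `onPrePut` are hypothetical names standing in for a coprocessor and its prePut upcall; the pool size, queue capacity, and `CallerRunsPolicy` rejection policy are illustrative assumptions, not HBase defaults.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class AsyncIndexDispatchDemo {

    // Hypothetical coprocessor-side helper that owns a small pool
    // dedicated to cross-region index writes.
    static final class AsyncIndexer {
        // Stand-in for the index table; a real CP would write elsewhere.
        final List<String> indexRows = new CopyOnWriteArrayList<>();

        private final ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1000),
                // When the queue is full, the submitting (handler) thread runs
                // the task itself: index updates are not dropped, but the
                // handler blocks again under overload.
                new ThreadPoolExecutor.CallerRunsPolicy());

        // Stand-in for the prePut upcall: hand off the index write and
        // return at once so the RPC handler goes back to its pool.
        void onPrePut(String rowKey, String indexedValue) {
            pool.execute(() -> indexRows.add(indexedValue + "|" + rowKey));
        }

        // Stops accepting work and waits for queued index writes to finish.
        boolean drain(long timeoutSeconds) throws InterruptedException {
            pool.shutdown();
            return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AsyncIndexer indexer = new AsyncIndexer();
        for (int i = 0; i < 5; i++) {
            indexer.onPrePut("row-" + i, "v" + (i % 2));
        }
        // Until the pool drains, the index may lag the primary writes.
        indexer.drain(5);
        System.out.println(indexer.indexRows.size());
    }
}
```

Which rejection policy fits depends on whether dropping or delaying index updates is acceptable. An in-memory queue like this one also loses pending updates if the server dies, which is the gap Michel's suggestion of writing to a durable queue first is meant to close.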
On Tue, Jan 15, 2013 at 10:44 AM, Wei Tan <wt...@us.ibm.com> wrote:
> Andrew, could you explain more, why doing cross-table operation is an
> anti-pattern of using CP?
> Durability might be an issue, as far as I understand. Thanks,
--
Best regards,
- Andy
Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
Re: Coprocessor / threading model
Posted by Wei Tan <wt...@us.ibm.com>.
Andrew, could you explain further why doing cross-table operations is an
anti-pattern for coprocessors?
As far as I understand, durability might be an issue. Thanks,
Best Regards,
Wei
From: Andrew Purtell <ap...@apache.org>
To: "user@hbase.apache.org" <us...@hbase.apache.org>,
Date: 01/12/2013 09:39 PM
Subject: Re: Coprocessor / threading model
Re: Coprocessor / threading model
Posted by Anoop John <an...@gmail.com>.
In your CP methods you get an ObserverContext object, from which you can
get the HRS (HRegionServer) object:
ObserverContext.getEnvironment().getRegionServerServices()
From this HRS you can get hold of any region served by that RS.
Then call methods on HRegion directly to insert data. :)
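The shape of this approach, simulated with plain Java collections so it runs without HBase. `Region`, `RegionServer`, `getOnlineRegion`, and `IndexingObserver` are hypothetical stand-ins; in HBase 0.94 the entry point named above is `ObserverContext.getEnvironment().getRegionServerServices()`, and the write would go to an `HRegion`. The `indexedValue|rowKey` index key layout is also an assumption. What the sketch shows is that the index write becomes a direct method call on a colocated object rather than an HTable round trip through the RPC stack.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class ColocatedIndexDemo {

    // Hypothetical stand-in for an online region: a sorted, thread-safe row store.
    static final class Region {
        final NavigableMap<String, String> rows = new ConcurrentSkipListMap<>();

        void put(String rowKey, String value) {
            rows.put(rowKey, value);
        }
    }

    // Hypothetical stand-in for the server's registry of online regions.
    static final class RegionServer {
        private final Map<String, Region> online = new ConcurrentHashMap<>();

        void open(String regionName, Region region) {
            online.put(regionName, region);
        }

        Region getOnlineRegion(String regionName) {
            return online.get(regionName);
        }
    }

    // Hypothetical observer on the data table. It assumes a master-side
    // assignment policy keeps the matching index region on this server.
    static final class IndexingObserver {
        private final RegionServer server;
        private final String indexRegionName;

        IndexingObserver(RegionServer server, String indexRegionName) {
            this.server = server;
            this.indexRegionName = indexRegionName;
        }

        void prePut(String rowKey, String indexedValue) {
            Region index = server.getOnlineRegion(indexRegionName);
            if (index == null) {
                // Colocation may not hold at every instant (e.g. during a
                // region move), so a real implementation needs a fallback.
                throw new IllegalStateException("index region not local: " + indexRegionName);
            }
            // Plain in-process method call: no HTable, no extra RPC handler.
            index.put(indexedValue + "|" + rowKey, rowKey);
        }
    }

    public static void main(String[] args) {
        RegionServer server = new RegionServer();
        Region indexRegion = new Region();
        server.open("idx-region-1", indexRegion);

        IndexingObserver observer = new IndexingObserver(server, "idx-region-1");
        observer.prePut("user-42", "blue");
        System.out.println(indexRegion.rows.firstKey());
    }
}
```

The direct call still runs on the RPC handler thread, so it should stay short; it removes the nested-handler cycle rather than the handler cost itself.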
Good luck.
-Anoop-
On Sun, Jan 13, 2013 at 4:12 PM, Adrien Mogenet <ad...@gmail.com>wrote:
> Thanks for pointing me out the Jira, that's useful for my understanding.
> I'm using HBase 0.94.3, and regions of main and index table are co-located
> on the same RS as in Anoop's design. I'll browse the API tomorrow to find
> out how to not use HTable but inter-CPs communication.
Re: Coprocessor / threading model
Posted by Adrien Mogenet <ad...@gmail.com>.
Thanks for pointing me to the JIRA; that's useful for my understanding.
I'm using HBase 0.94.3, and the regions of the main and index tables are
co-located on the same RS, as in Anoop's design. I'll browse the API tomorrow
to find out how to use inter-CP communication instead of HTable.
On Sun, Jan 13, 2013 at 11:04 AM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:
> In Anoop's soln its basicallly the put happens directly on the index region
> rather than doing a put thro HTable.
>
> Regards
> Ram
--
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me
Re: Coprocessor / threading model
Posted by ramkrishna vasudevan <ra...@gmail.com>.
In Anoop's solution, the put basically happens directly on the index region
rather than through HTable.
Regards
Ram
On Sun, Jan 13, 2013 at 9:28 AM, Andrew Purtell <an...@gmail.com>wrote:
> Yes, especially if the cross region communication is in process.
Re: Coprocessor / threading model
Posted by Andrew Purtell <an...@gmail.com>.
Yes, especially if the cross region communication is in process.
On Jan 12, 2013, at 6:48 PM, Ted Yu <yu...@gmail.com> wrote:
> bq. Coprocessors are meant to operate on the region to which they are
> associated.
>
> For Anoop's case, the secondary table(s) have their regions aligned with
> the corresponding region from primary table. Meaning, related regions are
> served by the same region server.
> Would writes to such regions of secondary table(s) be acceptable ?
>
> Thanks
Re: Coprocessor / threading model
Posted by Ted Yu <yu...@gmail.com>.
bq. Coprocessors are meant to operate on the region to which they are
associated.
For Anoop's case, the secondary table(s) have their regions aligned with
the corresponding regions of the primary table, meaning related regions are
served by the same region server.
Would writes to such regions of the secondary table(s) be acceptable?
Thanks
On Sat, Jan 12, 2013 at 6:39 PM, Andrew Purtell <ap...@apache.org> wrote:
> > In pre-put, I trigger another Put() in an external table (to build the
> > secondary index).
>
> We should probably call this a Coprocessor anti-pattern.
>
> Coprocessors are meant to operate on the region to which they are
> associated. They are a way you can extend HBase function while it operates
> in region on data for the region. Think of them as loadable kernel modules.
> They are not a general purpose server side platform for programming as if
> you are building a HBase client (with HTable, etc.). Just because you can
> do this doesn't mean you should.
Re: Coprocessor / threading model
Posted by Andrew Purtell <ap...@apache.org>.
> In pre-put, I trigger another Put() in an external table (to build the
> secondary index).
We should probably call this a Coprocessor anti-pattern.
Coprocessors are meant to operate on the region to which they are
associated. They are a way to extend HBase functionality while it operates
in-region on data for that region. Think of them as loadable kernel modules.
They are not a general-purpose server-side platform for programming as if
you were building an HBase client (with HTable, etc.). Just because you can
do this doesn't mean you should.
--
Best regards,
- Andy
Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)