Posted to user@hbase.apache.org by Praveen Sripati <pr...@gmail.com> on 2012/01/26 07:24:07 UTC

Difference between coprocessors and filters

Hi,

Coprocessors, introduced in 0.92, can also be used to filter out data,
similar to filters. What are the differences between filters and
coprocessors, leaving aside the code/API? One thing I can think of is that filters
are defined at the client, while coprocessors are defined on the
server, so coprocessors are reusable across clients.

Regards,
Praveen

Re: Difference between coprocessors and filters

Posted by Mingjie Lai <ml...@apache.org>.
Andy explained the problem clearly.

In addition, there is an HBase coprocessor article under review at:
https://docs.google.com/document/d/1PgfgBcqk2iPZOZId9LonUMx_Dz49ICEAQQzktVt5hpY/edit

It can help you understand coprocessors better. After a public
review, it will be posted to the Apache blog site.

Thanks,
Mingjie

On 01/26/2012 10:58 AM, Andrew Purtell wrote:
> Praveen,
>
>> What are the differences between filters and coprocessors leaving
>> aside the code/API?
>
> I don't think one can leave aside the API ...
>
>
> Filters came first. They were a query performance optimization, pushing predicates to execute server side, eliminating unnecessary data transfer over the network and unnecessary processing at the client. The API for writing filters and the opportunities for executing filter code were constrained to this use case.
>
>
> Coprocessors extend opportunities for server side execution. They could be used to implement filtering alternatives. They also make it possible now to do things much like triggers and stored procedures. The API for writing coprocessors and opportunities for executing coprocessor code cover many more extension use cases now.
>
>
>> One thing I can think is, filters are defined at the client, while
>> the coprocessors are defined on the server. So, coprocessors are
>> reusable across clients.
>
> I'm not sure that is a valid distinction.
>
> It might help you to think of both filters and coprocessors as extensions loaded into the server runtime. In both cases, code, from the developer or from the distribution, is loaded on demand into the regionserver (though with coprocessors it can be more dynamic than with filters). In both cases, the client builds a request which is acted upon server side by newly resident code there.
>
> Best regards,
>
>      - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

Re: Difference between coprocessors and filters

Posted by Andrew Purtell <ap...@apache.org>.
Right, I wasn't suggesting a replacement, just that this seems to be missing information: it is available at the callout location and should be passed through.
 
Best regards,

     - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) 


----- Original Message -----
> From: lars hofhansl <lh...@yahoo.com>
> To: "user@hbase.apache.org" <us...@hbase.apache.org>
> Cc: 
> Sent: Sunday, January 29, 2012 3:23 PM
> Subject: Re: Difference between coprocessors and filters
> 
> Sure. Although I am not sure whether coprocessors should actually do that. They 
> can operate on top of a scanner that is already filtered by a filter.
> I think that would need a completely new API.
> 
> 
> -- Lars

Re: Difference between coprocessors and filters

Posted by lars hofhansl <lh...@yahoo.com>.
Sure, although I am not sure whether coprocessors should actually do that. They can operate on top of a scanner that has already been filtered by a filter.
I think that would need a completely new API.
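To picture the layering described above, here is a toy Java sketch, not HBase code. The method next and the hook shape are hypothetical, standing in loosely for a scanner with a filter and a coprocessor's postScannerNext. The filter decides what the scanner emits; the hook only sees, and can only trim or rewrite, the batch that survived.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Toy model only; the names are hypothetical. The filter runs inside the
// scanner, and the post-next hook (loosely like a coprocessor's
// postScannerNext) receives only the rows that survived the filter.
final class LayeringDemo {
    static List<String> next(List<String> region,
                             Predicate<String> filter,
                             UnaryOperator<List<String>> postNextHook) {
        List<String> batch = new ArrayList<>();
        for (String row : region) {
            if (filter.test(row)) batch.add(row);   // filtering happens first, in the scanner
        }
        return postNextHook.apply(batch);           // the hook can only trim or rewrite this batch
    }

    public static void main(String[] args) {
        List<String> region = Arrays.asList("a1", "a2", "b1", "b2", "a3");
        List<String> result = next(region,
                row -> row.startsWith("a"),
                batch -> {
                    System.out.println("hook saw " + batch);   // prints: hook saw [a1, a2, a3]
                    List<String> kept = new ArrayList<>(batch);
                    kept.remove("a2");
                    return kept;
                });
        System.out.println(result);                           // prints: [a1, a3]
    }
}
```

In this toy the hook runs only after the scanner has already read the cells, so it has no way to make the scanner skip ahead.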


-- Lars


________________________________
 From: Andrew Purtell <ap...@yahoo.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org> 
Sent: Sunday, January 29, 2012 11:09 AM
Subject: Re: Difference between coprocessors and filters
 
> Seek hints could be added to coprocessors, in preScannerNext/postScannerNext, but they do not currently support this.

Coprocessors shouldn't be a catch-all for every extension case, but this is arguably missing information that should be passed through. File a JIRA for it, Lars? 

Best regards,

    - Andy


On Jan 28, 2012, at 8:53 PM, lars hofhansl <lh...@yahoo.com> wrote:

> Seek hints could be added to coprocessors, in preScannerNext/postScannerNext, but they do not currently support this.

Re: Difference between coprocessors and filters

Posted by Andrew Purtell <ap...@yahoo.com>.
> Seek hints could be added to coprocessors, in preScannerNext/postScannerNext, but they do not currently support this.

Coprocessors shouldn't be a catch-all for every extension case, but this is arguably missing information that should be passed through. File a JIRA for it, Lars? 

Best regards,

    - Andy


On Jan 28, 2012, at 8:53 PM, lars hofhansl <lh...@yahoo.com> wrote:

> Seek hints could be added to coprocessors, in preScannerNext/postScannerNext, but they do not currently support this.

Re: Difference between coprocessors and filters

Posted by lars hofhansl <lh...@yahoo.com>.
I might add that filters, depending on the use case, are potentially much more efficient than coprocessors, because they can provide the
scanner with "seek hints", allowing the scanner to seek past columns, entire rows, or even entire blocks of cells with a single seek operation.
Bottom line: if you can use filters, do that. If not, use coprocessors.


(Seek hints could be added to coprocessors, in preScannerNext/postScannerNext, but they do not currently support this.)
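The efficiency argument can be illustrated with a toy Java sketch, not HBase code. A TreeSet of "row/column" keys stands in for a region, and ceiling()/higher() stand in for scanner seeks. It assumes a query that wants a single column (here c5) out of ten per row. Without hints every cell is read and then accepted or skipped; with hints the filter tells the scanner where the next interesting cell is. In HBase a filter signals such a jump with Filter.ReturnCode.SEEK_NEXT_USING_HINT; the class and method names below are hypothetical, and skipping whole blocks of cells is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model only, not HBase code: a sorted set of "row/column" keys stands
// in for a region, and ceiling()/higher() stand in for a scanner seek.
final class SeekHintDemo {
    static final class Result {
        final List<String> matches = new ArrayList<>();
        int examined;   // cells the scanner actually had to look at
    }

    // Assumes fewer than 10 columns so that string order matches column order.
    static TreeSet<String> buildCells(int rows, int cols) {
        TreeSet<String> cells = new TreeSet<>();
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                cells.add(String.format("r%02d/c%d", r, c));
        return cells;
    }

    static String rowOf(String key) { return key.substring(0, key.indexOf('/')); }
    static String columnOf(String key) { return key.substring(key.indexOf('/') + 1); }

    // No hints: every cell is read, then accepted or skipped one by one.
    static Result scanWithoutHints(TreeSet<String> cells, String target) {
        Result res = new Result();
        for (String key : cells) {
            res.examined++;
            if (columnOf(key).equals(target)) res.matches.add(key);
        }
        return res;
    }

    // With hints: after each cell the filter says where the next interesting
    // cell can be, and the scanner seeks there directly.
    static Result scanWithHints(TreeSet<String> cells, String target) {
        Result res = new Result();
        String key = cells.isEmpty() ? null : cells.first();
        while (key != null) {
            res.examined++;
            String row = rowOf(key);
            int cmp = columnOf(key).compareTo(target);
            if (cmp == 0) res.matches.add(key);
            key = (cmp < 0)
                    ? cells.ceiling(row + "/" + target)   // hint: jump to the target column
                    : cells.higher(row + "/\uffff");      // hint: jump past this row
        }
        return res;
    }

    public static void main(String[] args) {
        TreeSet<String> cells = buildCells(100, 10);
        Result plain = scanWithoutHints(cells, "c5");
        Result hinted = scanWithHints(cells, "c5");
        System.out.println(plain.examined + " vs " + hinted.examined);   // prints: 1000 vs 200
    }
}
```

Both scans return the same 100 cells; the hinted scan examines two cells per row instead of ten.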


-- Lars



Re: Difference between coprocessors and filters

Posted by Andrew Purtell <ap...@apache.org>.
Praveen,

> What are the differences between filters and coprocessors leaving
> aside the code/API?

I don't think one can leave aside the API ...


Filters came first. They were a query performance optimization, pushing predicates to execute server side, eliminating unnecessary data transfer over the network and unnecessary processing at the client. The API for writing filters and the opportunities for executing filter code were constrained to this use case.
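A toy Java sketch of the pushdown idea, not the HBase API: the region contents and method names are hypothetical. The same predicate is evaluated either at the client, after every row has been shipped, or next to the data, so only matches are shipped. In real HBase the client would attach a filter to its Scan, for example with Scan.setFilter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Toy model only, not the HBase API: it counts how many rows would cross
// the network when a predicate runs at the client versus next to the data.
final class PushdownDemo {
    static final class ScanResult {
        final List<String> rows = new ArrayList<>();
        int shipped;    // rows sent from "server" to "client"
    }

    // Hypothetical region contents: 1000 rows, every 100th one is "hot".
    static TreeMap<String, String> region() {
        TreeMap<String, String> region = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            region.put(String.format("row%04d", i), i % 100 == 0 ? "hot" : "cold");
        }
        return region;
    }

    // No filter: the server returns every row and the client discards.
    static ScanResult clientSide(Map<String, String> region, Predicate<String> keep) {
        ScanResult res = new ScanResult();
        for (Map.Entry<String, String> e : region.entrySet()) {
            res.shipped++;                               // every row crosses the wire
            if (keep.test(e.getValue())) res.rows.add(e.getKey());
        }
        return res;
    }

    // Filter pushed down: the predicate is evaluated where the data lives.
    static ScanResult serverSide(Map<String, String> region, Predicate<String> keep) {
        ScanResult res = new ScanResult();
        for (Map.Entry<String, String> e : region.entrySet()) {
            if (keep.test(e.getValue())) {
                res.shipped++;                           // only matches cross the wire
                res.rows.add(e.getKey());
            }
        }
        return res;
    }

    public static void main(String[] args) {
        TreeMap<String, String> region = region();
        ScanResult c = clientSide(region, "hot"::equals);
        ScanResult s = serverSide(region, "hot"::equals);
        System.out.println(c.rows.equals(s.rows) + " " + c.shipped + " " + s.shipped);   // prints: true 1000 10
    }
}
```

Both strategies return the same rows; only the amount of data crossing the network differs.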


Coprocessors extend opportunities for server side execution. They could be used to implement filtering alternatives. They also make it possible now to do things much like triggers and stored procedures. The API for writing coprocessors and opportunities for executing coprocessor code cover many more extension use cases now.
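The trigger and stored-procedure comparison can be sketched in Java as a toy, not the HBase API. WriteObserver and ToyRegion are hypothetical, loosely inspired by the pre/post hook idea behind HBase's RegionObserver but with none of its real signatures. prePut can veto a write like a BEFORE trigger, postPut reacts to it like an AFTER trigger, and sum() aggregates next to the data like a stored procedure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model only: WriteObserver is hypothetical and only loosely inspired by
// the pre/post hooks of HBase's RegionObserver; the signatures are not HBase's.
interface WriteObserver {
    // Return false to veto the write, like a BEFORE trigger.
    default boolean prePut(String row, long value) { return true; }

    // Called after the write has been applied, like an AFTER trigger.
    default void postPut(String row, long value) {}
}

final class ToyRegion {
    final TreeMap<String, Long> data = new TreeMap<>();
    final List<WriteObserver> observers = new ArrayList<>();

    boolean put(String row, long value) {
        for (WriteObserver o : observers) {
            if (!o.prePut(row, value)) return false;   // vetoed before anything is stored
        }
        data.put(row, value);
        for (WriteObserver o : observers) o.postPut(row, value);
        return true;
    }

    // Stored-procedure-like call: aggregate where the data lives and return
    // one number instead of shipping every row to the client.
    long sum() {
        long total = 0;
        for (long v : data.values()) total += v;
        return total;
    }
}

final class TriggerDemo {
    // One observer that rejects negative values and appends every accepted
    // row key to auditLog.
    static ToyRegion newRegion(List<String> auditLog) {
        ToyRegion region = new ToyRegion();
        region.observers.add(new WriteObserver() {
            @Override public boolean prePut(String row, long value) { return value >= 0; }
            @Override public void postPut(String row, long value) { auditLog.add(row); }
        });
        return region;
    }

    public static void main(String[] args) {
        List<String> audit = new ArrayList<>();
        ToyRegion region = newRegion(audit);
        region.put("a", 5);
        region.put("b", -1);   // vetoed by prePut
        region.put("c", 7);
        System.out.println(region.sum() + " " + audit);   // prints: 12 [a, c]
    }
}
```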


> One thing I can think is, filters are defined at the client, while
> the coprocessors are defined on the server. So, coprocessors are
> reusable across clients.

I'm not sure that is a valid distinction.

It might help you to think of both filters and coprocessors as extensions loaded into the server runtime. In both cases, code, from the developer or from the distribution, is loaded on demand into the regionserver (though with coprocessors it can be more dynamic than with filters). In both cases, the client builds a request which is acted upon server side by newly resident code there.
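To make the "client builds a request, server runs newly resident code" point concrete, here is a toy Java sketch. It is not how HBase serializes filters or loads coprocessors: the request carries a class name and an argument, and the server instantiates that class by name with reflection and applies it to its rows. ToyRequest, RowPredicate, PrefixPredicate and ToyServer are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only; every type here is hypothetical.
// A request names server-side code by class name and carries an argument.
final class ToyRequest {
    final String extensionClass;   // fully qualified name the server must load
    final String argument;

    ToyRequest(String extensionClass, String argument) {
        this.extensionClass = extensionClass;
        this.argument = argument;
    }
}

// Extension point the toy server knows how to call.
interface RowPredicate {
    void configure(String argument);
    boolean accept(String row);
}

// An extension that is already on the server's classpath in this toy.
final class PrefixPredicate implements RowPredicate {
    private String prefix = "";

    public PrefixPredicate() {}

    @Override public void configure(String argument) { prefix = argument; }
    @Override public boolean accept(String row) { return row.startsWith(prefix); }
}

final class ToyServer {
    // Load the named extension on demand, configure it, and run it next to the data.
    static List<String> handle(ToyRequest req, List<String> rows) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(req.extensionClass);
        RowPredicate predicate = (RowPredicate) cls.getDeclaredConstructor().newInstance();
        predicate.configure(req.argument);
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            if (predicate.accept(row)) out.add(row);
        }
        return out;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // The client only builds the request; the filtering runs in handle().
        ToyRequest req = new ToyRequest(PrefixPredicate.class.getName(), "user");
        System.out.println(handle(req, Arrays.asList("user1", "admin1", "user2")));   // prints: [user1, user2]
    }
}
```

The sketch only shows that the code doing the work is named by the client but runs on the server; it says nothing about how the class gets onto the server in the first place.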

Best regards,

    - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) 


