Posted to user@hbase.apache.org by Ferdy <fe...@kalooga.com> on 2010/06/28 11:28:51 UTC

how to specify filters in configuration

Currently it's possible to specify filters on a Scan object. Is there a 
way to specify them in configuration instead, so that they aren't hardcoded?

Generic HBase tools (extending TableMapper) could benefit a lot from such 
a configuration. For example, we would like to import/export data with 
user-specified filters, but we would not want to code a specific Tool every 
time.

What are the current possibilities?



RE: how to specify filters in configuration

Posted by Michael Segel <mi...@hotmail.com>.
Ferdy,

I think you still misunderstand...

The basic scanner API can already do this for you.

It's a question of the class you have that builds the scanner and filter objects.

It doesn't make sense to put this data into the configuration object, unless you're specifically talking about input going into a map/reduce job where the scan object is created within each mapper instance.

Then you'll have a different issue.
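
For the map/reduce case mentioned above, a minimal sketch of how scan settings could travel through the job configuration to each task. This is only an illustration under stated assumptions: java.util.Properties stands in for the Hadoop Configuration, the key "example.scan.encoded" and the class ScanInJobConf are made up, and only start and stop rows are carried. As far as I know, HBase's TableMapReduceUtil does something similar in spirit by serializing the Scan into the job configuration, but the code below does not use that API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Properties;

/** Hypothetical helper; Properties stands in for the Hadoop Configuration. */
final class ScanInJobConf {
    /** Made-up key, not an HBase property. */
    static final String KEY = "example.scan.encoded";

    /** Client side: pack start and stop row into one string in the job configuration. */
    static void store(Properties jobConf, String startRow, String stopRow) {
        // Base64 keeps the separator unambiguous, since ',' is not in its alphabet.
        jobConf.setProperty(KEY, encode(startRow) + "," + encode(stopRow));
    }

    /** Task side: unpack the row keys; returns {startRow, stopRow}. */
    static String[] load(Properties jobConf) {
        String packed = jobConf.getProperty(KEY);
        if (packed == null) {
            throw new IllegalStateException("no scan stored under " + KEY);
        }
        String[] parts = packed.split(",", -1); // -1 keeps empty row keys
        if (parts.length != 2) {
            throw new IllegalStateException("malformed value under " + KEY);
        }
        return new String[] { decode(parts[0]), decode(parts[1]) };
    }

    private static String encode(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    private static String decode(String s) {
        return new String(Base64.getDecoder().decode(s), StandardCharsets.UTF_8);
    }
}
```

The client would call store(...) once before submitting the job, and each mapper would call load(...) during setup to rebuild its scan range.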

> Date: Wed, 30 Jun 2010 14:53:03 +0200
> From: ferdy.galema@kalooga.com
> To: user@hbase.apache.org
> Subject: Re: how to specify filters in configuration
> 
> Thanks for your response.
> 
> You're definitely right about the separate config file for job-specific 
> parameters.
> 
> About the filter configuration, I was planning on extending the Scan 
> object so that it is able to parse a Configuration (i.e. 
> scan.setConf(conf)). Startrow, stoprow and filters can be parsed out of 
> the configuration, but only when specified (otherwise default behaviour is 
> kept). As a side effect, generic frameworks such as the TableMapper 
> could use this method, so that every job extending it can benefit from 
> the ability to configure filters at runtime.
> 
> Now startrow and stoprow are not that difficult to configure (we could 
> use properties 'scan.startrow' and 'scan.stoprow' for example). The only 
> real difficulty is creating a nice way to configure a filter. Of course 
> this is because there are several implementations 
> (SingleColumnValueFilter, FilterList), each with its own specific options.
> 
> So I was just figuring out how to provide a clean way to implement 
> filter options in configuration.

Re: how to specify filters in configuration

Posted by Ferdy <fe...@kalooga.com>.
Thanks for your response.

You're definitely right about the separate config file for job-specific 
parameters.

About the filter configuration, I was planning on extending the Scan 
object so that it is able to parse a Configuration (i.e. 
scan.setConf(conf)). Startrow, stoprow and filters can be parsed out of 
the configuration, but only when specified (otherwise default behaviour is 
kept). As a side effect, generic frameworks such as the TableMapper 
could use this method, so that every job extending it can benefit from 
the ability to configure filters at runtime.

Now startrow and stoprow are not that difficult to configure (we could 
use properties 'scan.startrow' and 'scan.stoprow' for example). The only 
real difficulty is creating a nice way to configure a filter. Of course 
this is because there are several implementations 
(SingleColumnValueFilter, FilterList), each with its own specific options.

So I was just figuring out how to provide a clean way to implement 
filter options in configuration.
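
A rough sketch of the idea, to make it concrete. It is not HBase code: java.util.Properties stands in for the Hadoop Configuration, ScanSettings stands in for the Scan object, and the 'scan.filter' key with its "family:qualifier:OPERATOR:value" format is invented here purely for illustration. Only the 'scan.startrow' and 'scan.stoprow' names come from the message above.

```java
import java.util.Optional;
import java.util.Properties;

/** Stand-in for the scan-related settings; NOT the HBase Scan class. */
final class ScanSettings {
    final Optional<String> startRow;
    final Optional<String> stopRow;
    final Optional<FilterSpec> filter;

    ScanSettings(Optional<String> startRow, Optional<String> stopRow,
                 Optional<FilterSpec> filter) {
        this.startRow = startRow;
        this.stopRow = stopRow;
        this.filter = filter;
    }

    /** Reads only the keys that are present; absent keys leave the default (empty). */
    static ScanSettings fromProperties(Properties conf) {
        Optional<String> start = Optional.ofNullable(conf.getProperty("scan.startrow"));
        Optional<String> stop = Optional.ofNullable(conf.getProperty("scan.stoprow"));
        // 'scan.filter' is a hypothetical key, not an existing HBase property.
        Optional<FilterSpec> filter =
                Optional.ofNullable(conf.getProperty("scan.filter")).map(FilterSpec::parse);
        return new ScanSettings(start, stop, filter);
    }
}

/** Hypothetical description of a single-column value filter. */
final class FilterSpec {
    final String family;
    final String qualifier;
    final String operator;
    final String value;

    FilterSpec(String family, String qualifier, String operator, String value) {
        this.family = family;
        this.qualifier = qualifier;
        this.operator = operator;
        this.value = value;
    }

    /** Parses "family:qualifier:OPERATOR:value"; the value itself may contain ':'. */
    static FilterSpec parse(String text) {
        String[] parts = text.split(":", 4);
        if (parts.length != 4) {
            throw new IllegalArgumentException(
                    "expected family:qualifier:OPERATOR:value but got: " + text);
        }
        return new FilterSpec(parts[0], parts[1], parts[2], parts[3]);
    }
}
```

A job would call ScanSettings.fromProperties(conf) once and translate the result into a real Scan. Supporting a FilterList would need a richer format than one flat string, which is exactly the hard part described above.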

Michael Segel wrote:
> Ferdy,
>
> I don't think you understand.
>
> What you're asking for doesn't make sense.
>
> Your filters could be built dynamically, so you code it once and based on the parameters passed in, you build a filter and apply it to the scan.
> Whether you pass in the parameters in a configuration file or from a GUI attached to the client code doesn't matter.
>
> Just a clarification ... I've seen some developers do this and it's not a good practice... You want to avoid putting job-specific parameters in a Hadoop config file.
> Use the Hadoop config file to pass in cloud-specific parameters that you want to override, and use a separate config file or command-line options to pass in the application-specific parameters.  (I'm sure someone is going to argue a counterpoint to this.)
>
> But getting back to your point. You just need to write some dynamic code for your filters and then you can pass in your column list to filter on as a parameter.
>
> HTH
>
> -Mike

RE: how to specify filters in configuration

Posted by Michael Segel <mi...@hotmail.com>.
Ferdy,

I don't think you understand.

What you're asking for doesn't make sense.

Your filters could be built dynamically, so you code it once and based on the parameters passed in, you build a filter and apply it to the scan.
Whether you pass in the parameters in a configuration file or from a GUI attached to the client code doesn't matter.

Just a clarification ... I've seen some developers do this and it's not a good practice... You want to avoid putting job-specific parameters in a Hadoop config file.
Use the Hadoop config file to pass in cloud-specific parameters that you want to override, and use a separate config file or command-line options to pass in the application-specific parameters.  (I'm sure someone is going to argue a counterpoint to this.)

But getting back to your point. You just need to write some dynamic code for your filters and then you can pass in your column list to filter on as a parameter.
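
A minimal sketch of what "dynamic code for your filters" could look like, with the column list passed in as a parameter. It deliberately avoids the HBase classes: a row is modelled as a plain Map from column name to value, and the class DynamicRowFilter and its methods are hypothetical names. In HBase itself the same shape would presumably build a FilterList from the parameter instead of a Predicate.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical helper; a row is modelled as column name -> value. */
final class DynamicRowFilter {
    /** Splits a parameter such as "cf:name, cf:email" into column names. */
    static List<String> parseColumns(String param) {
        return Arrays.stream(param.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    /** Accepts a row only if it has a value for every listed column. */
    static Predicate<Map<String, String>> requireColumns(List<String> columns) {
        // An empty column list accepts every row.
        Predicate<Map<String, String>> combined = row -> true;
        for (String column : columns) {
            combined = combined.and(row -> row.containsKey(column));
        }
        return combined;
    }
}
```

The filter is coded once; whether the column list arrives from a command-line option, a separate config file, or a GUI makes no difference to requireColumns.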

HTH

-Mike

> Date: Tue, 29 Jun 2010 09:17:55 +0200
> From: ferdy.galema@kalooga.com
> To: user@hbase.apache.org
> Subject: Re: how to specify filters in configuration
> 
> The point is that instead of coding a scan with its filters, we would 
> like a way to do this in configuration. Different jobs could be run more 
> ad hoc.

Re: how to specify filters in configuration

Posted by Ferdy <fe...@kalooga.com>.
The point is that instead of coding a scan with its filters, we would 
like a way to do this in configuration. Different jobs could be run more 
ad hoc.

On 06/28/2010 05:55 PM, Michael Segel wrote:
>
> I'm not sure I understand the question...
>
> Configurations are meant for your application to have additional/changed cloud configuration at run time.
> Scan filters are specific to the job you're running.
>
> As to making your scans more dynamic, you should be able to do this already within your code.

RE: how to specify filters in configuration

Posted by Michael Segel <mi...@hotmail.com>.

I'm not sure I understand the question...

Configurations are meant for your application to have additional/changed cloud configuration at run time. 
Scan filters are specific to the job you're running.

As to making your scans more dynamic, you should be able to do this already within your code.

> Date: Mon, 28 Jun 2010 11:28:51 +0200
> From: ferdy.galema@kalooga.com
> To: user@hbase.apache.org
> Subject: how to specify filters in configuration
> 
> Currently it's possible to specify filters on a Scan object. Is there a 
> way to specify them in configuration instead, so that they aren't hardcoded?
> 
> Generic HBase tools (extending TableMapper) could benefit a lot from such 
> a configuration. For example, we would like to import/export data with 
> user-specified filters, but we would not want to code a specific Tool every 
> time.
> 
> What are the current possibilities?