Posted to issues@hbase.apache.org by "Bill Graham (JIRA)" <ji...@apache.org> on 2011/05/12 17:28:47 UTC

[jira] [Created] (HBASE-3880) Make mapper function in ImportTSV plug-able

Make mapper function in ImportTSV plug-able
-------------------------------------------

                 Key: HBASE-3880
                 URL: https://issues.apache.org/jira/browse/HBASE-3880
             Project: HBase
          Issue Type: New Feature
            Reporter: Bill Graham


It would be really useful to be able to specify a different Mapper for the {{ImportTsv}} class to use in place of the current {{TsvImporter}}. This would allow the input data to be transformed before it is added to HBase. One suggestion is to add a new command-line option to specify a user-defined mapper (UDM?). Or maybe we could instead refactor the class so that it can be extended and a subclass can specify a new mapper.

The mapper is statically defined and bound to the job though, so I'm not sure of the best way to make it dynamically plug-able. Suggestions welcome.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3880) Make mapper function in ImportTSV plug-able

Posted by "Bill Graham (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035707#comment-13035707 ] 

Bill Graham commented on HBASE-3880:
------------------------------------

Sounds good, I'm working on the following changes to {{TsvImporter}}:

* Make {{TsvImporter}} an outer class and rename it to {{TsvImporterMapper}}.
* Change the {{setup}} method to public from protected.
* Expose getters for {{ts}}, {{skipBadLines}} and {{badLineCount}}.
* Add {{incrementBadLineCount(int count)}} method.

I should have a patch ready soon unless there are other suggestions/comments. For now I was going to leave the {{TsvParser}} as an inner class, unless anyone thinks moving it to an outer class would be useful as well.
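
To illustrate how a subclass might use the accessors listed above, here is a minimal, framework-free sketch. It is not the patch itself: the class names {{TsvImporterMapperSketch}} and {{TrimmingMapper}}, the constructor, the list-based {{map}} signature, and the use of a plain {{long}} for the bad-line count are all assumptions made so the example runs without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the refactored mapper; only the accessor names come from the thread. */
class TsvImporterMapperSketch {
    private final long ts;              // timestamp applied to every imported cell
    private final boolean skipBadLines; // whether malformed lines are skipped or fatal
    private long badLineCount;          // assumption: a plain counter instead of a Hadoop counter

    TsvImporterMapperSketch(long ts, boolean skipBadLines) {
        this.ts = ts;
        this.skipBadLines = skipBadLines;
    }

    public long getTs() { return ts; }
    public boolean getSkipBadLines() { return skipBadLines; }
    public long getBadLineCount() { return badLineCount; }
    public void incrementBadLineCount(int count) { badLineCount += count; }
}

/** Example subclass: expects "rowkey<TAB>value" and trims the value before emitting it. */
class TrimmingMapper extends TsvImporterMapperSketch {
    TrimmingMapper(long ts, boolean skipBadLines) {
        super(ts, skipBadLines);
    }

    List<String> map(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split("\t", -1);
            if (fields.length != 2) {
                if (getSkipBadLines()) {
                    incrementBadLineCount(1); // record the bad line and move on
                    continue;
                }
                throw new IllegalArgumentException("Bad line: " + line);
            }
            out.add(fields[0] + "\t" + fields[1].trim() + "@" + getTs());
        }
        return out;
    }

    public static void main(String[] args) {
        TrimmingMapper mapper = new TrimmingMapper(42L, true);
        System.out.println(mapper.map(Arrays.asList("r1\t v1 ", "broken", "r2\tv2")));
        System.out.println("bad lines: " + mapper.getBadLineCount());
    }
}
```

The point of the sketch is only that a subclass can honour the skip-bad-lines setting and keep the bad-line count accurate through the exposed accessors, without reaching into private state of the base mapper.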

[jira] [Updated] (HBASE-3880) Make mapper function in ImportTSV plug-able

Posted by "Bill Graham (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Graham updated HBASE-3880:
-------------------------------

    Status: Patch Available  (was: Open)

[jira] [Assigned] (HBASE-3880) Make mapper function in ImportTSV plug-able

Posted by "Bill Graham (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Graham reassigned HBASE-3880:
----------------------------------

    Assignee: Bill Graham

[jira] [Updated] (HBASE-3880) Make mapper function in ImportTSV plug-able

Posted by "Bill Graham (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Graham updated HBASE-3880:
-------------------------------

    Attachment: HBASE-3880_2.patch

Here's a second version of the patch.

It contains the changes discussed above, except that I didn't make the {{setup}} method public, since that seems unnecessary. I did, however, add a {{doSetup}} method that separates the generic setup functionality a subclass might want from the specific setup functionality that the superclass requires (i.e. parser initialization). See the test mapper for an example.
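
One plausible reading of the {{setup}}/{{doSetup}} split is sketched below. This is an illustration under stated assumptions, not the code from the patch: the class names are hypothetical, a {{Map}} stands in for the Hadoop context, the two option keys are used only as illustrative configuration names, and a boolean flag stands in for parser initialization.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the base mapper; a Map replaces the Hadoop context. */
class BaseMapperSketch {
    protected String separator;
    protected boolean skipBadLines;
    protected boolean parserInitialized; // stands in for constructing the TSV parser

    /** Full setup for the default mapper: generic options plus parser initialization. */
    protected void setup(Map<String, String> conf) {
        doSetup(conf);
        parserInitialized = true;
    }

    /** Generic setup that a subclass may also want to reuse. */
    protected void doSetup(Map<String, String> conf) {
        // Assumption: these keys are illustrative option names.
        separator = conf.getOrDefault("importtsv.separator", "\t");
        skipBadLines = Boolean.parseBoolean(conf.getOrDefault("importtsv.skip.bad.lines", "true"));
    }
}

/** A custom mapper that parses lines itself: it reuses doSetup but skips the parser. */
class CustomMapperSketch extends BaseMapperSketch {
    @Override
    protected void setup(Map<String, String> conf) {
        doSetup(conf);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("importtsv.separator", ",");

        BaseMapperSketch base = new BaseMapperSketch();
        base.setup(conf);
        CustomMapperSketch custom = new CustomMapperSketch();
        custom.setup(conf);

        System.out.println("base parser initialized: " + base.parserInitialized);
        System.out.println("custom parser initialized: " + custom.parserInitialized);
        System.out.println("custom separator: " + custom.separator);
    }
}
```

With this arrangement {{setup}} can stay protected, which matches the remark that making it public seemed unnecessary: a subclass overrides it and chooses whether to call only {{doSetup}} or the full superclass {{setup}}.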

[jira] [Updated] (HBASE-3880) Make mapper function in ImportTSV plug-able

Posted by "Bill Graham (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Graham updated HBASE-3880:
-------------------------------

    Attachment: HBASE-3880_1.patch

Attaching patch 1. This patch allows the mapper to be injected with an option like {{-Dimporttsv.mapper.class=my.Mapper}}. 

A few potential issues with swapping in another mapper like this:

1. Many of the other {{-D}} options are implemented by the default {{TsvImporter}}, so with a custom mapper these params go unused unless the mapper re-implements support for them. We could instead make {{TsvImporter}} an outer class so it could be sub-classed.
2. Custom mappers won't have access to {{TsvParser}}. Again, maybe that's ok for now. If {{TsvParser}} would be useful outside of {{ImportTsv}} it can be moved to an outer class.

Any comments on this implementation?
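
For readers following the thread, the kind of injection described for patch 1 can be sketched without Hadoop as a class name read from configuration and instantiated by reflection. Only the property name {{importtsv.mapper.class}} comes from the comment above; the {{LineMapper}} interface, the mapper classes, and the use of {{java.util.Properties}} in place of the job configuration are hypothetical stand-ins, so this shows the general technique rather than what the patch does.

```java
import java.util.Properties;

/** Hypothetical stand-in for a Hadoop Mapper. */
interface LineMapper {
    String map(String line);
}

/** Plays the role of the default mapper: passes lines through unchanged. */
class DefaultTsvMapper implements LineMapper {
    @Override
    public String map(String line) {
        return line;
    }
}

/** Example user-supplied mapper that transforms the input before it is stored. */
class UpperCaseMapper implements LineMapper {
    @Override
    public String map(String line) {
        return line.toUpperCase();
    }
}

class MapperInjectionSketch {
    // Property name taken from the thread.
    static final String MAPPER_CONF_KEY = "importtsv.mapper.class";

    /** Resolves the mapper named in the configuration, falling back to the default. */
    static LineMapper createMapper(Properties conf) {
        String className = conf.getProperty(MAPPER_CONF_KEY, DefaultTsvMapper.class.getName());
        try {
            return Class.forName(className)
                    .asSubclass(LineMapper.class) // reject classes that are not mappers
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot use mapper class: " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(createMapper(conf).map("row1\tvalue"));

        conf.setProperty(MAPPER_CONF_KEY, UpperCaseMapper.class.getName());
        System.out.println(createMapper(conf).map("row1\tvalue"));
    }
}
```

The sketch also makes the first issue above concrete: once a different class is substituted, any option handling that lived inside the default mapper is bypassed unless the replacement implements it again.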

[jira] [Commented] (HBASE-3880) Make mapper function in ImportTSV plug-able

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035593#comment-13035593 ] 

stack commented on HBASE-3880:
------------------------------

Nice patch.  I'd be up for committing this with its test and all but maybe you want to go ahead and move TsvImporter out to be an outer class altogether?  Let me know.  Good on you Bill.

[jira] [Updated] (HBASE-3880) Make mapper function in ImportTSV plug-able

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3880:
-------------------------

       Resolution: Fixed
    Fix Version/s: 0.92.0
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

Applied to TRUNK.  Thanks for the nice patch Bill.

[jira] [Updated] (HBASE-3880) Make mapper function in ImportTSV plug-able

Posted by "Bill Graham (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Graham updated HBASE-3880:
-------------------------------

    Status: Open  (was: Patch Available)

[jira] [Commented] (HBASE-3880) Make mapper function in ImportTSV plug-able

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035827#comment-13035827 ] 

stack commented on HBASE-3880:
------------------------------

@Bill All sounds good to me.

[jira] [Updated] (HBASE-3880) Make mapper function in ImportTSV plug-able

Posted by "Bill Graham (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Graham updated HBASE-3880:
-------------------------------

    Release Note: ImportTsv allows using a custom Mapper implementation.
          Status: Patch Available  (was: Open)
