Posted to issues@hbase.apache.org by "Bill Graham (JIRA)" <ji...@apache.org> on 2011/05/18 07:47:47 UTC

[jira] [Updated] (HBASE-3880) Make mapper function in ImportTSV plug-able

     [ https://issues.apache.org/jira/browse/HBASE-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Graham updated HBASE-3880:
-------------------------------

    Attachment: HBASE-3880_1.patch

Attaching patch 1. This patch allows the mapper to be injected with an option like {{-Dimporttsv.mapper.class=my.Mapper}}. 
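For illustration only, here is a minimal stdlib-only Java sketch of resolving a mapper class from a {{-D}}-style property and instantiating it reflectively. It assumes the property name {{importtsv.mapper.class}} from the comment above. A plain {{Map}} stands in for Hadoop's {{Configuration}}, and {{LineMapper}}, {{DefaultTsvMapper}}, {{UpperCaseMapper}} and {{MapperLoader}} are hypothetical names. They are not part of the patch or the HBase API. In a real job, the resolved class would be handed to the MapReduce job rather than called directly.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a MapReduce Mapper: turns one input line into fields.
interface LineMapper {
    List<String> map(String line);
}

// Hypothetical default, playing the role of TsvImporter: split on tabs.
class DefaultTsvMapper implements LineMapper {
    public List<String> map(String line) {
        return List.of(line.split("\t", -1));
    }
}

// Hypothetical user-defined mapper that transforms the input before import.
class UpperCaseMapper extends DefaultTsvMapper {
    @Override
    public List<String> map(String line) {
        return super.map(line.toUpperCase());
    }
}

class MapperLoader {
    // Property name as given in the comment; how the actual patch reads it is assumed.
    static final String MAPPER_CONF_KEY = "importtsv.mapper.class";

    // Resolve the mapper class from configuration, falling back to the default.
    static Class<? extends LineMapper> mapperClass(Map<String, String> conf)
            throws ClassNotFoundException {
        String name = conf.get(MAPPER_CONF_KEY);
        if (name == null) {
            return DefaultTsvMapper.class;
        }
        // asSubclass throws ClassCastException if the class is not a LineMapper.
        return Class.forName(name).asSubclass(LineMapper.class);
    }

    // Instantiate the resolved class through its no-argument constructor.
    static LineMapper newMapper(Map<String, String> conf)
            throws ReflectiveOperationException {
        return mapperClass(conf).getDeclaredConstructor().newInstance();
    }
}
```

For example, after {{conf.put(MapperLoader.MAPPER_CONF_KEY, UpperCaseMapper.class.getName())}}, calling {{MapperLoader.newMapper(conf).map("a\tb")}} returns {{[A, B]}}. Checking the type with {{asSubclass}} up front means a misconfigured class name fails when the class is resolved, rather than on first use.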

A few potential issues with swapping in another mapper like this:

1. Many of the other {{-D}} options are implemented in the default {{TsvImporter}}, so a custom mapper will ignore these params unless it also re-implements support for them. We could instead make {{TsvImporter}} an outer class so that it can be subclassed.
2. Custom mappers won't have access to {{TsvParser}}. Again, that may be OK for now. If {{TsvParser}} would be useful outside of {{ImportTsv}}, it can be moved to an outer class.
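The alternative raised in both points, promoting {{TsvImporter}} and {{TsvParser}} to outer classes, could look roughly like the following template-method sketch. It is plain Java with no Hadoop dependency. {{TsvLineParser}}, {{TsvImporterBase}}, {{transform}} and {{TrimmingImporter}} are hypothetical names, and the bad-line handling only stands in for the kind of option support the default mapper provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical top-level parser, reusable outside the importer (point 2).
final class TsvLineParser {
    private final String separator;
    private final int expectedColumns;

    TsvLineParser(String separator, int expectedColumns) {
        this.separator = separator;
        this.expectedColumns = expectedColumns;
    }

    // Returns the fields, or empty if the column count does not match.
    Optional<List<String>> parse(String line) {
        String[] parts = line.split(Pattern.quote(separator), -1);
        if (parts.length != expectedColumns) {
            return Optional.empty();
        }
        return Optional.of(List.of(parts));
    }
}

// Hypothetical top-level importer (point 1): option handling such as
// bad-line skipping lives here, so subclasses inherit it.
class TsvImporterBase {
    private final TsvLineParser parser;
    private final boolean skipBadLines;
    private int badLines = 0;

    TsvImporterBase(TsvLineParser parser, boolean skipBadLines) {
        this.parser = parser;
        this.skipBadLines = skipBadLines;
    }

    // Template method: parse, then hand the fields to an overridable hook.
    final Optional<List<String>> process(String line) {
        Optional<List<String>> fields = parser.parse(line);
        if (!fields.isPresent()) {
            badLines++;
            if (!skipBadLines) {
                throw new IllegalArgumentException("Bad line: " + line);
            }
            return Optional.empty();
        }
        return Optional.of(transform(fields.get()));
    }

    // Hook for user-defined transformations; the default is the identity.
    protected List<String> transform(List<String> fields) {
        return fields;
    }

    int badLineCount() {
        return badLines;
    }
}

// Example subclass: trims whitespace from each field before import.
class TrimmingImporter extends TsvImporterBase {
    TrimmingImporter(TsvLineParser parser, boolean skipBadLines) {
        super(parser, skipBadLines);
    }

    @Override
    protected List<String> transform(List<String> fields) {
        List<String> out = new ArrayList<>();
        for (String f : fields) {
            out.add(f.trim());
        }
        return out;
    }
}
```

With this shape, a user-defined transformation only overrides {{transform}} and keeps the shared parsing and bad-line handling, instead of replacing the whole mapper and losing that support.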

Any comments on this implementation?

> Make mapper function in ImportTSV plug-able
> -------------------------------------------
>
>                 Key: HBASE-3880
>                 URL: https://issues.apache.org/jira/browse/HBASE-3880
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Bill Graham
>            Assignee: Bill Graham
>         Attachments: HBASE-3880_1.patch
>
>
> It would be really useful to be able to specify a Mapper for the {{ImportTsv}} class to use other than the current {{TsvImporter}}. This would allow transformations to be applied to the input data before it is added to HBase. One suggestion is to add a new command line option to specify a user-defined mapper (UDM?). Alternatively, we could refactor it so that a subclass can specify a new mapper.
> The mapper is statically defined and bound to the job, though, so I'm not sure of the best way to make it dynamically pluggable. Suggestions welcome.
