Posted to dev@phoenix.apache.org by "Prashant Kommireddi (JIRA)" <ji...@apache.org> on 2014/03/15 11:07:47 UTC
[jira] [Comment Edited] (PHOENIX-129) Improve MapReduce-based import
[ https://issues.apache.org/jira/browse/PHOENIX-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936113#comment-13936113 ]
Prashant Kommireddi edited comment on PHOENIX-129 at 3/15/14 10:07 AM:
-----------------------------------------------------------------------
This is great [~gabriel.reid]! A few questions:
* Do we want to remove the old MR bulk loader entirely, or keep it around for one release, mark it deprecated, and communicate to users that it will not be supported from the following release onwards?
* I believe the Bulk Loader creates HFiles rather than writing to the Phoenix table directly via a connection? {{PhoenixHBaseStorage}} writes to a table directly and has custom OutputFormat, RecordWriter, and OutputCommitter implementations. I'm guessing those wouldn't be required here, as creating HFiles might be more efficient?
* I like the idea of sticking to the standard way of firing off MR jobs via the "hadoop" command. I don't see anything Phoenix-specific here that would require a separate script to launch this job.
* Can you please upload the patch to ReviewBoard? It's easier to review/comment on that.
Again, this is really cool stuff! Thank you.
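The point above about launching via the stock "hadoop" command refers to Hadoop's ToolRunner/GenericOptionsParser convention, under which generic flags such as {{-D key=value}} are consumed by the framework before the tool sees its own arguments. A stdlib-only sketch of that separation (this is an illustration of the convention, not Hadoop's actual parser, and the class name is made up):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration of the GenericOptionsParser convention: "-D key=value"
// pairs are peeled off into a configuration map, and whatever remains
// is handed to the tool as its own arguments.
public class GenericOptionsSketch {
    public final Map<String, String> conf = new LinkedHashMap<>();
    public final List<String> toolArgs = new ArrayList<>();

    public GenericOptionsSketch(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-D") && i + 1 < args.length) {
                // Split "key=value" into at most two parts.
                String[] kv = args[++i].split("=", 2);
                conf.put(kv[0], kv.length > 1 ? kv[1] : "");
            } else {
                toolArgs.add(args[i]);
            }
        }
    }
}
```

With this convention, a driver launched as {{hadoop jar phoenix.jar SomeTool -D mapreduce.job.queuename=etl input.csv MY_TABLE}} sees only {{input.csv MY_TABLE}} as its own arguments, which is why no Phoenix-specific launch script is needed.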
> Improve MapReduce-based import
> ------------------------------
>
> Key: PHOENIX-129
> URL: https://issues.apache.org/jira/browse/PHOENIX-129
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Gabriel Reid
> Assignee: Gabriel Reid
> Attachments: PHOENIX-129-3.0.patch, PHOENIX-129-master.patch
>
>
> In implementing PHOENIX-66, it was noted that the current MapReduce-based importer implementation has a number of issues, including the following:
> * CSV handling is largely replicated from the non-MR code, with no ability to specify custom separators
> * No automated tests, and code is written in a way that makes it difficult to test
> * Unusual custom config loading and handling instead of using GenericOptionsParser, ToolRunner, and friends
> The initial work towards PHOENIX-66 included refactoring the MR importer enough to use common code, until the development of automated tests revealed that the MR importer needed more substantial refactoring.
> This ticket is a proposal to do a relatively major rework of the MR import, fixing the above issues. The biggest improvements that will result from this are a common codebase for handling CSV input, and the addition of automated testing for the MR import.
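The custom-separator point in the description can be illustrated with a plain-Java sketch: a line splitter that takes the separator as configuration rather than hard-coding commas. The class name is made up and this is not Phoenix's actual CSV code; real CSV handling must also deal with quoting and escapes.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal separator-configurable CSV field splitter, showing the kind
// of shared parsing code the refactoring aims for. Quoting and escape
// handling are deliberately omitted for brevity.
public class SimpleCsvLineParser {
    private final char separator;

    public SimpleCsvLineParser(char separator) {
        this.separator = separator;
    }

    /** Split one input line on the configured separator. */
    public List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : line.toCharArray()) {
            if (c == separator) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

For example, `new SimpleCsvLineParser('|').parse("a|b|c")` yields the three fields `a`, `b`, `c`, while the same input parsed with a comma separator would come back as a single field.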
--
This message was sent by Atlassian JIRA
(v6.2#6252)