Posted to issues@beam.apache.org by "Alexey Romanenko (JIRA)" <ji...@apache.org> on 2019/01/03 16:54:00 UTC

[jira] [Comment Edited] (BEAM-5310) Add support of HadoopOutputFormatIO

    [ https://issues.apache.org/jira/browse/BEAM-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733218#comment-16733218 ] 

Alexey Romanenko edited comment on BEAM-5310 at 1/3/19 4:53 PM:
----------------------------------------------------------------

[~kenn] The current status is as follows (extract from my email to the Dev ML):

 _"We added a new module, called “hadoop-format”, which incorporates the Read and Write parts for using Hadoop MapReduce Format files. The old module “hadoop-input-format” keeps its entire public user API but proxies all calls to the new module, and it will become deprecated starting from Beam 2.10. The implementation of the “Read” part has moved into HadoopFormatIO, and the “Write” part was written from scratch. Unit tests are kept for both modules for the moment to guarantee that there is no regression._
 
 _So, from the user's perspective, everything should work as before, except that the old IO becomes deprecated and users have to migrate to the new one after release 2.10._
   
 _What is left to do:_
 _- Completely remove the deprecated “hadoop-input-format” module (at the LTS or 3.0 release?)_
 _- Add new “hadoop-format” ITs to run on Jenkins."_

The old {{HadoopInputFormatIOIT}} test is still running and working on Jenkins, and it actually tests the new {{HadoopFormatIO.Read}} (by proxying calls, as I mentioned above). So I think the missing new IT on Jenkins is not a blocker for the release, and we can move that issue (BEAM-6246) to 2.11. Are you OK with that?



> Add support of HadoopOutputFormatIO
> -----------------------------------
>
>                 Key: BEAM-5310
>                 URL: https://issues.apache.org/jira/browse/BEAM-5310
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-hadoop
>            Reporter: Alexey Romanenko
>            Assignee: Alexey Romanenko
>            Priority: Minor
>             Fix For: 2.10.0
>
>
> For the moment, there is only {{HadoopInputFormatIO}} in Beam. To support writing to sinks that are not yet natively supported in Beam (for example, Apache Orc or HBase bulk load), it would make sense to add {{HadoopOutputFormatIO}} as well. It will support both batch and streaming processing.
> Afterwards, {{HadoopInputFormatIO}} and {{HadoopOutputFormatIO}} should be merged into one module, called {{HadoopFormatIO}}. The old {{HadoopInputFormatIO}} should become deprecated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)