Posted to issues@beam.apache.org by "Tomo Suzuki (Jira)" <ji...@apache.org> on 2019/12/02 20:22:00 UTC

[jira] [Commented] (BEAM-8822) Hadoop Client version 2.8 from 2.7

    [ https://issues.apache.org/jira/browse/BEAM-8822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986339#comment-16986339 ] 

Tomo Suzuki commented on BEAM-8822:
-----------------------------------

There are two modules that use Hadoop client dependencies: hadoop-format and hadoop-file-system.

h1. sdks/java/io/hadoop-format

As per [Hadoop Input/Output Format IO|https://beam.apache.org/documentation/io/built-in/hadoop/], HadoopFormatIO (in the beam-sdks-java-io-hadoop-format artifact) is not just for reading files from Hadoop; it also serves as the foundation for reading from other data sources such as Cassandra, HBase, and even Elasticsearch.

Its integration test, HadoopFormatIOIT, uses PostgreSQL. I set up a PostgreSQL instance on my local MacBook and ran HadoopFormatIOIT from IntelliJ with the following arguments, and it worked:
{noformat}
--tests
org.apache.beam.sdk.io.hadoop.format.HadoopFormatIOIT
-DintegrationTestPipelineOptions='[
"--postgresServerName=localhost",
"--postgresUsername=suztomo",
"--postgresDatabaseName=suztomo",
"--postgresPassword=",
"--postgresSsl=false",
"--numberOfRecords=1000"
]'{noformat}


h1. sdks/java/io/hadoop-file-system

HadoopFileSystem is in the sdks/java/io/hadoop-file-system module. Its test, HadoopFileSystemTest, creates a MiniDFSCluster (from the hadoop-hdfs artifact) and verifies interaction with it by creating and reading files. Beam's HadoopFileSystem class provides functions such as {{match}}, {{create}}, {{open}}, and {{copy}}.

My initial thought on testing compatibility of the Hadoop dependency is to check this kind of communication between a new HDFS and an old HDFS client.

But where is HadoopFileSystem used?




> Hadoop Client version 2.8 from 2.7
> ----------------------------------
>
>                 Key: BEAM-8822
>                 URL: https://issues.apache.org/jira/browse/BEAM-8822
>             Project: Beam
>          Issue Type: Bug
>          Components: build-system
>            Reporter: Tomo Suzuki
>            Assignee: Tomo Suzuki
>            Priority: Major
>         Attachments: OGuVu0A18jJ.png
>
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> [~iemejia] says:
> bq. probably a quicker way forward is to unblock the bigtable issue is to move our Hadoop dependency to Hadoop 2.8 given that Hadoop 2.7 is now EOL we have a good reason to do so https://cwiki.apache.org/confluence/display/HADOOP/EOL+%28End-of-life%29+Release+Branches
> The URL says
> {quote}Following branches are EOL: 
> [2.0.x - 2.7.x]{quote}
> https://issues.apache.org/jira/browse/BEAM-8569?focusedCommentId=16980532&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16980532
> About compatibility with other libraries:
> Hadoop client 2.7 is not compatible with Guava > 21 because of Objects.toStringHelper. Fortunately Hadoop client 2.8 removed the use of the method ([detail|https://github.com/GoogleCloudPlatform/cloud-opensource-java/issues/1028#issuecomment-557709027]).
> 2.8.5 is the latest in 2.8.X.
>  !OGuVu0A18jJ.png! 
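
The Guava incompatibility mentioned in the description could be checked at runtime with a small reflection probe. The following is only a minimal sketch, not something taken from Beam or Hadoop: the class name GuavaToStringHelperCheck and the helper hasStaticMethod are hypothetical, and it assumes the relevant overload is Objects.toStringHelper(Object). It uses only the JDK, so it simply reports "absent" when Guava is not on the classpath at all.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical helper class for illustration; not part of Beam, Hadoop, or Guava.
public class GuavaToStringHelperCheck {

    // Returns true if className can be loaded and exposes a public static
    // method with the given name and parameter types.
    static boolean hasStaticMethod(String className, String methodName, Class<?>... params) {
        try {
            Class<?> cls = Class.forName(className);
            Method m = cls.getMethod(methodName, params);
            return Modifier.isStatic(m.getModifiers());
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            // Class missing, method missing, or class failed to link.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the overload of interest takes a single Object argument.
        boolean present = hasStaticMethod(
            "com.google.common.base.Objects", "toStringHelper", Object.class);
        System.out.println("Objects.toStringHelper(Object) is "
            + (present ? "present" : "absent") + " on this classpath");
    }
}
```

Running this with the same classpath as the module under test indicates whether code that still calls the method (as Hadoop client 2.7 does, per the description above) would find it.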


