Posted to dev@bahir.apache.org by "Christian Kadner (JIRA)" <ji...@apache.org> on 2016/12/01 02:13:00 UTC

[jira] [Comment Edited] (BAHIR-67) WebHDFS Data Source for Spark SQL

    [ https://issues.apache.org/jira/browse/BAHIR-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710373#comment-15710373 ] 

Christian Kadner edited comment on BAHIR-67 at 12/1/16 2:12 AM:
----------------------------------------------------------------

I pushed a rudimentary extension of WebHdfsFileSystem to your fork on branch [BAHIR-75-WebHdfsFileSystem|https://github.com/sourav-mazumder/bahir/blob/BAHIR-75-WebHdfsFileSystem/datasource-webhdfs/src/main/scala/org/apache/hadoop/hdfs/web/BahirWebHdfsFileSystem.scala] to show how we can override the necessary method(s).

Please try to work off of that and add the required authentication and authorization URL parameters.
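
For reference, here is a minimal sketch of that approach -- not the actual code on the branch, and assuming a Hadoop 2.x {{WebHdfsFileSystem}}, whose package-private {{toUrl(...)}} hook is the reason the class has to live in the {{org.apache.hadoop.hdfs.web}} package. The configuration key used below is purely hypothetical:

{code:scala}
package org.apache.hadoop.hdfs.web

import java.net.URL

import org.apache.hadoop.fs.Path
import org.apache.hadoop.hdfs.web.resources.{HttpOpParam, Param}

/**
 * Sketch: subclass WebHdfsFileSystem and append extra (already URL-encoded)
 * query parameters -- e.g. an auth/authz token for a gateway -- to every
 * WebHDFS request URL built by the parent class.
 */
class BahirWebHdfsFileSystem extends WebHdfsFileSystem {

  override def toUrl(op: HttpOpParam.Op,
                     fspath: Path,
                     parameters: Param[_, _]*): URL = {
    // Let the parent build the regular WebHDFS URL (path + op + auth params).
    val url = super.toUrl(op, fspath, parameters: _*)
    // "bahir.webhdfs.extra.params" is a made-up key for this sketch,
    // e.g. "token=abc123" or "user.name=alice".
    val extra = getConf.get("bahir.webhdfs.extra.params", "")
    if (extra.isEmpty) url
    else new URL(url.toExternalForm + "&" + extra)
  }
}
{code}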



was (Author: ckadner):
I pushed a rudimentary extension of WebHdfsFileSystem to your fork on branch [BAHIR-75-WebHdfsFileSystem|https://github.com/sourav-mazumder/bahir/blob/BAHIR-75-WebHdfsFileSystem/datasource-webhdfs/src/main/scala/org/apache/hadoop/hdfs/web/bahir/BahirWebHdfsFileSystem.scala] to show how we can override the necessary method(s).

Please try to work off of that (add required authentication and authorization URL parameters)


> WebHDFS Data Source for Spark SQL
> ---------------------------------
>
>                 Key: BAHIR-67
>                 URL: https://issues.apache.org/jira/browse/BAHIR-67
>             Project: Bahir
>          Issue Type: New Feature
>          Components: Spark SQL Data Sources
>            Reporter: Sourav Mazumder
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Ability to read/write data in Spark from/to HDFS of a remote Hadoop Cluster
> In today's world of analytics, many use cases need the capability to access data from multiple remote data sources in Spark. Though Spark integrates well with its local Hadoop cluster, it largely lacks the capability to connect to a remote Hadoop cluster. In reality, however, not all enterprise data resides in a single Hadoop cluster, and running the Spark cluster co-located with the Hadoop cluster is not always an option.
> In this improvement we propose to create a connector for reading and writing data from/to the HDFS of a remote Hadoop cluster from Spark, using the WebHDFS API.
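> As a minimal sketch of the kind of access the connector targets (Spark can already reach a remote cluster through Hadoop's built-in webhdfs:// scheme; the proposed connector would add the authentication/authorization handling on top of that path). Hostname, port and file path below are placeholders:
> {code:scala}
> import org.apache.spark.sql.SparkSession
>
> object RemoteWebHdfsReadSketch {
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession.builder()
>       .appName("remote-webhdfs-read")
>       .getOrCreate()
>
>     // Hadoop's webhdfs:// scheme routes reads through the WebHDFS REST API
>     // of the remote NameNode (HTTP port 50070 by default on Hadoop 2.x).
>     val df = spark.read
>       .option("header", "true")
>       .csv("webhdfs://remote-namenode.example.com:50070/data/events.csv")
>
>     df.show(10)
>     spark.stop()
>   }
> }
> {code}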



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)