Posted to common-issues@hadoop.apache.org by "Rafal Wojdyla (Jira)" <ji...@apache.org> on 2020/12/01 23:23:00 UTC

[jira] [Comment Edited] (HADOOP-17402) Add GCS FS impl reference to core-default.xml

    [ https://issues.apache.org/jira/browse/HADOOP-17402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17241935#comment-17241935 ] 

Rafal Wojdyla edited comment on HADOOP-17402 at 12/1/20, 11:22 PM:
-------------------------------------------------------------------

[~stevel@apache.org] thanks for the links. I'm with you on the long-term vision. In the meantime, though, is there something we can do to bring the GCS connector on par with S3, specifically the {{core-default}} config? I'm mostly thinking of pyspark users, for whom the Java ecosystem may be a puzzle. Spark/pyspark loads {{core-default}} from {{hadoop-common}}. As far as I understand, in the pyspark context the auto-service mechanism doesn't actually register the {{gs}} scheme (at least when the connector jar is loaded via {{spark.jars}}, which is the likely setup), so Spark users are forced to add the config manually.

One might argue that adding the config to {{core-default}} would still result in a missing-class error when the connector jar isn't on the classpath, but at least the behavior would look the same as S3, and it would save users the extra config. What do you think?
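Concretely, the {{core-default.xml}} entries being proposed would presumably mirror the existing S3A pattern. A sketch follows; the class names are the ones published in the GCS connector documentation, not something already declared anywhere in {{hadoop-common}}:

```xml
<!-- Sketch of proposed core-default.xml entries for the gs:// scheme.
     Class names come from the GCS connector documentation; the connector
     jar itself would still need to be on the classpath. -->
<property>
  <name>fs.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
  <description>The FileSystem implementation for gs: (GCS) URIs.</description>
</property>
<property>
  <name>fs.AbstractFileSystem.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
  <description>The AbstractFileSystem implementation for gs: URIs.</description>
</property>
```

Today, lacking such defaults, pyspark users have to supply the equivalent settings themselves (e.g. as {{spark.hadoop.fs.gs.impl}} in their Spark config), which is the extra manual step described above.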



> Add GCS FS impl reference to core-default.xml
> ---------------------------------------------
>
>                 Key: HADOOP-17402
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17402
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Rafal Wojdyla
>            Priority: Major
>
> Akin to the current S3 default configuration, add a GCS configuration, specifically to declare the implementation class of the [GCS connector|https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage]. Has this not been done because the GCS connector is not part of the Hadoop/ASF codebase, or is there some other blocker?
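For comparison, the S3 declaration that already ships in {{hadoop-common}}'s {{core-default.xml}} follows this shape (a paraphrased sketch of the relevant entries, not the exact file contents):

```xml
<!-- Existing S3A entries in core-default.xml (paraphrased);
     a gs entry would follow the same pattern. -->
<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  <description>The implementation class of the S3A Filesystem.</description>
</property>
<property>
  <name>fs.AbstractFileSystem.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3A</value>
  <description>The implementation class of the S3A AbstractFileSystem.</description>
</property>
```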



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org