Posted to commits@hudi.apache.org by "Bhavani Sudha (Jira)" <ji...@apache.org> on 2020/08/14 18:30:00 UTC
[jira] [Updated] (HUDI-783) Add official python support to create hudi datasets using pyspark
[ https://issues.apache.org/jira/browse/HUDI-783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bhavani Sudha updated HUDI-783:
-------------------------------
Status: In Progress (was: Open)
> Add official python support to create hudi datasets using pyspark
> -----------------------------------------------------------------
>
> Key: HUDI-783
> URL: https://issues.apache.org/jira/browse/HUDI-783
> Project: Apache Hudi
> Issue Type: Wish
> Components: Utilities
> Reporter: Vinoth Govindarajan
> Assignee: Vinoth Govindarajan
> Priority: Major
> Labels: features, pull-request-available
> Fix For: 0.6.0
>
>
> *Goal:*
> As a pyspark user, I would like to read and write hudi datasets using pyspark.
> Achieving this goal involves several components:
> # Create a hudi-pyspark package that users can import and start reading/writing hudi datasets.
> # Explain how to read/write hudi datasets using pyspark in a blog post/documentation.
> # Add the hudi-pyspark module to the hudi demo docker along with the instructions.
> # Make the package available as part of the [spark packages index|https://spark-packages.org/] and [python package index|https://pypi.org/].
> The hudi-pyspark package should implement the Hudi data source API for Apache Spark, so that Hudi files can be read as DataFrames and written to any Hadoop-supported file system.
> After this feature launches, usage should look something like this:
> Install the package using:
> {code:bash}
> pip install hudi-pyspark{code}
> Or include the hudi-pyspark package in your Spark applications when launching spark-shell, pyspark, or spark-submit:
> {code:bash}
> $SPARK_HOME/bin/spark-shell --packages org.apache.hudi:hudi-pyspark_2.11:0.5.2{code}
>
>
>
>
>
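For illustration, reading and writing a Hudi dataset from pyspark could look roughly like the sketch below. It does not use the proposed hudi-pyspark package; it assumes the existing Hudi Spark datasource is already on the Spark classpath (for example through a Hudi Spark bundle jar). The helper names hudi_write_options, write_hudi, and read_hudi are hypothetical, and the column names in the usage note are placeholders.

```python
# Sketch only: the helper names below are hypothetical and are not part of
# any Hudi package. They wrap the standard Spark DataFrame reader/writer.

HUDI_FORMAT = "org.apache.hudi"  # name of Hudi's Spark datasource


def hudi_write_options(table_name, record_key, partition_field, precombine_field):
    """Build the datasource options for an upsert into a Hudi table."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": "upsert",
    }


def write_hudi(df, base_path, options):
    """Write a Spark DataFrame to base_path as a Hudi table."""
    df.write.format(HUDI_FORMAT).options(**options).mode("append").save(base_path)


def read_hudi(spark, load_path):
    """Read a Hudi table back as a Spark DataFrame."""
    return spark.read.format(HUDI_FORMAT).load(load_path)
```

Given a running SparkSession `spark` and a DataFrame `df` with columns such as `uuid`, `partitionpath`, and `ts`, a call might be `write_hudi(df, "/tmp/hudi_trips", hudi_write_options("trips", "uuid", "partitionpath", "ts"))` followed by `read_hudi(spark, "/tmp/hudi_trips")`. Depending on the Hudi release, the read path may need a partition glob appended.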
--
This message was sent by Atlassian Jira
(v8.3.4#803005)