
Posted to dev@hudi.apache.org by linshan <ma...@163.com> on 2020/08/19 03:02:43 UTC

[DISCUSS] Support Spark Structured Streaming read from Hudi table

Hi team,
     I need help. After a few days of thinking and trial and error, I am out of ideas. I wrote up the relevant information on this page. Please follow this link (https://issues.apache.org/jira/browse/HUDI-1126).
   
Best,
linshan-ma

Re: [DISCUSS] Support Spark Structured Streaming read from Hudi table

Posted by Vinoth Chandar <vi...@apache.org>.

I would love for all these new things to be revamped on top of Spark 3's newer
APIs
(it's kind of frustrating that the datasource APIs don't stabilize easily
in Spark)

I am thinking we can implement a "hudi3" format using Spark 3, with support
for SQL Merges, existing functionality and a redone Spark Structured
Streaming support.

I know I may be increasing the scope. So feel free to push back and have
this just be about getting the streaming reads working as well.

On Thu, Aug 20, 2020 at 10:50 AM Balaji Varadarajan
<v....@ymail.com.invalid> wrote:

>  Hi linshan,
> Sorry for the delay in responding. It is better to discuss code changes
> over a draft PR. Can you open one and tag us there? At a high level, it
> looks like you are using the Spark Datasource V2 APIs, while the structured
> streaming write is currently implemented using the V1 API. Let's discuss
> this over a PR. We have a few folks (Gary, Udit) who know this part better
> than me. They can help you out here.
> Balaji.V
>
>     On Tuesday, August 18, 2020, 08:03:01 PM PDT, linshan <
> mabin194046@163.com> wrote:
>
>  Hi team,
>     I need help. After a few days of thinking and trial and error, I am out
> of ideas. I wrote up the relevant information on this page. Please follow
> this link (https://issues.apache.org/jira/browse/HUDI-1126).
>
> Best,
> linshan-ma

Re: [DISCUSS] Support Spark Structured Streaming read from Hudi table

Posted by Balaji Varadarajan <v....@ymail.com.INVALID>.

 Hi linshan,
Sorry for the delay in responding. It is better to discuss code changes over a draft PR. Can you open one and tag us there? At a high level, it looks like you are using the Spark Datasource V2 APIs, while the structured streaming write is currently implemented using the V1 API. Let's discuss this over a PR. We have a few folks (Gary, Udit) who know this part better than me. They can help you out here.
Balaji.V

    On Tuesday, August 18, 2020, 08:03:01 PM PDT, linshan <ma...@163.com> wrote:  
 
 Hi team,
    I need help. After a few days of thinking and trial and error, I am out of ideas. I wrote up the relevant information on this page. Please follow this link (https://issues.apache.org/jira/browse/HUDI-1126).
  
Best,
linshan-ma