Posted to dev@hudi.apache.org by Qian Wang <qw...@gmail.com> on 2019/10/28 22:24:04 UTC

Write Streaming data using Datasource Writer is not working

Hi All,

I tried to use the Datasource Writer to read streaming data from a Kafka topic and write it to a Hudi dataset on HDFS. I used the following code:

import org.apache.spark.sql.streaming.Trigger

// `data` is the streaming DataFrame read from the Kafka topic;
// `fileLocation` is the HDFS base path of the Hudi dataset.
val output = data
   .writeStream
   .trigger(Trigger.ProcessingTime("300 seconds"))
   .format("org.apache.hudi")
   .option("hoodie.table.name", "hudi_ro_table")
   .outputMode("append")
   .option("path", fileLocation)
   .option("checkpointLocation", s"${fileLocation}_chpk")
   .start()

However, when I run this Spark job, nothing is written to HDFS. Can anyone tell me how to make this work? Thanks.

Best,
Eric

Re: Write Streaming data using Datasource Writer is not working

Posted by nishith agarwal <n3...@gmail.com>.
I looked at the DataStreamWriter in Spark
(https://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/sql/streaming/DataStreamWriter.html),
and its implementation seems to be different from DataSource. I haven't looked
into which other classes need to be extended to support the Hudi format type for
the DataStreamWriter (as we have done for DataSource).

Does the DataSource writer work for you?
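
One way to exercise the DataSource writer from a streaming job is Spark's foreachBatch, which hands each micro-batch to ordinary batch-write code. This is a workaround sketch, not something proposed in this thread. It assumes Spark 2.4 or later; `HudiForeachBatch`, `start`, `basePath`, and `hudiOptions` are hypothetical names, and `hudiOptions` is assumed to carry the Hudi settings (table name, record key, and so on) that the dataset needs.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}
import org.apache.spark.sql.streaming.{StreamingQuery, Trigger}

object HudiForeachBatch {
  // Hypothetical helper: routes each micro-batch of `data` through the
  // batch DataSource writer instead of a streaming sink.
  def start(data: DataFrame, basePath: String,
            hudiOptions: Map[String, String]): StreamingQuery = {
    // Declared as a Scala function value so the call below resolves to the
    // Scala overload of foreachBatch rather than the Java one.
    val writeBatch: (DataFrame, Long) => Unit = (batch, _) =>
      batch.write
        .format("org.apache.hudi")
        .options(hudiOptions)
        .mode(SaveMode.Append)
        .save(basePath)

    data.writeStream
      .trigger(Trigger.ProcessingTime("300 seconds"))
      .option("checkpointLocation", s"${basePath}_chpk")
      .foreachBatch(writeBatch)
      .start()
  }
}
```

The caller would normally block on the returned query, for example with `HudiForeachBatch.start(data, fileLocation, opts).awaitTermination()`, so that the driver does not exit before the first trigger fires.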

Thanks,
Nishith

On Mon, Oct 28, 2019 at 9:34 PM Qian Wang <qw...@gmail.com> wrote:

> Hi Nishith,
>
> Thanks for the reply.
>
> I did use the Datasource Writer to write, rather than the
> DataStreamWriter. I think the Datasource Writer can also write
> streaming data, correct?
>
> Best,
> Qian

Re: Write Streaming data using Datasource Writer is not working

Posted by Qian Wang <qw...@gmail.com>.
Hi Nishith,

Thanks for the reply.

I did use the Datasource Writer to write, rather than the DataStreamWriter. I think the Datasource Writer can also write streaming data, correct?

Best,
Qian
On Oct 28, 2019, 9:31 PM -0700, nishith agarwal <n3...@gmail.com>, wrote:
> Qian,
>
> It seems like you are using the Spark DataStreamWriter
> (https://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/sql/streaming/DataStreamWriter.html)
> and not the Spark DataSource writer. To use the Spark DataSource, look at the example
> here: https://hudi.apache.org/writing_data.html#datasource-writer.
>
> DataStreamWriter is a different set of APIs which, IIUC, doesn't work
> interchangeably with DataSource.
>
> Thanks,
> Nishith

Re: Write Streaming data using Datasource Writer is not working

Posted by nishith agarwal <n3...@gmail.com>.
Qian,

It seems like you are using the Spark DataStreamWriter
(https://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/sql/streaming/DataStreamWriter.html)
and not the Spark DataSource writer. To use the Spark DataSource, look at the example
here: https://hudi.apache.org/writing_data.html#datasource-writer.

DataStreamWriter is a different set of APIs which, IIUC, doesn't work
interchangeably with DataSource.
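
For comparison with the streaming snippet, a batch write through the DataSource path might look roughly like the sketch below. The option keys are standard Hudi write options. `HudiBatchWrite`, `hudiOptions`, `writeBatch`, and the column names in the usage note are hypothetical, and the record key, partition path, and precombine fields are assumed to be columns that exist in the DataFrame being written.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

object HudiBatchWrite {
  // Builds the Hudi write options. The field arguments are placeholders
  // for columns that must be present in the DataFrame.
  def hudiOptions(table: String, recordKey: String,
                  partitionPath: String, precombine: String): Map[String, String] =
    Map(
      "hoodie.table.name" -> table,
      "hoodie.datasource.write.recordkey.field" -> recordKey,
      "hoodie.datasource.write.partitionpath.field" -> partitionPath,
      "hoodie.datasource.write.precombine.field" -> precombine
    )

  // Batch write: uses df.write (DataFrameWriter), not df.writeStream
  // (DataStreamWriter).
  def writeBatch(df: DataFrame, basePath: String,
                 options: Map[String, String]): Unit =
    df.write
      .format("org.apache.hudi")
      .options(options)
      .mode(SaveMode.Append)
      .save(basePath)
}
```

A call could then read `HudiBatchWrite.writeBatch(df, fileLocation, HudiBatchWrite.hudiOptions("hudi_ro_table", "id", "dt", "ts"))`, where `id`, `dt`, and `ts` stand in for whatever key, partition, and ordering columns the data actually has.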

Thanks,
Nishith

On Mon, Oct 28, 2019 at 3:24 PM Qian Wang <qw...@gmail.com> wrote:

> Hi All,
>
> I tried to use the Datasource Writer to read streaming data from a Kafka topic
> and write it to a Hudi dataset on HDFS. I used the following code:
>
> val output = data
>    .writeStream
>    .trigger(Trigger.ProcessingTime("300 seconds"))
>    .format("org.apache.hudi")
>    .option("hoodie.table.name", "hudi_ro_table")
>    .outputMode("append")
>    .option("path", fileLocation)
>    .option("checkpointLocation", s"${fileLocation}_chpk")
>    .start()
>
> However, when I run this Spark job, nothing is written to HDFS. Can
> anyone tell me how to make this work? Thanks.
>
> Best,
> Eric
>