Posted to dev@hudi.apache.org by Pratyaksh Sharma <pr...@gmail.com> on 2019/09/13 09:52:36 UTC

[BUG] Null Pointer Exception in SourceFormatAdapter

Hi,

I am trying to build a CDC pipeline using Hudi, working from tag hoodie-0.4.7.
Here is the command I used for running DeltaStreamer -

spark-submit --files jaas.conf \
  --conf 'spark.driver.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf' \
  --conf 'spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf' \
  --master yarn --deploy-mode cluster --num-executors 2 \
  --class com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer \
  /path/to/hoodie-utilities-0.4.7.jar \
  --storage-type COPY_ON_WRITE \
  --source-class com.uber.hoodie.utilities.sources.AvroKafkaSource \
  --source-ordering-field xxxx \
  --target-base-path hdfs://path/to/cow_table \
  --target-table cow_table \
  --props hdfs://path/to/fg-kafka-source.properties \
  --transformer-class com.uber.hoodie.utilities.transform.DebeziumTransformer \
  --spark-master yarn-cluster --source-limit 5000

Basically I have not passed any SchemaProvider class in the command. When I
run the above command, I get the below exception in SourceFormatAdapter and
the job gets killed -

java.lang.NullPointerException
    at com.uber.hoodie.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInRowFormat(SourceFormatAdapter.java:94)
    at com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:224)
    at com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:504)

In the HoodieDeltaStreamer class, we try to instantiate RowBasedSchemaProvider
before registering Avro schemas if the schemaProvider variable is null.
Hence I am trying to understand whether the above exception is expected
behaviour.

Please help.

Re: [BUG] Null Pointer Exception in SourceFormatAdapter

Posted by Pratyaksh Sharma <pr...@gmail.com>.
Hi Vinoth,

I would like to take it up. Will be sending the PR soon. :)

On Mon, Sep 16, 2019 at 9:27 PM Vinoth Chandar <vi...@apache.org> wrote:

> Actually went ahead and created
> https://issues.apache.org/jira/browse/HUDI-253 .
> Question is just about the PR for this now ? :)

Re: [BUG] Null Pointer Exception in SourceFormatAdapter

Posted by Vinoth Chandar <vi...@apache.org>.
I actually went ahead and created
https://issues.apache.org/jira/browse/HUDI-253 .
The only question now is about the PR for this. :)

On Mon, Sep 16, 2019 at 8:54 AM Vinoth Chandar <vi...@apache.org> wrote:

> +1 DeltaStreamer can be much nicer in such cases. Any interest in opening
> a JIRA/PR for this?

Re: [BUG] Null Pointer Exception in SourceFormatAdapter

Posted by Vinoth Chandar <vi...@apache.org>.
+1 DeltaStreamer can be much nicer in such cases. Any interest in opening
a JIRA/PR for this?

On Mon, Sep 16, 2019 at 2:02 AM vbalaji@apache.org <vb...@apache.org>
wrote:

> Yes, it makes sense to add validations with descriptive messages. Please
> open a ticket and send a PR for this.
>
> Thanks,
> Balaji.V
>
> On Monday, September 16, 2019, 01:11:12 AM PDT, Pratyaksh Sharma
> <pr...@gmail.com> wrote:
>
>  Hi Balaji,
>
> I get your point. However I feel in such cases, instead of throwing a Null
> Pointer, we should handle the case gracefully. The exception should be
> thrown with proper user-facing message. Please let me know your thoughts
> on this.

Re: [BUG] Null Pointer Exception in SourceFormatAdapter

Posted by "vbalaji@apache.org" <vb...@apache.org>.
Yes, it makes sense to add validations with descriptive messages. Please open a ticket and send a PR for this.

Thanks,
Balaji.V

On Monday, September 16, 2019, 01:11:12 AM PDT, Pratyaksh Sharma <pr...@gmail.com> wrote:
 
 Hi Balaji,

I get your point. However I feel in such cases, instead of throwing a Null
Pointer, we should handle the case gracefully. The exception should be
thrown with proper user-facing message. Please let me know your thoughts
on this.


Re: [BUG] Null Pointer Exception in SourceFormatAdapter

Posted by Pratyaksh Sharma <pr...@gmail.com>.
Hi Balaji,

I get your point. However, I feel that in such cases, instead of throwing a
NullPointerException, we should handle the case gracefully. The exception
should be thrown with a proper user-facing message. Please let me know your
thoughts on this.
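
A minimal sketch of the kind of up-front validation being proposed, not the
actual DeltaStreamer code. The SchemaProvider interface below is a stand-in
and requireSchemaProvider is a hypothetical helper; only the idea (reject a
missing schema provider early, with a descriptive message rather than a bare
NullPointerException) comes from this thread.

```java
// Stand-in for the schema provider abstraction; hypothetical, for illustration only.
interface SchemaProvider {
    String sourceSchema();
}

final class SchemaProviderCheck {

    // Hypothetical helper: fail fast with a user-facing message when no
    // schema provider was configured, instead of hitting an NPE later.
    static SchemaProvider requireSchemaProvider(SchemaProvider provider, String sourceClass) {
        if (provider == null) {
            throw new IllegalArgumentException(
                "No schema provider configured for source " + sourceClass
                + ". Avro sources need an explicit schema provider; only row-based"
                + " sources can derive the schema from the Spark Dataset.");
        }
        return provider;
    }

    public static void main(String[] args) {
        try {
            requireSchemaProvider(null, "com.uber.hoodie.utilities.sources.AvroKafkaSource");
        } catch (IllegalArgumentException e) {
            // The message names the source class and says what to configure.
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Running the check before the sync starts would surface the configuration
problem at startup rather than inside the format adapter.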

On Fri, Sep 13, 2019 at 7:26 PM Balaji Varadarajan
<v....@ymail.com.invalid> wrote:

> Hi Pratyaksh,
> This is expected. You need to pass a schema-provider since you are using
> Avro Sources. For RowBased sources, DeltaStreamer can deduce schema from Row
> type information available from Spark Dataset.
> Balaji.V

Re: [BUG] Null Pointer Exception in SourceFormatAdapter

Posted by Balaji Varadarajan <v....@ymail.com.INVALID>.
Hi Pratyaksh,
This is expected. You need to pass a schema-provider since you are using Avro sources. For row-based sources, DeltaStreamer can deduce the schema from the Row type information available from the Spark Dataset.
Balaji.V
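
The rule described above can be sketched roughly as follows. This is an
illustration under assumptions, not Hudi's implementation: SourceKind,
resolveSchema and the string-typed schemas are all hypothetical names chosen
for the example.

```java
import java.util.Optional;

final class SchemaResolutionSketch {

    // Hypothetical source kinds, used only for this illustration.
    enum SourceKind { AVRO, ROW }

    // Picks the schema for a sync: an explicitly provided schema wins;
    // otherwise only ROW sources can fall back to the schema carried by
    // the Dataset. Anything else is a configuration error.
    static String resolveSchema(SourceKind kind,
                                Optional<String> providerSchema,
                                Optional<String> datasetSchema) {
        if (providerSchema.isPresent()) {
            return providerSchema.get();
        }
        if (kind == SourceKind.ROW && datasetSchema.isPresent()) {
            return datasetSchema.get();
        }
        throw new IllegalStateException(
            "Cannot determine schema for " + kind + " source: pass a schema provider");
    }

    public static void main(String[] args) {
        // A row-based source with no provider falls back to the Dataset schema.
        System.out.println(resolveSchema(SourceKind.ROW, Optional.empty(),
            Optional.of("struct<id:int>")));
    }
}
```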