Posted to users@apex.apache.org by Vivek Bhide <vi...@target.com> on 2018/03/22 20:14:00 UTC

TimeBasedDedupOperator failing with NullPointer

Hi All,

While using the TimeBasedDedupOperator for deduping, I see that the operator
keeps failing with the NullPointerException below. I don't know if it has
anything to do with configuration.

I also see that the operator's CPU usage is always high, almost reaching 100%.
I tried increasing vcores and memory for the operator, but it made no difference.

2018-03-22 15:10:10,037 INFO  stram.FSRecoveryHandler (FSRecoveryHandler.java:rotateLog(103)) - Creating hdfs://littleredns/user/SVDATHDP/datatorrent/apps/application_1519410901484_187748/recovery/log
2018-03-22 15:10:10,056 INFO  stram.StreamingContainerParent (StreamingContainerParent.java:log(170)) - child msg: Stopped running due to an exception. java.lang.NullPointerException
	at org.apache.hadoop.io.file.tfile.TFile$Writer.append(TFile.java:387)
	at com.datatorrent.lib.fileaccess.TFileWriter.append(TFileWriter.java:66)
	at org.apache.apex.malhar.lib.state.managed.BucketsFileSystem.writeBucketData(BucketsFileSystem.java:179)
	at org.apache.apex.malhar.lib.state.managed.IncrementalCheckpointManager.transferWindowFiles(IncrementalCheckpointManager.java:139)
	at org.apache.apex.malhar.lib.state.managed.IncrementalCheckpointManager$1.run(IncrementalCheckpointManager.java:110)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
 context: PTContainer[id=3(container_e3125_1519410901484_187748_01_000012),state=ACTIVE,operators=[PTOperator[id=3,name=dedupeOperator,state=ACTIVE]]]
2018-03-22 15:10:10,915 WARN  stram.StreamingContainerManager (StreamingContainerManager.java:processOperatorFailure(1439)) - Operator failure: PTOperator[id=3,name=dedupeOperator,state=INACTIVE] count: 1

Regards
Vivek





--
Sent from: http://apache-apex-users-list.78494.x6.nabble.com/

Re: TimeBasedDedupOperator failing with NullPointer

Posted by Thomas Weise <th...@apache.org>.
Hi Vivek,

Did you end up fixing the issue? If so, would you like to contribute the
fix back to Apex?

Thanks,
Thomas


On Fri, Mar 23, 2018 at 11:05 AM, Vivek Bhide <vi...@target.com>
wrote:

> Hi Vlad,
>
> I have filed a JIRA: https://issues.apache.org/jira/browse/APEXMALHAR-2557
>
> However, I am not sure about attributes such as component, epic, or label.
> Please update the ticket with the correct values, or let me know what they
> should be and I will add them.
>
> Regards
> Vivek

Re: TimeBasedDedupOperator failing with NullPointer

Posted by Vivek Bhide <vi...@target.com>.
Hi Vlad,

I have filed a JIRA: https://issues.apache.org/jira/browse/APEXMALHAR-2557

However, I am not sure about attributes such as component, epic, or label.
Please update the ticket with the correct values, or let me know what they
should be and I will add them.

Regards
Vivek




Re: TimeBasedDedupOperator failing with NullPointer

Posted by Vlad Rozov <vr...@apache.org>.
IMO, it is a regression introduced in 3.8.0, not a
configuration/infrastructure issue or a null-key issue. Please file a JIRA.

Thank you,

Vlad

On 3/23/18 10:07, Vivek Bhide wrote:
> Hi Vlad,
>
> Yes, it's 3.8.0
>
> Regards
> Vivek


Re: TimeBasedDedupOperator failing with NullPointer

Posted by Vivek Bhide <vi...@target.com>.
Hi Vlad,

Yes, it's 3.8.0

Regards
Vivek




Re: TimeBasedDedupOperator failing with NullPointer

Posted by Vlad Rozov <v....@gmail.com>.
Most likely the NPE is caused by a bug (a regression introduced by
APEXMALHAR-2492, "Correct usage of empty Slice in Malhar Library"). Which
version of the operator library are you using (3.8.0)?

Thank you,

Vlad

On 3/23/18 05:48, Bhupesh Chawda wrote:
> Hi Vivek,
>
> Can you send a few more details, such as the configuration of the
> operator and some sample data?
> Also, please confirm whether the key could have been null.
>
> ~ Bhupesh


Re: TimeBasedDedupOperator failing with NullPointer

Posted by Vivek Bhide <vi...@target.com>.
Hi Bhupesh,

Please find the details below

Dedup operator configuration:
<property>
	<name>dt.application.KafkaDedupOutputOperatorTest.operator.dedupeOperator.prop.keyExpression</name>
	<value>{$.id} + {$.name}</value>
</property>
<property>
	<name>dt.application.KafkaDedupOutputOperatorTest.operator.dedupeOperator.prop.timeExpression</name>
	<value>eventTime.getTime()</value>
</property>
<property>
	<name>dt.application.KafkaDedupOutputOperatorTest.operator.dedupeOperator.prop.bucketSpan</name>
	<value>87</value>
</property>
<property>
	<name>dt.application.KafkaDedupOutputOperatorTest.operator.dedupeOperator.prop.expireBefore</name>
	<value>7500</value>
</property>
<property>
	<name>dt.application.KafkaDedupOutputOperatorTest.operator.dedupeOperator.port.input.attr.TUPLE_CLASS</name>
	<value>com.tgt.outputdeduptest.kafkaoutputdedup.object.InputPojo</value>
</property>
<property>
	<name>dt.application.KafkaDedupOutputOperatorTest.operator.dedupeOperator.attr.VCORES</name>
	<value>4</value>
</property>

Sample input:

{"id":0,"name":"BU7UK7QXP1ZCEJYNNO5D","eventTime":"Mar 22, 2018 1:18:07 PM"}
{"id":1,"name":"YMPT9HZTLW11G2DUEZ0H","eventTime":"Mar 22, 2018 1:18:07 PM"}
{"id":2,"name":"ERYVJYK92JPEM9BOPJXN","eventTime":"Mar 22, 2018 1:18:07 PM"}
{"id":3,"name":"IBXHQXWH32J1IRVPGYKD","eventTime":"Mar 22, 2018 1:18:07 PM"}
{"id":4,"name":"JJZMUAF15TPSFCDP4LLS","eventTime":"Mar 22, 2018 1:18:07 PM"}
{"id":5,"name":"O24TIRQDJNHATCB4F3M9","eventTime":"Mar 22, 2018 1:18:07 PM"}
{"id":6,"name":"UVBBY64LJJRFDWK0B8TB","eventTime":"Mar 22, 2018 1:18:07 PM"}
{"id":7,"name":"09FSAVBJB9EX0PNBUPPG","eventTime":"Mar 22, 2018 1:18:07 PM"}
{"id":8,"name":"TLLCY18P0HVJZUYVPP0E","eventTime":"Mar 22, 2018 1:18:07 PM"}
{"id":9,"name":"RF3WNOHE48JQRZBCZNWL","eventTime":"Mar 22, 2018 1:18:07 PM"}
{"id":10,"name":"KLBCMSC783SXC2MYFYF5","eventTime":"Mar 22, 2018 1:18:07 PM"}

I am sure that the input tuple is never null. I have also attached the
InputPojo Java file for reference.
InputPojo.java
<http://apache-apex-users-list.78494.x6.nabble.com/file/t127/InputPojo.java>
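
For anyone who wants to double-check the same thing on their own data, here is a minimal sketch of a per-field null check. It is only an illustration: the InputPojo below is a hypothetical stand-in whose fields are guessed from the sample JSON (it is not the attached class), the date pattern is assumed from the sample timestamps, and the check does not reproduce how the operator evaluates keyExpression or timeExpression.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class DedupKeyCheck {

    // Hypothetical stand-in for the attached InputPojo; the field names and
    // types are assumptions based on the sample JSON, not the real class.
    public static class InputPojo {
        public final Integer id;
        public final String name;
        public final Date eventTime;

        public InputPojo(Integer id, String name, Date eventTime) {
            this.id = id;
            this.name = name;
            this.eventTime = eventTime;
        }
    }

    // Parses timestamps written like the sample data, e.g. "Mar 22, 2018 1:18:07 PM".
    // The pattern is an assumption derived from that sample.
    public static Date parseEventTime(String text) throws ParseException {
        return new SimpleDateFormat("MMM d, yyyy h:mm:ss a", Locale.US).parse(text);
    }

    // Returns the name of the first missing field ("tuple" if the tuple itself
    // is null), or null when every field used for the key and time is present.
    public static String findMissingField(InputPojo tuple) {
        if (tuple == null) {
            return "tuple";
        }
        if (tuple.id == null) {
            return "id";
        }
        if (tuple.name == null) {
            return "name";
        }
        if (tuple.eventTime == null) {
            return "eventTime";
        }
        return null;
    }

    public static void main(String[] args) throws ParseException {
        Date time = parseEventTime("Mar 22, 2018 1:18:07 PM");
        InputPojo complete = new InputPojo(0, "BU7UK7QXP1ZCEJYNNO5D", time);
        InputPojo withoutName = new InputPojo(1, null, time);

        for (InputPojo tuple : new InputPojo[] {complete, withoutName}) {
            String missing = findMissingField(tuple);
            System.out.println(missing == null ? "complete" : "missing " + missing);
        }
    }
}
```

Running a check like this over the records before they reach the operator would show whether any field feeding the key or the time expression is ever absent, which is a narrower question than whether the tuple itself is null.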

Regards
Vivek




Re: TimeBasedDedupOperator failing with NullPointer

Posted by Bhupesh Chawda <bh...@datatorrent.com>.
Hi Vivek,

Can you send a few more details, such as the configuration of the
operator and some sample data?
Also, please confirm whether the key could have been null.

~ Bhupesh


_______________________________________________________

Bhupesh Chawda

E: bhupesh@datatorrent.com | Twitter: @bhupeshsc

www.datatorrent.com  |  apex.apache.org



On Fri, Mar 23, 2018 at 1:44 AM, Vivek Bhide <vi...@target.com> wrote:

> Hi All,
>
> While using the TimeBasedDedupOperator for deduping, I see that the operator
> keeps failing with the NullPointerException below. I don't know if it has
> anything to do with configuration.
>
> I also see that the operator's CPU usage is always high, almost reaching
> 100%.
> I tried increasing vcores and memory for the operator, but it made no
> difference.