Posted to dev@oozie.apache.org by "Mate Juhasz (JIRA)" <ji...@apache.org> on 2018/08/06 09:43:00 UTC
[jira] [Comment Edited] (OOZIE-3304) Parsing sharelib timestamps is not threadsafe
[ https://issues.apache.org/jira/browse/OOZIE-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569949#comment-16569949 ]
Mate Juhasz edited comment on OOZIE-3304 at 8/6/18 9:42 AM:
------------------------------------------------------------
I did some testing and I suspect that not killing the threads caused the timeout, so I uploaded a new patch addressing the issue and used ExecutorService to manage the running threads instead.
The NUMBER_OF_FILESTATUSES value has been reduced to 100,000; it's still enough to trigger the NumberFormatException, but significantly faster than with a million :)
Can we kick off a new build to check it? Thanks!
cc [~zsombor], [~dionusos], [~gezapeti], [~andras.piros]
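For readers following along, here is a minimal sketch of the ExecutorService approach described above. It is not the code from the attached patch: the class name ConcurrentParseHarness, the method countFailures, and the 10-second wait are all hypothetical. The idea is only that the pool owns the worker threads and shuts them down in a finally block, so nothing is left running after the test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical harness; names and sizes are illustrative, not taken from the patch.
class ConcurrentParseHarness {

    /** Submits the task 'count' times to 'threads' workers and returns how many runs threw. */
    static int countFailures(Callable<?> task, int threads, int count) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                futures.add(pool.submit(task));
            }
            int failures = 0;
            for (Future<?> future : futures) {
                try {
                    future.get();
                } catch (ExecutionException e) {
                    // The task threw, e.g. a NumberFormatException from a shared formatter.
                    failures++;
                }
            }
            return failures;
        } finally {
            // Stop the workers even if the loop above failed, so no thread outlives the test.
            pool.shutdownNow();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
    }
}
```

A call such as countFailures(() -> shared.parse("20180712043119"), 8, 100_000), where shared is a single SimpleDateFormat used by all workers, may report a non-zero count, but the race is timing dependent and a zero does not prove safety. A shared formatter can also return a wrong date without throwing, which this count would not catch.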
was (Author: matijhs):
I did some testing and I suspect that not killing the threads caused the timeout, so I uploaded a new patch addressing the issue and used ExecutorService to manage the running threads instead.
Can we kick off a new build to check it? Thanks!
cc [~zsombor], [~dionusos], [~gezapeti], [~andras.piros]
> Parsing sharelib timestamps is not threadsafe
> ---------------------------------------------
>
> Key: OOZIE-3304
> URL: https://issues.apache.org/jira/browse/OOZIE-3304
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: 5.0.0b1, 4.3.1
> Reporter: Denes Bodo
> Assignee: Denes Bodo
> Priority: Critical
> Labels: usability
> Attachments: OOZIE-3304-001.patch, OOZIE-3304-002.patch, OOZIE-3304-003.patch
>
>
> In rare cases the following Exception can be read in log files when an action fails:
> {code:java}
> org.apache.oozie.action.ActionExecutorException: NumberFormatException: multiple points
> at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:446)
> at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1271)
> at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1472)
> at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:234)
> at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:65)
> at org.apache.oozie.command.XCommand.call(XCommand.java:287)
> at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
> at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NumberFormatException: multiple points
> at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1890)
> at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
> at java.lang.Double.parseDouble(Double.java:538)
> at java.text.DigitList.getDouble(DigitList.java:169)
> at java.text.DecimalFormat.parse(DecimalFormat.java:2056)
> at java.text.SimpleDateFormat.subParse(SimpleDateFormat.java:2160)
> at java.text.SimpleDateFormat.parse(SimpleDateFormat.java:1514)
> at java.text.DateFormat.parse(DateFormat.java:364)
> at org.apache.oozie.service.ShareLibService.getLatestLibPath(ShareLibService.java:727)
> at org.apache.oozie.service.ShareLibService.getShareLibRootPath(ShareLibService.java:595)
> at org.apache.oozie.action.hadoop.JavaActionExecutor.getSharelibRoot(JavaActionExecutor.java:1297)
> at org.apache.oozie.action.hadoop.JavaActionExecutor.initShareLibExcluder(JavaActionExecutor.java:858)
> at org.apache.oozie.action.hadoop.JavaActionExecutor.setLibFilesArchives(JavaActionExecutor.java:866)
> at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1156)
> ... 11 more
> {code}
> or
> {code:java}
> 2018-07-12 04:48:52,649 WARN ForkedActionStartXCommand:523 - SERVER[ctr-e138-1518143905142-410551-01-000003.hwx.site] USER[user] GROUP[-] TOKEN[] APP[demo-wf] JOB[0000023-180712043119670-oozie-oozi-W] ACTION[0000023-180712043119670-oozie-oozi-W@streaming-node] Error starting action [streaming-node]. ErrorType [ERROR], ErrorCode [NumberFormatException], Message [NumberFormatException: For input string: ""]
> org.apache.oozie.action.ActionExecutorException: NumberFormatException: For input string: ""
> at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:446)
> at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1271)
> at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1472)
> at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:234)
> at org.apache.oozie.command.wf.ForkedActionStartXCommand.execute(ForkedActionStartXCommand.java:41)
> at org.apache.oozie.command.wf.ForkedActionStartXCommand.execute(ForkedActionStartXCommand.java:30)
> at org.apache.oozie.command.XCommand.call(XCommand.java:287)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NumberFormatException: For input string: ""
> at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Long.parseLong(Long.java:601)
> at java.lang.Long.parseLong(Long.java:631)
> at java.text.DigitList.getLong(DigitList.java:195)
> at java.text.DecimalFormat.parse(DecimalFormat.java:2051)
> at java.text.SimpleDateFormat.subParse(SimpleDateFormat.java:2160)
> at java.text.SimpleDateFormat.parse(SimpleDateFormat.java:1514)
> at java.text.DateFormat.parse(DateFormat.java:364)
> at org.apache.oozie.service.ShareLibService.getLatestLibPath(ShareLibService.java:727)
> at org.apache.oozie.service.ShareLibService.getShareLibRootPath(ShareLibService.java:595)
> at org.apache.oozie.action.hadoop.JavaActionExecutor.getSharelibRoot(JavaActionExecutor.java:1297)
> at org.apache.oozie.action.hadoop.JavaActionExecutor.initShareLibExcluder(JavaActionExecutor.java:858)
> at org.apache.oozie.action.hadoop.JavaActionExecutor.setLibFilesArchives(JavaActionExecutor.java:866)
> at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1156)
> ... 10 more
> {code}
> A little research turned up [https://docs.oracle.com/javase/7/docs/api/java/text/DateFormat.html], whose Synchronization section says that date formats are not synchronized and recommends creating separate format instances for each thread. We should therefore not share a DateFormat instance across threads when parsing timestamps in the ShareLibService class.
> I'll try to reproduce the issue in a unit test, for example with thousands of sharelibs.
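One way to follow the Javadoc recommendation is sketched below. This is not the actual ShareLibService code or the attached patch. The class name SharelibTimestampParser is hypothetical, and the pattern yyyyMMddHHmmss is an assumption about the timestamp format. The sketch gives each thread its own SimpleDateFormat through a ThreadLocal, so no formatter instance is ever shared.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper, not the actual ShareLibService implementation.
class SharelibTimestampParser {

    // Assumed timestamp pattern; the pattern Oozie really uses may differ.
    static final String PATTERN = "yyyyMMddHHmmss";

    // Each thread lazily creates and then reuses its own SimpleDateFormat.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat(PATTERN));

    /** Parses a timestamp with the calling thread's private formatter. */
    static Date parse(String timestamp) throws ParseException {
        return FORMAT.get().parse(timestamp);
    }
}
```

Creating a new SimpleDateFormat on every call, or switching to the immutable java.time.format.DateTimeFormatter on Java 8 and later, would avoid the shared state just as well. The ThreadLocal variant simply avoids re-creating the formatter for every parsed directory name.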
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)