Posted to oak-issues@jackrabbit.apache.org by "Stefan Egli (JIRA)" <ji...@apache.org> on 2014/03/25 14:08:14 UTC

[jira] [Updated] (OAK-1605) Running into endless loop due to tika 1.4

     [ https://issues.apache.org/jira/browse/OAK-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Egli updated OAK-1605:
-----------------------------

    Attachment: OAK1605mp3Lookalike.bin

Further narrowed the problem down to the following:
 * When the Lucene index comes across a binary property (jcr:content/jcr:data) that looks like the attached file (e.g. 4 bytes: FF'FF'C3'A9), it interprets the content as audio/mpeg.
 * When it parses the binary property with the corresponding parser, Mp3Parser, it ends up using MpegStream, calls skipFrame, and in there runs into the endless loop already reported in TIKA-991.
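
The failure mode in the second point can be illustrated without Tika. The following is a minimal sketch, not Tika's actual code: the class SkipLoopSketch and the methods naiveSkip and safeSkip are hypothetical names, and the maxIterations cap exists only so the demo terminates. It assumes a skip loop that stops only once enough bytes were skipped or skip() returns -1, and uses a plain ByteArrayInputStream, whose skip() returns 0 at the end of the data, as a stand-in for a stream that never reports -1.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SkipLoopSketch {

    /**
     * Naive loop: exits only when enough bytes were skipped or skip()
     * returns a negative value. maxIterations is a safety cap for this
     * demo only; without it the loop would spin forever at end of stream.
     * Returns the number of iterations performed.
     */
    public static int naiveSkip(InputStream in, long count, int maxIterations)
            throws IOException {
        long remaining = count;
        long skipped = 0;
        int iterations = 0;
        while (remaining > 0 && skipped >= 0 && iterations < maxIterations) {
            skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            }
            iterations++;
        }
        return iterations;
    }

    /**
     * Defensive variant: when skip() makes no progress, probe with read()
     * to find out whether the end of the stream was reached.
     * Returns the number of bytes actually consumed.
     */
    public static long safeSkip(InputStream in, long count) throws IOException {
        long remaining = count;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() == -1) {
                break; // end of stream, stop instead of looping
            } else {
                remaining--; // read() consumed one byte
            }
        }
        return count - remaining;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xFF, (byte) 0xFF, (byte) 0xC3, (byte) 0xA9};
        int iterations = naiveSkip(new ByteArrayInputStream(data), 100, 1000);
        System.out.println("naive loop iterations (capped): " + iterations);
        long consumed = safeSkip(new ByteArrayInputStream(data), 100);
        System.out.println("safe skip consumed bytes: " + consumed);
    }
}
```

With only four bytes available and 100 requested, the naive loop keeps calling skip() until the demo cap is hit, while the defensive variant stops after the four bytes are consumed.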

In short: it looks like certain mp3-like files can cause Tika to loop endlessly.

A fix for this is to switch to Tika 1.5.

To reproduce: upload the attached OAK1605mp3Lookalike.bin into the repository and watch the CPU stay at 100% or more indefinitely.
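
If the attachment is not at hand, a similar file can probably be generated locally. The sketch below is an assumption-laden illustration: it assumes the four bytes quoted above are enough to trigger the behaviour (the full contents of the attachment are not shown here), and the class name Mp3LookalikeSketch, the file name mp3-lookalike.bin and the helper hasMpegFrameSync are made up for this example. The sync check reflects the general MPEG audio frame layout (a header starts with 11 set bits); whether Tika's detection relies on exactly this test is not confirmed here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class Mp3LookalikeSketch {

    // The four bytes quoted in the comment above: FF FF C3 A9.
    static final byte[] LOOKALIKE = {(byte) 0xFF, (byte) 0xFF, (byte) 0xC3, (byte) 0xA9};

    /** True if the data starts with the 11 set bits of an MPEG audio frame sync. */
    public static boolean hasMpegFrameSync(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xE0) == 0xE0;
    }

    /** Writes the lookalike bytes to a file in the given directory (hypothetical file name). */
    public static Path writeLookalike(Path dir) throws IOException {
        Path file = dir.resolve("mp3-lookalike.bin");
        return Files.write(file, LOOKALIKE);
    }

    public static void main(String[] args) throws IOException {
        Path file = writeLookalike(Files.createTempDirectory("oak1605"));
        byte[] bytes = Files.readAllBytes(file);
        System.out.println(file + " starts with MPEG frame sync: " + hasMpegFrameSync(bytes));
    }
}
```

The generated file can then be uploaded in place of the attachment to check whether indexing hangs on the Tika version in use.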

> Running into endless loop due to tika 1.4
> -----------------------------------------
>
>                 Key: OAK-1605
>                 URL: https://issues.apache.org/jira/browse/OAK-1605
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: oak-lucene
>    Affects Versions: 0.19
>            Reporter: Stefan Egli
>            Priority: Critical
>         Attachments: OAK1605mp3Lookalike.bin
>
>
> Narrowed down an endless loop [1] which happened in Oak 0.19 to be related to TIKA-991:
>  * Tika's mp3.MpegStream.skipStream calls InputStream.skip() until it has skipped far enough or that method returns -1.
>  * If that InputStream is a TailStream, a bug in Tika 1.4 means TailStream.skip(long) does not return -1 even though the end of the stream was reached.
> Switching to Tika 1.5 should solve the issue, as TIKA-991 in [0] mentions the exact same endless loop and the tika-991_3.patch fixed the -1 problem.
> I'll check if I can create a test to reproduce with reasonable effort.
> --
> [0] https://issues.apache.org/jira/browse/TIKA-991?focusedCommentId=13579487&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13579487
> [1] {code}"pool-8-thread-5" prio=5 tid=7f80a34ea800 nid=0x119cb8000 runnable [119cb6000]
>    java.lang.Thread.State: RUNNABLE
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> - locked <7768956a0> (a java.io.BufferedInputStream)
> at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
> at java.io.FilterInputStream.read(FilterInputStream.java:116)
> at org.apache.tika.io.TailStream.read(TailStream.java:117)
> at org.apache.tika.io.TailStream.skip(TailStream.java:140)
> at org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283) <- endless loop in here
> at org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160)
> at org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193)
> at org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditor.parseStringValue(LuceneIndexEditor.java:254)
> at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditor.addBinaryValue(LuceneIndexEditor.java:245)
> at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditor.makeDocument(LuceneIndexEditor.java:200)
> at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditor.addOrUpdate(LuceneIndexEditor.java:178)
> at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditor.leave(LuceneIndexEditor.java:108)
> at org.apache.jackrabbit.oak.spi.commit.VisibleEditor.leave(VisibleEditor.java:64)
> at org.apache.jackrabbit.oak.spi.commit.CompositeEditor.leave(CompositeEditor.java:74)
> at org.apache.jackrabbit.oak.spi.commit.EditorDiff.childNodeAdded(EditorDiff.java:130)
> at org.apache.jackrabbit.oak.plugins.memory.EmptyNodeState.compareAgainstEmptyState(EmptyNodeState.java:160)
> at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.compareAgainstBaseState(SegmentNodeState.java:385)
> at org.apache.jackrabbit.oak.spi.commit.EditorDiff.childNodeAdded(EditorDiff.java:125)
> at org.apache.jackrabbit.oak.plugins.segment.MapRecord.compare(MapRecord.java:440)
> at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.compareAgainstBaseState(SegmentNodeState.java:530)
> at org.apache.jackrabbit.oak.spi.commit.EditorDiff.childNodeChanged(EditorDiff.java:148)
> at org.apache.jackrabbit.oak.plugins.segment.MapRecord.compare(MapRecord.java:430)
> at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.compareAgainstBaseState(SegmentNodeState.java:530)
> at org.apache.jackrabbit.oak.spi.commit.EditorDiff.childNodeChanged(EditorDiff.java:148)
> at org.apache.jackrabbit.oak.plugins.segment.MapRecord.compare(MapRecord.java:430)
> at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.compareAgainstBaseState(SegmentNodeState.java:530)
> at org.apache.jackrabbit.oak.spi.commit.EditorDiff.childNodeChanged(EditorDiff.java:148)
> at org.apache.jackrabbit.oak.plugins.segment.MapRecord.compare(MapRecord.java:430)
> at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.compareAgainstBaseState(SegmentNodeState.java:530)
> at org.apache.jackrabbit.oak.spi.commit.EditorDiff.process(EditorDiff.java:52)
> at org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.run(AsyncIndexUpdate.java:143)
> - locked <76c63aae0> (a org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate)
> at org.apache.sling.commons.scheduler.impl.QuartzJobExecutor.execute(QuartzJobExecutor.java:105)
> at org.quartz.core.JobRunShell.run(JobRunShell.java:207)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> at java.lang.Thread.run(Thread.java:695){code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)