Posted to user@tika.apache.org by Jim Idle <ji...@proofpoint.com> on 2018/03/01 05:31:52 UTC

Malware RTF is not detected as RTF

I can open a ticket for this but wanted to just run it by you first.

As explained here: http://www.decalage.info/rtf_tricks (no need to read it if you don’t care 😉)

Malicious RTF files take advantage of the fact that Microsoft do not follow their own RTF spec. Specifically, Word et al. only look for the opening sequence:

{\rt

Though the spec says it should be:

{\rtf1

where 1 is the version number.

Tika fails to identify a malware file starting with:

{\rtf1{\pict\jpegblip\picw24\pich24\bin49922

as an RTF file; it reports it as application/octet-stream.

Could the Tika detector be modified to just look for {\rt, as the Office tools do?

Cheers,

Jim

RE: Malware RTF is not detected as RTF

Posted by Jim Idle <ji...@proofpoint.com>.
Will do. Of course, the implementation is up to you guys; do whatever you think is most sensible without breaking anything else.

The current detector just looks for {\rtf.

If it just made the f optional, or did not look for it at all, I am pretty certain it would break nothing, but I would be happy with an artificial MIME type too.
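A minimal sketch of the two checks being discussed, in Java since Tika is a Java library. The class and method names are hypothetical, and the code only compares byte prefixes at offset 0; it is not Tika's actual detector code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Sketch only (hypothetical names): compares a strict "{\rtf" prefix check
 * with a lenient "{\rt" check that leaves out the "f". Not Tika's real code.
 */
public class RtfPrefixCheck {
    // Prefix the current magic is said to match.
    static final byte[] STRICT = "{\\rtf".getBytes(StandardCharsets.US_ASCII);
    // Lenient prefix with the "f" (and version number) dropped.
    static final byte[] LENIENT = "{\\rt".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if data begins with the given prefix bytes at offset 0. */
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOfRange(data, 0, prefix.length), prefix);
    }

    static boolean strictMatch(byte[] data) {
        return startsWith(data, STRICT);
    }

    static boolean lenientMatch(byte[] data) {
        return startsWith(data, LENIENT);
    }

    public static void main(String[] args) {
        byte[] normal = "{\\rtf1\\ansi Hello}".getBytes(StandardCharsets.US_ASCII);
        byte[] truncated = "{\\rt\\ansi Hello}".getBytes(StandardCharsets.US_ASCII);
        System.out.println(strictMatch(normal) + " " + lenientMatch(normal));       // true true
        System.out.println(strictMatch(truncated) + " " + lenientMatch(truncated)); // false true
    }
}
```

The lenient check accepts every file the strict one does, plus headers such as `{\rt\ansi...` that the spec does not allow.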

I have worked around it for now, so I can wait for the next release cycle.

I will add an RTF that does not contain malware, for sure. In fact, all you need to do is use vi to delete the f1 part of the magic in any normal RTF, and you have your test. I will attach it though 😊
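The vi trick amounts to a one-line string edit. A hypothetical helper that does the same thing programmatically, assuming the input is a normal RTF document whose text starts with `{\rtf1`:

```java
/**
 * Sketch of the test-file trick: delete the "f1" from a "{\rtf1" header,
 * leaving "{\rt". The class and method names are illustrative, not part of Tika.
 */
public class BrokenRtfFixture {
    static final String SPEC_HEADER = "{\\rtf1";
    static final String TRUNCATED_HEADER = "{\\rt";

    /** Returns the document with its leading "{\rtf1" replaced by "{\rt". */
    static String stripVersionFromHeader(String rtf) {
        if (!rtf.startsWith(SPEC_HEADER)) {
            throw new IllegalArgumentException("input does not start with {\\rtf1");
        }
        // Keep everything after the six-character "{\rtf1" prefix.
        return TRUNCATED_HEADER + rtf.substring(SPEC_HEADER.length());
    }

    public static void main(String[] args) {
        String normal = "{\\rtf1\\ansi\\deff0 {\\fonttbl {\\f0 Times;}} Hello}";
        // Prints: {\rt\ansi\deff0 {\fonttbl {\f0 Times;}} Hello}
        System.out.println(stripVersionFromHeader(normal));
    }
}
```

Writing the result to disk (for example with `java.nio.file.Files.writeString`) gives a small, non-malicious fixture to attach to the ticket.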

Cheers,

Jim 

> -----Original Message-----
> From: Nick Burch [mailto:apache@gagravarr.org]
> Sent: Thursday, March 1, 2018 21:14
> To: user@tika.apache.org
> Subject: Re: Malware RTF is not detected as RTF

Re: Malware RTF is not detected as RTF

Posted by Nick Burch <ap...@gagravarr.org>.
On Thu, 1 Mar 2018, Jim Idle wrote:
> Malicious RTF files take advantage of the fact that Microsoft do not 
> follow their own RTF spec. Specifically, Word et al. only look for the 
> opening sequence:
>
> {\rt
>
> Though the spec says it should be:
>
> {\rtf1

I don't think that Tika can assume that all RTF users are as broken as 
Word is!

I'd be tempted to define a new mimetype of application/x-broken-rtf or 
similar, and give that a lower-priority magic for {\rt, with a suitable 
comment/explanation. That way, we won't tell people something is an RTF 
when it isn't, but we can help them spot these problematic files.
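A rough sketch of how the lower-priority idea plays out, assuming magic rules are tried in descending priority and the first match wins. Because every `{\rtf` header also starts with `{\rt`, both rules match a proper RTF file, and the priority ordering is what keeps it classified as application/rtf. The type name application/x-broken-rtf is the suggestion above, not an existing Tika type, and the priority numbers are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch of priority-ordered magic matching; not Tika's real detector. */
public class BrokenRtfDetectorSketch {
    /** One magic rule: a prefix at offset 0, a MIME type, and a priority. */
    static final class MagicRule {
        final String mimeType;
        final int priority;
        final String prefix;

        MagicRule(String mimeType, int priority, String prefix) {
            this.mimeType = mimeType;
            this.priority = priority;
            this.prefix = prefix;
        }
    }

    static final List<MagicRule> RULES = new ArrayList<>();

    static {
        RULES.add(new MagicRule("application/rtf", 50, "{\\rtf"));
        // Hypothetical lower-priority rule for headers Word tolerates.
        RULES.add(new MagicRule("application/x-broken-rtf", 20, "{\\rt"));
        // Try the highest-priority rule first.
        RULES.sort(Comparator.comparingInt((MagicRule r) -> r.priority).reversed());
    }

    static String detect(byte[] data) {
        // ISO-8859-1 maps each byte to one char, so prefixes compare byte-for-byte.
        String head = new String(data, 0, Math.min(data.length, 16), StandardCharsets.ISO_8859_1);
        for (MagicRule rule : RULES) {
            if (head.startsWith(rule.prefix)) {
                return rule.mimeType;
            }
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        String[] samples = {"{\\rtf1\\ansi Hello}", "{\\rt\\ansi Hello}", "PK\u0003\u0004"};
        for (String s : samples) {
            // Prints application/rtf, application/x-broken-rtf, application/octet-stream
            System.out.println(detect(s.getBytes(StandardCharsets.ISO_8859_1)));
        }
    }
}
```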

If you could create a small, broken but non-malicious RTF file, then raise 
an enhancement JIRA and attach it, that'd be great!

Nick

RE: Malware RTF is not detected as RTF

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Yes. Please do open a ticket, and yes, I need to read everything from decalage… he does some amazing work. 😊

I trust you wouldn’t, but please don’t post an actual malware file for us to use in our unit tests. 😉

From: Jim Idle [mailto:jidle@proofpoint.com]
Sent: Thursday, March 1, 2018 12:32 AM
To: user@tika.apache.org
Subject: Malware RTF is not detected as RTF
