You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tomcat.apache.org by Christopher Schultz <ch...@christopherschultz.net> on 2021/11/19 18:04:44 UTC

Checkstyle and CVE-2021-42574

All,

I've been (briefly) looking into using CheckStyle to try to detect use 
of Unicode directional code points in source code to avoid things like this:

https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-42574

I don't believe the existing Unicode tool (specifically 
AvoidEscapedUnicodeCharacters) can check for this sort of thing. It 
seems more geared toward code /style/ than anything else, such as 
banning certain byte sequences in files.

But it does look like the Regexp* tool(s) may be able to do it.

WDYT?

   <!-- Look for Unicode directionality overrides -->
   <module name="RegexpSingleline">
     <property name="format" value="[&#x2066;-&#x2069;&#x202a;-&#x202e;]" />
   </module>

I have to fine an example of a file with this type of malicious content 
to see if this rule will catch it. In general, are there any objections 
to adding this to the checkstyle configuration?

-chris

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Checkstyle and CVE-2021-42574

Posted by Christopher Schultz <ch...@christopherschultz.net>.
Mark,

On 11/20/21 04:39, Mark Thomas wrote:
> On 19/11/2021 22:58, Christopher Schultz wrote:
>> Mark,
>>
>> On 11/19/21 13:46, Mark Thomas wrote:
>>> On 19/11/2021 18:12, Christopher Schultz wrote:
>>>> All,
>>>>
>>>> On 11/19/21 13:04, Christopher Schultz wrote:
>>>>> All,
>>>>>
>>>>> I've been (briefly) looking into using CheckStyle to try to detect 
>>>>> use of Unicode directional code points in source code to avoid 
>>>>> things like this:
>>>>>
>>>>> https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-42574
>>>>>
>>>>> I don't believe the existing Unicode tool (specifically 
>>>>> AvoidEscapedUnicodeCharacters) can check for this sort of thing. It 
>>>>> seems more geared toward code /style/ than anything else, such as 
>>>>> banning certain byte sequences in files.
>>>>>
>>>>> But it does look like the Regexp* tool(s) may be able to do it.
>>>>>
>>>>> WDYT?
>>>>>
>>>>>    <!-- Look for Unicode directionality overrides -->
>>>>>    <module name="RegexpSingleline">
>>>>>      <property name="format" 
>>>>> value="[&#x2066;-&#x2069;&#x202a;-&#x202e;]" />
>>>>>    </module>
>>>>>
>>>>> I have to fine an example of a file with this type of malicious 
>>>>> content to see if this rule will catch it. In general, are there 
>>>>> any objections to adding this to the checkstyle configuration?
>>>>
>>>> I have a sample of this, now, and I have a grep command which 
>>>> identifies the file properly, but my config in checkstyle above 
>>>> results in the following error during "checkstyle":
>>>>
>>>> BUILD FAILED
>>>> .../build.xml:824: Unable to process files: [
>>>> -- a list of what looks like every file in the whole project --
>>>> ]
>>>>
>>>> There is no indication of what the problem might be. :/
>>>
>>> Do we need this?
>>>
>>> Is the compiler setting that all source code is ISO-8859-1 not 
>>> sufficient to protect us from this?
>>
>> Good question. It depends upon what encoding your code-editor expects. 
>> If the source code is "in ISO-8859-1" but the code-editor uses UTF-8 
>> for whatever reason (including the fact that these types of characters 
>> might be lurking in there, triggering the use of UTF-8), the 
>> programmer can be tricked.
> 
> That at least limits the attack to a committer's IDE which I think makes 
> it much harder. I hope a committer would spot the suspicious yet 
> commented out code before it got to the point where it could execute.

Take a look at some of the practical examples of this type of attack. 
You can be looking at an innocent line of code but the compiler sees 
something very different.

>> The same can be true of pull-requests, though GitHub has recently put 
>> in notifications for files which contain these types of characters. I 
>> don't know how it works with pull requests -- like whether or not the 
>> diffs will be flagged, or if the emails generated to tell you about 
>> the diffs have similar protections in them.
> 
> Lovely. Another way to attack CI systems. Given the majority of the risk 
> here is to GitHub and other CI providers, I'd expect them to put in 
> appropriate mitigations - and it looks like that is what they are doing.

I'm mostly worried that a mundane patch could be made to some Tomcat 
source -- even something that looks like "no functional change; just 
added a bunch of // comments to source lines to clarify behavior" -- 
that gets easily accepted but it actually hides a change in behavior, 
most likely malicious.

>>> Or are you proposing this as a defence in depth option?
>>
>> More so defense-in-depth. With checkstyle (or a similar tool) looking 
>> for this foolishness, we can guarantee that we don't miss something 
>> important if it happens to make it into a source file.
> 
> I think that makes sense - assuming that a) we can get CheckStyle to 
> report the problematic file and b) the added overhead isn't excessive.

Yeah, that's what I'm struggling with, now: Checkstyle seems to 
fall-over and I have no idea why.

-chris

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Checkstyle and CVE-2021-42574

Posted by Mark Thomas <ma...@apache.org>.
On 19/11/2021 22:58, Christopher Schultz wrote:
> Mark,
> 
> On 11/19/21 13:46, Mark Thomas wrote:
>> On 19/11/2021 18:12, Christopher Schultz wrote:
>>> All,
>>>
>>> On 11/19/21 13:04, Christopher Schultz wrote:
>>>> All,
>>>>
>>>> I've been (briefly) looking into using CheckStyle to try to detect 
>>>> use of Unicode directional code points in source code to avoid 
>>>> things like this:
>>>>
>>>> https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-42574
>>>>
>>>> I don't believe the existing Unicode tool (specifically 
>>>> AvoidEscapedUnicodeCharacters) can check for this sort of thing. It 
>>>> seems more geared toward code /style/ than anything else, such as 
>>>> banning certain byte sequences in files.
>>>>
>>>> But it does look like the Regexp* tool(s) may be able to do it.
>>>>
>>>> WDYT?
>>>>
>>>>    <!-- Look for Unicode directionality overrides -->
>>>>    <module name="RegexpSingleline">
>>>>      <property name="format" 
>>>> value="[&#x2066;-&#x2069;&#x202a;-&#x202e;]" />
>>>>    </module>
>>>>
>>>> I have to fine an example of a file with this type of malicious 
>>>> content to see if this rule will catch it. In general, are there any 
>>>> objections to adding this to the checkstyle configuration?
>>>
>>> I have a sample of this, now, and I have a grep command which 
>>> identifies the file properly, but my config in checkstyle above 
>>> results in the following error during "checkstyle":
>>>
>>> BUILD FAILED
>>> .../build.xml:824: Unable to process files: [
>>> -- a list of what looks like every file in the whole project --
>>> ]
>>>
>>> There is no indication of what the problem might be. :/
>>
>> Do we need this?
>>
>> Is the compiler setting that all source code is ISO-8859-1 not 
>> sufficient to protect us from this?
> 
> Good question. It depends upon what encoding your code-editor expects. 
> If the source code is "in ISO-8859-1" but the code-editor uses UTF-8 for 
> whatever reason (including the fact that these types of characters might 
> be lurking in there, triggering the use of UTF-8), the programmer can be 
> tricked.

That at least limits the attack to a committer's IDE which I think makes 
it much harder. I hope a committer would spot the suspicious yet 
commented out code before it got to the point where it could execute.

> The same can be true of pull-requests, though GitHub has recently put in 
> notifications for files which contain these types of characters. I don't 
> know how it works with pull requests -- like whether or not the diffs 
> will be flagged, or if the emails generated to tell you about the diffs 
> have similar protections in them.

Lovely. Another way to attack CI systems. Given the majority of the risk 
here is to GitHub and other CI providers, I'd expect them to put in 
appropriate mitigations - and it looks like that is what they are doing.

>> Or are you proposing this as a defence in depth option?
> 
> More so defense-in-depth. With checkstyle (or a similar tool) looking 
> for this foolishness, we can guarantee that we don't miss something 
> important if it happens to make it into a source file.

I think that makes sense - assuming that a) we can get CheckStyle to 
report the problematic file and b) the added overhead isn't excessive.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Checkstyle and CVE-2021-42574

Posted by Christopher Schultz <ch...@christopherschultz.net>.
Mark,

On 11/19/21 13:46, Mark Thomas wrote:
> On 19/11/2021 18:12, Christopher Schultz wrote:
>> All,
>>
>> On 11/19/21 13:04, Christopher Schultz wrote:
>>> All,
>>>
>>> I've been (briefly) looking into using CheckStyle to try to detect 
>>> use of Unicode directional code points in source code to avoid things 
>>> like this:
>>>
>>> https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-42574
>>>
>>> I don't believe the existing Unicode tool (specifically 
>>> AvoidEscapedUnicodeCharacters) can check for this sort of thing. It 
>>> seems more geared toward code /style/ than anything else, such as 
>>> banning certain byte sequences in files.
>>>
>>> But it does look like the Regexp* tool(s) may be able to do it.
>>>
>>> WDYT?
>>>
>>>    <!-- Look for Unicode directionality overrides -->
>>>    <module name="RegexpSingleline">
>>>      <property name="format" 
>>> value="[&#x2066;-&#x2069;&#x202a;-&#x202e;]" />
>>>    </module>
>>>
>>> I have to fine an example of a file with this type of malicious 
>>> content to see if this rule will catch it. In general, are there any 
>>> objections to adding this to the checkstyle configuration?
>>
>> I have a sample of this, now, and I have a grep command which 
>> identifies the file properly, but my config in checkstyle above 
>> results in the following error during "checkstyle":
>>
>> BUILD FAILED
>> .../build.xml:824: Unable to process files: [
>> -- a list of what looks like every file in the whole project --
>> ]
>>
>> There is no indication of what the problem might be. :/
> 
> Do we need this?
> 
> Is the compiler setting that all source code is ISO-8859-1 not 
> sufficient to protect us from this?

Good question. It depends upon what encoding your code-editor expects. 
If the source code is "in ISO-8859-1" but the code-editor uses UTF-8 for 
whatever reason (including the fact that these types of characters might 
be lurking in there, triggering the use of UTF-8), the programmer can be 
tricked.

The same can be true of pull-requests, though GitHub has recently put in 
notifications for files which contain these types of characters. I don't 
know how it works with pull requests -- like whether or not the diffs 
will be flagged, or if the emails generated to tell you about the diffs 
have similar protections in them.

> Or are you proposing this as a defence in depth option?

More so defense-in-depth. With checkstyle (or a similar tool) looking 
for this foolishness, we can guarantee that we don't miss something 
important if it happens to make it into a source file.

-chris

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Checkstyle and CVE-2021-42574

Posted by Mark Thomas <ma...@apache.org>.
On 19/11/2021 18:12, Christopher Schultz wrote:
> All,
> 
> On 11/19/21 13:04, Christopher Schultz wrote:
>> All,
>>
>> I've been (briefly) looking into using CheckStyle to try to detect use 
>> of Unicode directional code points in source code to avoid things like 
>> this:
>>
>> https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-42574
>>
>> I don't believe the existing Unicode tool (specifically 
>> AvoidEscapedUnicodeCharacters) can check for this sort of thing. It 
>> seems more geared toward code /style/ than anything else, such as 
>> banning certain byte sequences in files.
>>
>> But it does look like the Regexp* tool(s) may be able to do it.
>>
>> WDYT?
>>
>>    <!-- Look for Unicode directionality overrides -->
>>    <module name="RegexpSingleline">
>>      <property name="format" 
>> value="[&#x2066;-&#x2069;&#x202a;-&#x202e;]" />
>>    </module>
>>
>> I have to fine an example of a file with this type of malicious 
>> content to see if this rule will catch it. In general, are there any 
>> objections to adding this to the checkstyle configuration?
> 
> I have a sample of this, now, and I have a grep command which identifies 
> the file properly, but my config in checkstyle above results in the 
> following error during "checkstyle":
> 
> BUILD FAILED
> .../build.xml:824: Unable to process files: [
> -- a list of what looks like every file in the whole project --
> ]
> 
> There is no indication of what the problem might be. :/

Do we need this?

Is the compiler setting that all source code is ISO-8859-1 not 
sufficient to protect us from this?

Or are you proposing this as a defence in depth option?

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Checkstyle and CVE-2021-42574

Posted by Christopher Schultz <ch...@christopherschultz.net>.
All,

On 11/19/21 13:04, Christopher Schultz wrote:
> All,
> 
> I've been (briefly) looking into using CheckStyle to try to detect use 
> of Unicode directional code points in source code to avoid things like 
> this:
> 
> https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-42574
> 
> I don't believe the existing Unicode tool (specifically 
> AvoidEscapedUnicodeCharacters) can check for this sort of thing. It 
> seems more geared toward code /style/ than anything else, such as 
> banning certain byte sequences in files.
> 
> But it does look like the Regexp* tool(s) may be able to do it.
> 
> WDYT?
> 
>    <!-- Look for Unicode directionality overrides -->
>    <module name="RegexpSingleline">
>      <property name="format" 
> value="[&#x2066;-&#x2069;&#x202a;-&#x202e;]" />
>    </module>
> 
> I have to fine an example of a file with this type of malicious content 
> to see if this rule will catch it. In general, are there any objections 
> to adding this to the checkstyle configuration?

I have a sample of this, now, and I have a grep command which identifies 
the file properly, but my config in checkstyle above results in the 
following error during "checkstyle":

BUILD FAILED
.../build.xml:824: Unable to process files: [
-- a list of what looks like every file in the whole project --
]

There is no indication of what the problem might be. :/

-chris

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org