Posted to issues@lucene.apache.org by "Robert Muir (Jira)" <ji...@apache.org> on 2021/11/03 11:23:00 UTC

[jira] [Comment Edited] (LUCENE-10218) Extend validateSourcePatterns task to scan for LTR/RTL unicode to catch "Trojan Source" (see paper)

    [ https://issues.apache.org/jira/browse/LUCENE-10218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437949#comment-17437949 ] 

Robert Muir edited comment on LUCENE-10218 at 11/3/21, 11:22 AM:
-----------------------------------------------------------------

Yeah, I haven't thought about it too much, but the idea is to take one of these directional overrides and "expand" out to a larger category that would cover similar problems.

My first stab is https://s.apache.org/1e0ei

(You can see the generated unicode range there, which could be used initially as a regex)


was (Author: rcmuir):
Yeah, I haven't thought about it too much, but the idea is to take one of these directional overrides and "expand" out to a larger category that would cover similar problems.

My first stab is https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=[:Identifier_Type%CE%B2=Default_Ignorable:]

(You can see the generated unicode range there, which could be used initially as a regex)

> Extend validateSourcePatterns task to scan for LTR/RTL unicode to catch "Trojan Source" (see paper)
> ---------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-10218
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10218
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: general/build
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> A paper was published that describes how a malicious contributor can supply a patch that compiles successfully, but not to the code the reviewing committer thinks it does. This works because UIs like GitHub or your IDE apply left-to-right/right-to-left switching Unicode sequences and so hide code from the reviewer.
> See paper: https://trojansource.codes/trojan-source.pdf
> For source code it makes no sense to have LTR/RTL characters. Compilers like GCC will get updates soon, but I am not sure about Java.
> So I suggest adding the pattern of code points to the validateSourcePatterns task.
> For now I would only add the code points described in the paper, but [~rmuir] suggested excluding a larger range. What the regex does not match is the other malicious pattern, such as using visually similar characters to add hidden duplicate methods. The risk there is lower in my mind, unless somebody hides the "bad" method using the above LTR/RTL tricks.
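
As a rough illustration of the narrower check described above, here is a minimal Java sketch that flags the bidirectional control characters named in the Trojan Source paper (U+202A to U+202E and U+2066 to U+2069). The class name BidiControlScanner and the method findViolations are hypothetical and are not part of the validateSourcePatterns task; the actual task may be implemented quite differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not Lucene code: scans text for bidi control characters.
public class BidiControlScanner {
    // Bidi embedding/override/isolate controls from the Trojan Source paper:
    // U+202A..U+202E (LRE, RLE, PDF, LRO, RLO) and U+2066..U+2069 (LRI, RLI, FSI, PDI).
    private static final Pattern BIDI_CONTROLS =
        Pattern.compile("[\\u202A-\\u202E\\u2066-\\u2069]");

    /** Returns one message per offending character, with 1-based line and column. */
    public static List<String> findViolations(String source) {
        List<String> violations = new ArrayList<>();
        // \R matches any line break; limit -1 keeps trailing empty lines.
        String[] lines = source.split("\\R", -1);
        for (int i = 0; i < lines.length; i++) {
            Matcher m = BIDI_CONTROLS.matcher(lines[i]);
            while (m.find()) {
                violations.add(String.format("line %d, column %d: U+%04X",
                    i + 1, m.start() + 1, (int) lines[i].charAt(m.start())));
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        // The second line hides a right-to-left override inside a comment.
        String suspicious = "int a = 1;\nif (admin) { /* \u202E } */\n";
        findViolations(suspicious).forEach(System.out::println);
    }
}
```

A broader character class, such as the generated range mentioned in the comment, could presumably be swapped into the same pattern. As the description notes, a check like this does not catch visually similar (homoglyph) characters.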



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
