Posted to issues@lucene.apache.org by "Uwe Schindler (Jira)" <ji...@apache.org> on 2021/11/03 11:22:00 UTC

[jira] [Commented] (LUCENE-10218) Extend validateSourcePatterns task to scan for LTR/RTL unicode to catch "Trojan Source" (see paper)

    [ https://issues.apache.org/jira/browse/LUCENE-10218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437948#comment-17437948 ] 

Uwe Schindler commented on LUCENE-10218:
----------------------------------------

Pull request: https://github.com/apache/lucene/pull/425

> Extend validateSourcePatterns task to scan for LTR/RTL unicode to catch "Trojan Source" (see paper)
> ---------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-10218
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10218
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: general/build
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> A recently published paper describes how a malicious code contributor can supply a patch that compiles successfully, but to code that differs from what the reviewing committer believes it to be. This works because UIs such as GitHub or your IDE honor Unicode left-to-right/right-to-left switching sequences when rendering, which hides code from the reviewer.
> See paper: https://trojansource.codes/trojan-source.pdf
> For source code it makes no sense to have LTR/RTL characters. Compilers like GCC will get updates soon, but I am not sure about Java.
> So I suggest adding the pattern of code points to the validateSourcePatterns task.
> For now I would only add the code points described in the paper, but [~rmuir] made the suggestion to exclude a large range. What the regex does not match is the other malicious pattern: using visually similar characters to add hidden duplicate methods. The risk there is lower in my mind, unless somebody hides the "bad" method using the above LTR/RTL tricks.
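
To illustrate the kind of check proposed in the description, here is a minimal Java sketch. It is not the actual validateSourcePatterns implementation and not the code from the pull request. The class name BidiControlScanner and the method findViolations are hypothetical, and the character class assumes the bidirectional control code points discussed in the paper are U+202A through U+202E and U+2066 through U+2069.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class BidiControlScanner {

    // Assumption: the bidi control code points to reject are
    // U+202A..U+202E (LRE, RLE, PDF, LRO, RLO) and
    // U+2066..U+2069 (LRI, RLI, FSI, PDI).
    // The escapes are interpreted by the regex engine, so this source
    // file itself contains none of the offending characters.
    private static final Pattern BIDI_CONTROLS =
        Pattern.compile("[\\u202A-\\u202E\\u2066-\\u2069]");

    // Returns one message per offending character, in the form
    // "file:line:column: U+XXXX" (line and column are 1-based).
    static List<String> findViolations(String fileName, String source) {
        List<String> violations = new ArrayList<>();
        String[] lines = source.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            Matcher m = BIDI_CONTROLS.matcher(lines[i]);
            while (m.find()) {
                int codePoint = lines[i].charAt(m.start());
                violations.add(String.format("%s:%d:%d: U+%04X",
                    fileName, i + 1, m.start() + 1, codePoint));
            }
        }
        return violations;
    }
}
```

A build task could call findViolations for each source file and fail if the returned list is non-empty. As noted in the description, a check like this does not catch visually similar (homoglyph) characters.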



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
