Posted to issues@lucene.apache.org by "henryrneh (via GitHub)" <gi...@apache.org> on 2023/02/21 11:13:16 UTC

[GitHub] [lucene] henryrneh opened a new issue, #12165: Integrating Apache Lucene into OSS-Fuzz

henryrneh opened a new issue, #12165:
URL: https://github.com/apache/lucene/issues/12165

### Description

Hi Apache Lucene developers,

We have prepared the [initial integration](https://github.com/google/oss-fuzz/pull/9772) of Apache Lucene into [Google OSS-Fuzz](https://github.com/google/oss-fuzz) which will provide more security for your project.

**Why do you need Fuzzing?**
The Code Intelligence JVM fuzzer [Jazzer](https://github.com/CodeIntelligenceTesting/jazzer) has already found [hundreds of bugs](https://github.com/CodeIntelligenceTesting/jazzer#findings) in open source projects, including [OpenJDK](https://nvd.nist.gov/vuln/detail/CVE-2022-21360), [Protobuf](https://nvd.nist.gov/vuln/detail/CVE-2021-22569), and [jsoup](https://github.com/jhy/jsoup/security/advisories/GHSA-m72m-mhq2-9p6c). Fuzzing has proved very effective and produces no false positives: each finding comes with a crashing input that lets you reproduce and debug it easily. Integrating your project into the OSS-Fuzz platform will enable continuous fuzzing of your project by [Jazzer](https://github.com/CodeIntelligenceTesting/jazzer).

**What do you need to do?**
The integration requires the maintainer or one established project committer to deal with the bug reports.

You need to create or provide an email address that is associated with a Google account, as described [here](https://google.github.io/oss-fuzz/getting-started/accepting-new-projects/). When a bug is found, you will receive an email that gives you access to ClusterFuzz, crash reports, code coverage reports, and fuzzer statistics. More than one person can be included.

**How can Code Intelligence support you?**
We will continue to add fuzz targets to improve code coverage over time. Furthermore, we are continuously enhancing our fuzzing technology by developing new fuzzers and bug detectors.

Please let me know if you have any questions regarding fuzzing or the OSS-Fuzz integration.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

[GitHub] [lucene] uschindler commented on issue #12165: Integrating Apache Lucene into OSS-Fuzz

Posted by "uschindler (via GitHub)" <gi...@apache.org>.

uschindler commented on issue #12165:
URL: https://github.com/apache/lucene/issues/12165#issuecomment-1674532419

   Just open public issues. 
   
   Actually not all of those errors would be fixed, because Apache Lucene does not always do all possible checks, as performance is more important than an OOM (caused by "wrong usage").



[GitHub] [lucene] 0roman commented on issue #12165: Integrating Apache Lucene into OSS-Fuzz

Posted by "0roman (via GitHub)" <gi...@apache.org>.

0roman commented on issue #12165:
URL: https://github.com/apache/lucene/issues/12165#issuecomment-1439586559

> So I disagree with adding another fuzzing engine into Lucene. We have a library called "randomized-testing" which provides everything needed. Almost every test in Lucene has fuzzing included, the example above is just a very special one with a very wide range of components tested. Background: Lucene is using randomized testing since around 2012. Here is a talk from 2014 by @dweiss about it: https://2019.berlinbuzzwords.de/14/session/randomize-your-tests-and-it-will-blow-your-socks.html

The talk was interesting, and Randomized Testing is a great contribution to explore complex boundary conditions and to find unexpected edge cases. However, the approach is quite different, which is fine because it fits your use case.

Modern fuzzing is coverage-guided, which means the tested code is instrumented to give the fuzzer feedback about code coverage and further insights when executing a test case. The fuzzer then optimizes its mutations to generate inputs that maximize code coverage and pass checks in the code, such as string comparisons. Furthermore, [Jazzer](https://github.com/CodeIntelligenceTesting/jazzer) has specialized bug detectors that detect various classes of vulnerabilities like command injections, insecure deserialization, or attacker-controlled class loading.
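
As a rough illustration of the feedback loop described above, here is a toy sketch in plain Java. It is not how Jazzer is implemented: the target method, the branch IDs, and all class and method names are made up for this example, and a real fuzzer obtains coverage from bytecode instrumentation rather than from a hand-written set.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Toy coverage-guided fuzzing loop. All names here are hypothetical.
class CoverageGuidedSketch {

  // Hypothetical target: reports which branch IDs an input reaches.
  static Set<Integer> run(byte[] in) {
    Set<Integer> hit = new HashSet<>();
    hit.add(0);
    if (in.length > 0 && in[0] == 'F') {
      hit.add(1);
      if (in.length > 1 && in[1] == 'U') {
        hit.add(2);
        if (in.length > 2 && in[2] == 'Z') {
          hit.add(3); // nested branch that blind random input rarely reaches
        }
      }
    }
    return hit;
  }

  // Overwrite one random byte in a copy of the parent input.
  static byte[] mutate(byte[] parent, Random rnd) {
    byte[] child = parent.clone();
    child[rnd.nextInt(child.length)] = (byte) rnd.nextInt(256);
    return child;
  }

  // Returns the branch IDs covered after at most maxIterations mutations.
  static Set<Integer> fuzz(long seed, int maxIterations) {
    Random rnd = new Random(seed);
    List<byte[]> corpus = new ArrayList<>();
    corpus.add(new byte[3]); // initial input: three zero bytes
    Set<Integer> covered = new HashSet<>(run(corpus.get(0)));
    for (int i = 0; i < maxIterations && covered.size() < 4; i++) {
      byte[] child = mutate(corpus.get(rnd.nextInt(corpus.size())), rnd);
      // Feedback step: keep the input only if it reached a new branch.
      if (covered.addAll(run(child))) {
        corpus.add(child);
      }
    }
    return covered;
  }

  public static void main(String[] args) {
    System.out.println("covered branches: " + fuzz(42L, 200_000));
  }
}
```

The point of the sketch is the `corpus.add(child)` step: inputs that unlock a new branch become parents for later mutations, so the nested comparisons are solved one byte at a time instead of all at once.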

Based on that, I suggest we continue the OSS-Fuzz onboarding without a maintainer. This way, you won't get any finding notifications from OSS-Fuzz. We will receive the finding reports from OSS-Fuzz instead, and we will make sure to communicate interesting findings to you. You can still be added as maintainers later if you see added value in the findings. Thanks, all, for the contribution.


[GitHub] [lucene] rmuir commented on issue #12165: Integrating Apache Lucene into OSS-Fuzz

Posted by "rmuir (via GitHub)" <gi...@apache.org>.

rmuir commented on issue #12165:
URL: https://github.com/apache/lucene/issues/12165#issuecomment-1438859463

The analyzers example given there is a good one for seeing the differences.

Both approaches (OSS-Fuzz and the existing TestRandomChains) test "random analysis chains", but the current TestRandomChains also tests all possible ctors of these analysis components (not just the default constructor) and injects random stuff into them. The default constructor is usually tested anyway in the component's own unit tests with fuzzed data (see the testRandomData() methods everywhere). Fuzzing the ctors in this way finds problems such as a component missing a sanity or range check on an integer parameter. This might even be more productive overall than actually feeding "fuzzed data".

The current TestRandomChains also doesn't require us to "register" any new components; it just discovers all Tokenizers, CharFilters, TokenFilters, etc. that are available. This ensures we catch problems in new analyzers that get added.
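
To make the constructor-fuzzing idea concrete, here is a heavily simplified sketch in plain Java. It is not the code in TestRandomChains: `ToyFilter` is a made-up component with a deliberately missing range check, the sketch only handles `int` parameters, and the rule "IllegalArgumentException is an acceptable rejection, anything else is a finding" is an assumption chosen for the example.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical component, not a Lucene class.
class ToyFilter {
  final int[] buffer;

  public ToyFilter() {
    this(16);
  }

  public ToyFilter(int size) {
    if (size < 0) {
      throw new IllegalArgumentException("size must be >= 0: " + size);
    }
    buffer = new int[size];
  }

  // Deliberate bug: no check that max >= min.
  public ToyFilter(int min, int max) {
    buffer = new int[max - min];
  }
}

class CtorFuzzSketch {

  // Calls every public constructor with random small ints and collects
  // every failure that is not a clean IllegalArgumentException.
  static List<String> fuzzConstructors(Class<?> clazz, long seed, int rounds) {
    Random rnd = new Random(seed);
    List<String> findings = new ArrayList<>();
    for (int round = 0; round < rounds; round++) {
      for (Constructor<?> ctor : clazz.getConstructors()) {
        Object[] args = new Object[ctor.getParameterCount()];
        for (int i = 0; i < args.length; i++) {
          args[i] = rnd.nextInt(201) - 100; // values in [-100, 100]
        }
        try {
          ctor.newInstance(args);
        } catch (InvocationTargetException e) {
          Throwable cause = e.getCause();
          if (!(cause instanceof IllegalArgumentException)) {
            findings.add(ctor + " with " + Arrays.toString(args) + " -> " + cause);
          }
        } catch (ReflectiveOperationException e) {
          throw new AssertionError(e);
        }
      }
    }
    return findings;
  }

  public static void main(String[] args) {
    fuzzConstructors(ToyFilter.class, 1L, 20).forEach(System.out::println);
  }
}
```

In this toy setup the two-argument constructor surfaces as a finding (a NegativeArraySizeException instead of a proper argument check), while the one-argument constructor's explicit rejection of negative sizes is treated as correct behavior.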

As far as actual data fuzzing goes, it is more than just randomized data. Have a look at our base analyzers test class: it is a "torture chamber" for analyzers and will find things such as thread-safety/race issues as well: https://github.com/apache/lucene/blob/main/lucene/test-framework/src/java/org/apache/lucene/tests/analysis/BaseTokenStreamTestCase.java
All analyzers use this class for "fuzzing" in their own unit tests; TestRandomChains is just an "integration test" that then combines them.

It is also important to think about how much time it takes to debug a failure. The current setup across both unit and integration tests makes it pretty easy to spot when the problem is a specific analyzer component, vs. some crazy "interaction" between more than one of them. Nobody wants to debug an integration test if they can debug a unit test.

We did a lot of work with BaseTokenStreamTestCase/TestRandomChains, such as adding special logging of the analysis chain, adding "ValidatingTokenFilter" at every step, etc. It still sucks to debug this stuff when it fails; we have a lot of analyzers :)


[GitHub] [lucene] henryrneh commented on issue #12165: Integrating Apache Lucene into OSS-Fuzz

Posted by "henryrneh (via GitHub)" <gi...@apache.org>.

henryrneh commented on issue #12165:
URL: https://github.com/apache/lucene/issues/12165#issuecomment-1440742856

   > Sure, never enough bug reports. I keep wondering what does it report as a legitimate error/ problem if the patch catches a lot of what can be thrown from the inside, for example: https://github.com/google/oss-fuzz/pull/9772/files#diff-f3b3b0a611aa50e69c4823a63b6e760d6c607a635e584a14f836146bc49de48eR195-R196
   
   Every exception caught in the fuzz target (e.g. IllegalStateException) was already triggered by the fuzzer very quickly during local testing. We just catch them to let the fuzzer continue and find more severe bugs, such as out-of-memory issues or bugs detected by the [Jazzer](https://github.com/CodeIntelligenceTesting/jazzer) bug detectors.
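
A minimal sketch of that pattern, for illustration only: this is not the fuzz target from the PR. `parseLength` is a made-up method under test, and the choice of which exceptions count as "expected" is an assumption of the example. The entry point uses the `fuzzerTestOneInput(byte[])` form that Jazzer accepts, so the sketch compiles without any Jazzer dependency.

```java
import java.nio.charset.StandardCharsets;

class FuzzTargetSketch {

  // Hypothetical code under test: rejects bad input with ordinary exceptions.
  static int parseLength(String s) {
    if (s.isEmpty()) {
      throw new IllegalStateException("empty input");
    }
    return Integer.parseInt(s.trim()); // may throw NumberFormatException
  }

  // Fuzz entry point: the fuzzer calls this repeatedly with generated bytes.
  public static void fuzzerTestOneInput(byte[] data) {
    String s = new String(data, StandardCharsets.UTF_8);
    try {
      parseLength(s);
    } catch (IllegalStateException | IllegalArgumentException expected) {
      // Ordinary rejections of bad input: swallow them so the fuzzer keeps
      // exploring. NumberFormatException is an IllegalArgumentException.
    }
    // Anything else (for example OutOfMemoryError or StackOverflowError)
    // propagates out of this method and is reported as a finding.
  }
}
```

The trade-off raised in the question still applies: every exception type added to the catch list is one the fuzzer can no longer report, so the list should stay as narrow as possible.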
   
   Note: This PR is just an initial integration; future enhancements are planned.



[GitHub] [lucene] uschindler commented on issue #12165: Integrating Apache Lucene into OSS-Fuzz

Posted by "uschindler (via GitHub)" <gi...@apache.org>.

uschindler commented on issue #12165:
URL: https://github.com/apache/lucene/issues/12165#issuecomment-1438783686

I checked the patch in the related issue. It fuzzes analyzer creation with some data and also has some fuzzing for IndexSearcher. That's nothing new, e.g., we have our own Analysis Fuzzer already: https://github.com/apache/lucene/blob/main/lucene/analysis.tests/src/test/org/apache/lucene/analysis/tests/TestRandomChains.java

So I disagree with adding another fuzzing engine into Lucene. We have a library called "randomized-testing" which provides everything needed. Almost every test in Lucene has fuzzing included; the example above is just a very special one with a very wide range of components tested. Background: Lucene has been using randomized testing since around 2012. Here is a talk from 2014 by @dweiss about it: https://2019.berlinbuzzwords.de/14/session/randomize-your-tests-and-it-will-blow-your-socks.html

At the moment we have fuzzing not only in our own tests: the so-called "Policeman Jenkins" server (https://jenkins.thetaphi.de) runs our test suite in an endless loop, and on top of that each run uses a different Java version and different settings for garbage collection and Java's pointer size.

If you provide computing power to Lucene and the ASF, we are happy to use it, but it is enough to run Lucene's test suite in a loop.


[GitHub] [lucene] uschindler commented on issue #12165: Integrating Apache Lucene into OSS-Fuzz

Posted by "uschindler (via GitHub)" <gi...@apache.org>.

uschindler commented on issue #12165:
URL: https://github.com/apache/lucene/issues/12165#issuecomment-1440293014

   That is basically the same as: https://github.com/apache/lucene/blob/cce33b07e4f545ae4442c743c5023df1fb5d8fb9/lucene/analysis.tests/src/test/org/apache/lucene/analysis/tests/TestRandomChains.java#L761-L763
   (although ours is more specific and also handles NULL because we forcefully pass NULL)



[GitHub] [lucene] henryrneh commented on issue #12165: Integrating Apache Lucene into OSS-Fuzz

Posted by "henryrneh (via GitHub)" <gi...@apache.org>.

henryrneh commented on issue #12165:
URL: https://github.com/apache/lucene/issues/12165#issuecomment-1438392493

   Hello @dweiss, great to hear that Apache Lucene is already using fuzzing! 
   
   The big value is that the computation power is sponsored by Google and that the fuzzing is done by [Jazzer](https://github.com/CodeIntelligenceTesting/jazzer), the most modern, state-of-the-art fuzzer for JVM languages, which maximises code coverage automatically based on feedback and is regularly enhanced with new bug detectors.
   
   Please let me know if you have further questions regarding OSS-Fuzz and Jazzer. 



[GitHub] [lucene] dweiss commented on issue #12165: Integrating Apache Lucene into OSS-Fuzz

Posted by "dweiss (via GitHub)" <gi...@apache.org>.

dweiss commented on issue #12165:
URL: https://github.com/apache/lucene/issues/12165#issuecomment-1439747236

   Sure, never enough bug reports. I keep wondering what it reports as a legitimate error/problem if the patch catches a lot of what can be thrown from the inside, for example:
   https://github.com/google/oss-fuzz/pull/9772/files#diff-f3b3b0a611aa50e69c4823a63b6e760d6c607a635e584a14f836146bc49de48eR195-R196



[GitHub] [lucene] dweiss commented on issue #12165: Integrating Apache Lucene into OSS-Fuzz

Posted by "dweiss (via GitHub)" <gi...@apache.org>.

dweiss commented on issue #12165:
URL: https://github.com/apache/lucene/issues/12165#issuecomment-1438317336

   Thank you. Your contribution is appreciated but Lucene already uses what you call a "fuzzer" - a reproducible, pseudo-random component assembly for tests... In fact, we have used it for many years now and indeed it's been successful at finding bugs in both Lucene and the JVM runtime. I'm not sure if there's any added value in what this patch provides.
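
For readers unfamiliar with the idea, "reproducible, pseudo-random component assembly" can be sketched in a few lines of plain Java. This is only an illustration and does not use the randomized-testing library's API: the component names and the `randomChain` method are made up, and the sole assumption is that all randomness derives from one seed that is printed with each run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class SeededAssemblySketch {

  // Hypothetical component names, for illustration only.
  static final String[] FILTERS = {"lowercase", "stop", "stem", "ngram", "reverse"};

  // Builds a chain of 1 to 4 components; the same seed gives the same chain.
  static List<String> randomChain(long seed) {
    Random rnd = new Random(seed);
    int length = 1 + rnd.nextInt(4);
    List<String> chain = new ArrayList<>();
    for (int i = 0; i < length; i++) {
      chain.add(FILTERS[rnd.nextInt(FILTERS.length)]);
    }
    return chain;
  }

  public static void main(String[] args) {
    // Pass a seed to replay a run; otherwise pick a fresh one and print it.
    long seed = args.length > 0 ? Long.parseLong(args[0]) : System.nanoTime();
    System.out.println("seed=" + seed + " chain=" + randomChain(seed));
  }
}
```

Because the seed is the only source of randomness, a failing run can be replayed exactly by passing the printed seed back in.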



[GitHub] [lucene] henryrneh commented on issue #12165: Integrating Apache Lucene into OSS-Fuzz

Posted by "henryrneh (via GitHub)" <gi...@apache.org>.

henryrneh commented on issue #12165:
URL: https://github.com/apache/lucene/issues/12165#issuecomment-1674365549

   We have now started triaging bugs from OSS-Fuzz. The fuzzer has discovered multiple issues, for example OutOfMemory or StackOverflow, that we can disclose one by one or by giving you access to the OSS-Fuzz platform via email. Should we disclose them here through public issues, or would you prefer the security@apache.org mailing list?

