Posted to dev@hbase.apache.org by Andrew Purtell <an...@gmail.com> on 2022/07/16 18:05:32 UTC

Re: If anyone has experience in changing the regex engine in RegexStringComparator to joni?

Please do file an issue on our issue tracker, https://issues.apache.org/jira. The project name is HBASE, of course.

I think we may have bigger issues here, because joni was recently flagged by the static analysis tools we use at my employer to determine compliance with various government requirements. I would assume a CVE has been filed against joni. I plan to dig into this soon. A required upgrade of joni could, by extension, trigger an upgrade of JRuby. Sean, I recall you recently landed some changes in that area, but only back to branch-2. If so, this encoding issue would by comparison be a smaller detail that we could address at the same time. In any case, let's track the problem.

> On Jul 16, 2022, at 10:43 AM, Sean Busbey <bu...@apache.org> wrote:
> 
> That sounds reasonable. Could you file an issue in our issue tracker? Are
> you up for working on a PR?
> 
> 
>> On Wed, Jul 13, 2022 at 2:27 AM Minwoo Kang <it...@gmail.com>
>> wrote:
>> 
>> Hello,
>> 
>> I checked whether JONI can be used in RegexStringComparator.
>> After switching RegexStringComparator's engine to JONI and sending a regex
>> filter request, heap memory usage spiked and the RegionServer stopped
>> working because of GC.
>> 
>> Looking into the cause, I found reports that with UTF8Encoding an infinite
>> loop can occur when the input is invalid UTF-8.[1]
>> Trino uses NonStrictUTF8Encoding instead of UTF8Encoding.
>> 
>> After changing JoniRegexEngine's encoding in RegexStringComparator to
>> NonStrictUTF8Encoding, I confirmed that the heap memory usage spike
>> was gone.[2]
>> 
>> It seems that HBase, like Trino, needs to use NonStrictUTF8Encoding
>> instead of UTF8Encoding as JoniRegexEngine's encoding.
>> What do you think about changing JoniRegexEngine's encoding to
>> NonStrictUTF8Encoding?
>> 
>> Best Regards,
>> Minwoo
>> 
>>> On 2022/06/27 04:41:41 Minwoo Kang wrote:
>>> (The first time I sent this mail, the subject line was in Korean. I'm very sorry.)
>>> 
>>> Hello,
>>> 
>>> Recently, java.util.regex in our regex filter (RegexStringComparator) ran
>>> forever.
>>> I have read that java.util.regex can run forever or overflow the stack in
>>> the worst case.
>>> 
>>> Looking at RegexStringComparator, I saw that it provides two regex
>>> engines (Java and joni).
>>> Does anyone have experience switching RegexStringComparator's regex
>>> engine to joni and running it that way?
>>> 
>>> Best Regards,
>>> Minwoo
>>> 
>>
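HBase cell values are raw byte arrays, so nothing guarantees that the bytes a regex filter sees are well-formed UTF-8. To make "invalid UTF-8" concrete, here is a minimal, stdlib-only Java sketch that uses java.nio's CharsetDecoder with CodingErrorAction.REPORT to check whether a byte array decodes cleanly. It is not part of HBase's or joni's API; the class Utf8Check and its isValidUtf8 method are hypothetical names chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of HBase or joni.
class Utf8Check {

    // Returns true only if every byte of `value` belongs to a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] value) {
        // A fresh decoder per call, because CharsetDecoder instances are not thread-safe.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            // REPORT (rather than REPLACE) makes bad input throw instead of
            // being silently turned into U+FFFD replacement characters.
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(value));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] ok = "caf\u00e9".getBytes(StandardCharsets.UTF_8); // 63 61 66 C3 A9
        byte[] badContinuation = {(byte) 0xC3, (byte) 0x28};    // lead byte followed by a non-continuation byte
        byte[] badLead = {(byte) 0xFF};                          // 0xFF never appears in UTF-8
        System.out.println(isValidUtf8(ok));              // true
        System.out.println(isValidUtf8(badContinuation)); // false
        System.out.println(isValidUtf8(badLead));         // false
    }
}
```

Byte arrays like the last two are examples of the invalid UTF-8 input discussed in this thread. Whether a particular sequence actually triggers the loop in joni's UTF8Encoding depends on joni's implementation and is not shown by this sketch.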