Posted to dev@hbase.apache.org by Andrew Purtell <an...@gmail.com> on 2022/07/16 18:05:32 UTC
Re: If anyone has experience in changing the regex engine in RegexStringComparator to joni?
Please do file an issue on our issue tracker: https://issues.apache.org/jira . The project name is HBASE, of course.
I think we may have bigger issues here, because joni was recently flagged by the static analysis tools we use at my employer to determine compliance with various government requirements. I would assume a CVE has been filed against joni, and I plan to dig into this soon. A required upgrade of joni could, by extension, force an upgrade of JRuby. Sean, I recall you recently landed some changes in that regard, but only back to branch-2. If so, this encoding issue would by comparison be a smaller detail to address at the same time. In any case, let's track the problem.
> On Jul 16, 2022, at 10:43 AM, Sean Busbey <bu...@apache.org> wrote:
>
> That sounds reasonable. Could you file an issue in our issue tracker? Are
> you up for working on a PR?
>
>
>> On Wed, Jul 13, 2022 at 2:27 AM Minwoo Kang <it...@gmail.com>
>> wrote:
>>
>> Hello,
>>
>> I tested whether JONI can be used in RegexStringComparator.
>> After I changed the engine of RegexStringComparator to JONI and sent a
>> regex filter request, heap memory usage spiked and the RegionServer
>> stopped working due to GC.
>>
>> When I looked into the cause, I found a report that, with UTF8Encoding,
>> an infinite loop can occur on invalid UTF-8 input.[1]
>> Trino uses NonStrictUTF8Encoding instead of UTF8Encoding.
>>
>> After I changed the encoding of JoniRegexEngine to NonStrictUTF8Encoding in
>> RegexStringComparator, I confirmed that the heap memory usage spike
>> was gone.[2]
>>
>> HBase, like Trino, seems to need NonStrictUTF8Encoding instead of
>> UTF8Encoding as JoniRegexEngine's encoding.
>> What do you think about changing JoniRegexEngine's encoding to
>> NonStrictUTF8Encoding?
>>
>> Best Regards,
>> Minwoo
>>
>>> On 2022/06/27 04:41:41 Minwoo Kang wrote:
>>> (I sent the first mail with its title in Korean. I'm sorry.)
>>>
>>> Hello,
>>>
>>> Recently, java.util.regex in the regex filter (RegexStringComparator)
>>> ran forever.
>>> Reportedly, java.util.regex can run forever or overflow the stack in the
>>> worst case.
>>>
>>> Looking at RegexStringComparator, I saw that two regex implementations
>>> (java, joni) were provided.
>>> I was wondering if anyone has experience in changing the regex engine
>>> in RegexStringComparator to joni and operating it.
>>>
>>> Best Regards,
>>> Minwoo
>>>
>>> On 2022/06/27 04:37:11 Minwoo Kang wrote:
>>>> Hello,
>>>>
>>>> Recently, java.util.regex in the regex filter (RegexStringComparator)
>>>> ran forever.
>>>> Reportedly, java.util.regex can run forever or overflow the stack in
>>>> the worst case.
>>>>
>>>> Looking at RegexStringComparator, I saw that two regex implementations
>>>> (java, joni) were provided.
>>>> I was wondering if anyone has experience in changing the regex engine
>>>> in RegexStringComparator to joni and operating it.
>>>>
>>>> Best Regards,
>>>> Minwoo
>>>>
>>>
>>
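
To make "invalid UTF-8" in the thread above concrete: HBase cell values are arbitrary byte arrays, so a regex filter can be handed bytes that are not well-formed UTF-8. The following is a minimal sketch using only the JDK, not code from HBase, joni, or Trino. The class name Utf8Check and the method isValidUtf8 are hypothetical, chosen for illustration. It shows one way to tell whether a byte array is well-formed UTF-8 by using a strict decoder that reports malformed input instead of replacing it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of HBase or joni.
final class Utf8Check {
  private Utf8Check() {}

  /** Returns true only if the whole array decodes as well-formed UTF-8. */
  static boolean isValidUtf8(byte[] bytes) {
    // Create a fresh decoder per call: CharsetDecoder is not thread-safe.
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    try {
      // The one-shot decode() throws on malformed or truncated sequences.
      decoder.decode(ByteBuffer.wrap(bytes));
      return true;
    } catch (CharacterCodingException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    byte[] text = "row-key".getBytes(StandardCharsets.UTF_8);
    // A 3-byte sequence that is missing its final byte.
    byte[] truncated = {(byte) 0xE3, (byte) 0x81};
    // A continuation byte that has no lead byte before it.
    byte[] stray = {'a', (byte) 0x80, 'b'};

    System.out.println(isValidUtf8(text));
    System.out.println(isValidUtf8(truncated));
    System.out.println(isValidUtf8(stray));
  }
}
```

Inputs like the second and third arrays are the kind of data the thread describes as a problem for a strict UTF-8 encoding. Whether a check like this belongs in a filter path at all is a separate design question; the thread's proposal is to switch the encoding rather than to pre-validate input.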