Posted to user@hbase.apache.org by Minwoo Kang <it...@gmail.com> on 2022/07/13 07:26:32 UTC

RE: If anyone has experience in changing the regex engine in RegexStringComparator to joni?

Hello,

I checked whether joni can be used with RegexStringComparator.
After switching RegexStringComparator's engine to joni and sending a regex
filter request, heap usage spiked and the RegionServer stopped working
because of GC.

Looking into the cause, I found that with UTF8Encoding, an infinite loop can
reportedly occur when the input contains invalid UTF-8.[1]
Trino uses NonStrictUTF8Encoding instead of UTF8Encoding.
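The reported failure depends on values that are not well-formed UTF-8. As an illustration using only the JDK (not joni or jcodings), a strict decoder rejects such bytes, while a lenient decode substitutes U+FFFD and keeps going. That is a strict-versus-lenient split similar in spirit to what the names UTF8Encoding and NonStrictUTF8Encoding suggest, but this sketch does not reproduce how joni steps through bytes internally. `Utf8Check` and its methods are hypothetical names; a check like this could, for example, help locate which stored values are invalid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of HBase, joni, or jcodings.
final class Utf8Check {
    private Utf8Check() {}

    // Strict check: returns false if the JDK UTF-8 decoder reports malformed input.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Lenient decode: the String constructor replaces malformed sequences with U+FFFD.
    static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```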

After changing JoniRegexEngine's encoding in RegexStringComparator to
NonStrictUTF8Encoding, I confirmed that the heap usage spike was gone.[2]

It seems that HBase, like Trino, should use NonStrictUTF8Encoding instead of
UTF8Encoding as JoniRegexEngine's encoding.
What do you think about changing JoniRegexEngine's encoding to
NonStrictUTF8Encoding?

Best Regards,
Minwoo

On 2022/06/27 04:41:41 Minwoo Kang wrote:
> (My first mail had a Korean subject line. I'm sorry about that.)
>
> Hello,
>
> Recently, java.util.regex in the regex filter (RegexStringComparator) kept
> running indefinitely.
> Reportedly, in the worst case java.util.regex can run forever or overflow
> the stack.
>
> Looking at RegexStringComparator, I saw that it provides two regex
> implementations (java and joni).
> Does anyone have experience switching RegexStringComparator's regex engine
> to joni and running it in production?
>
> Best Regards,
> Minwoo
>
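java.util.regex has no built-in timeout. One common workaround, sketched here as an illustration rather than anything RegexStringComparator does, is to wrap the input in a CharSequence whose `charAt()` checks a deadline, so a runaway backtracking search eventually throws instead of running forever. `DeadlineCharSequence`, `withTimeout`, and `findWithTimeout` are hypothetical names. Checking the clock on every `charAt()` call adds overhead, and this does not protect against StackOverflowError.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HBase: a CharSequence view that throws once a
// deadline passes. java.util.regex reads its input through charAt(), so a
// backtracking search that runs too long trips the check and aborts.
final class DeadlineCharSequence implements CharSequence {
    private final CharSequence inner;
    private final long deadlineNanos; // absolute value on the System.nanoTime() clock

    private DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
        this.inner = inner;
        this.deadlineNanos = deadlineNanos;
    }

    // Wraps `inner` so that reads fail once `timeoutMillis` has elapsed from now.
    static DeadlineCharSequence withTimeout(CharSequence inner, long timeoutMillis) {
        return new DeadlineCharSequence(inner, System.nanoTime() + timeoutMillis * 1_000_000L);
    }

    // Runs Matcher.find() on the wrapped input; throws IllegalStateException on timeout.
    static boolean findWithTimeout(Pattern pattern, CharSequence input, long timeoutMillis) {
        Matcher matcher = pattern.matcher(withTimeout(input, timeoutMillis));
        return matcher.find();
    }

    @Override
    public char charAt(int index) {
        // Subtraction-based comparison is the overflow-safe way to compare nanoTime values.
        if (System.nanoTime() - deadlineNanos >= 0) {
            throw new IllegalStateException("regex evaluation exceeded its time budget");
        }
        return inner.charAt(index);
    }

    @Override
    public int length() {
        return inner.length();
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        // Sub-sequences (used by Matcher.group()) share the same deadline.
        return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
    }

    @Override
    public String toString() {
        return inner.toString();
    }
}
```

The caller decides how to treat a timeout, for example by catching IllegalStateException and treating the cell as a non-match.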

Re: If anyone has experience in changing the regex engine in RegexStringComparator to joni?

Posted by Andrew Purtell <an...@gmail.com>.
Please do file an issue on our issue tracker. https://issues.apache.org/jira . The project name is HBASE of course. 

I think we may have bigger issues here: joni was recently flagged by the static analysis tools my employer uses to check compliance with various government requirements. I would assume a CVE has been filed against joni. I plan to dig into this soon. A required joni upgrade could, by extension, trigger a JRuby upgrade. Sean, I recall you recently landed some changes in that area, but only back to branch-2. If so, this encoding issue would be a comparatively small detail to address at the same time. In any case, let’s track the problem. 

> On Jul 16, 2022, at 10:43 AM, Sean Busbey <bu...@apache.org> wrote:
> 
> That sounds reasonable. Could you file an issue in our issue tracker? Are
> you up for working on a PR?
> 

Re: If anyone has experience in changing the regex engine in RegexStringComparator to joni?

Posted by Sean Busbey <bu...@apache.org>.
That sounds reasonable. Could you file an issue in our issue tracker? Are
you up for working on a PR?
