Posted to issues@hbase.apache.org by "Andrew Kyle Purtell (Jira)" <ji...@apache.org> on 2022/07/19 16:24:00 UTC

[jira] [Commented] (HBASE-27219) Change JONI encoding in RegexStringComparator

    [ https://issues.apache.org/jira/browse/HBASE-27219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568631#comment-17568631 ] 

Andrew Kyle Purtell commented on HBASE-27219:
---------------------------------------------

Thank you for the contribution [~minwoo.kang]!

> Change JONI encoding in RegexStringComparator
> ---------------------------------------------
>
>                 Key: HBASE-27219
>                 URL: https://issues.apache.org/jira/browse/HBASE-27219
>             Project: HBase
>          Issue Type: Bug
>          Components: Filters
>            Reporter: Minwoo Kang
>            Assignee: Minwoo Kang
>            Priority: Minor
>             Fix For: 2.5.0, 3.0.0-alpha-4, 2.4.14
>
>         Attachments: rs-heap.png
>
>
> I changed the engine of RegexStringComparator to JONI.
> After that, when I sent a regex filter request, the RegionServer's heap memory usage spiked and the RegionServer became unresponsive due to GC.
>  
> !rs-heap.png|width=609,height=55!
> (RegionServer Heap Memory Usage)
>  
> {code:java}
> INFO  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1435ms
> GC pool 'ParNew' had collection(s): count=1 time=1550ms
> INFO  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1073ms
> GC pool 'ParNew' had collection(s): count=1 time=1534ms
> INFO  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1456ms
> GC pool 'ParNew' had collection(s): count=1 time=1574ms
> INFO  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1297ms
> GC pool 'ParNew' had collection(s): count=1 time=1415ms {code}
> (RegionServer Log)
>  
> I looked into the cause. Reportedly, when UTF8Encoding is used, an infinite loop can occur if the input contains invalid UTF-8.
> Trino uses NonStrictUTF8Encoding instead of UTF8Encoding for this reason.
> (https://github.com/trinodb/trino/commit/ea66e8cb27b098a5cea184106fe245064351b567)
> After changing the encoding of JoniRegexEngine in RegexStringComparator to NonStrictUTF8Encoding, I confirmed that the heap memory usage spike was gone.
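
For readers unfamiliar with the strict versus lenient distinction mentioned in the description, here is a minimal sketch using only the JDK's CharsetDecoder. It is an analogy, not the HBase or jcodings code: the class name Utf8Handling and its methods are hypothetical, and it does not reproduce the internals of UTF8Encoding or NonStrictUTF8Encoding. It only shows that stored bytes (such as cell values) need not be well-formed UTF-8, and that a decoder can either reject malformed input or tolerate it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of HBase, Joni, or jcodings.
final class Utf8Handling {

    /** Strict check: returns false if the bytes are not well-formed UTF-8. */
    static boolean isValidUtf8(byte[] value) {
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            strict.decode(ByteBuffer.wrap(value));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Lenient decode: each malformed sequence is replaced with U+FFFD. */
    static String decodeLenient(byte[] value) {
        CharsetDecoder lenient = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            return lenient.decode(ByteBuffer.wrap(value)).toString();
        } catch (CharacterCodingException e) {
            // Not expected when both actions are REPLACE.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wellFormed = "row-1".getBytes(StandardCharsets.UTF_8);
        // 0xFF never appears in well-formed UTF-8.
        byte[] malformed = {(byte) 'a', (byte) 0xFF, (byte) 'b'};

        System.out.println(isValidUtf8(wellFormed));
        System.out.println(isValidUtf8(malformed));
        System.out.println(decodeLenient(malformed));
    }
}
```

A check like isValidUtf8 could be used to find out whether a table actually contains values that are not well-formed UTF-8 before attributing a problem to the regex engine's encoding.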



--
This message was sent by Atlassian Jira
(v8.20.10#820010)