Posted to issues@hbase.apache.org by "Andrew Kyle Purtell (Jira)" <ji...@apache.org> on 2022/07/19 16:24:00 UTC
[jira] [Commented] (HBASE-27219) Change JONI encoding in RegexStringComparator
[ https://issues.apache.org/jira/browse/HBASE-27219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568631#comment-17568631 ]
Andrew Kyle Purtell commented on HBASE-27219:
---------------------------------------------
Thank you for the contribution [~minwoo.kang]!
> Change JONI encoding in RegexStringComparator
> ---------------------------------------------
>
> Key: HBASE-27219
> URL: https://issues.apache.org/jira/browse/HBASE-27219
> Project: HBase
> Issue Type: Bug
> Components: Filters
> Reporter: Minwoo Kang
> Assignee: Minwoo Kang
> Priority: Minor
> Fix For: 2.5.0, 3.0.0-alpha-4, 2.4.14
>
> Attachments: rs-heap.png
>
>
> I changed the engine of RegexStringComparator to JONI.
> After that, when I sent a regex filter request, the RegionServer's heap memory usage spiked and the RegionServer stopped working due to GC.
>
> !rs-heap.png|width=609,height=55!
> (RegionServer Heap Memory Usage)
>
> {code:java}
> INFO [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1435ms
> GC pool 'ParNew' had collection(s): count=1 time=1550ms
> INFO [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1073ms
> GC pool 'ParNew' had collection(s): count=1 time=1534ms
> INFO [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1456ms
> GC pool 'ParNew' had collection(s): count=1 time=1574ms
> INFO [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1297ms
> GC pool 'ParNew' had collection(s): count=1 time=1415ms {code}
> (RegionServer Log)
>
> I looked into the cause: reportedly, when UTF8Encoding is used, an infinite loop can occur if the input contains invalid UTF-8.
> Trino works around this by using NonStrictUTF8Encoding instead of UTF8Encoding.
> (https://github.com/trinodb/trino/commit/ea66e8cb27b098a5cea184106fe245064351b567)
> After changing the encoding of JoniRegexEngine to NonStrictUTF8Encoding in RegexStringComparator, I confirmed that the heap memory usage spike was gone.
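
To illustrate the "invalid UTF-8" condition described above, here is a minimal sketch that uses only the JDK to check whether a byte range is well-formed UTF-8. The class name `Utf8Check` and the method `isValidUtf8` are hypothetical and are not part of HBase or of the fix for this issue, which changes the Joni encoding instead. The sketch only shows what kind of input is involved, for example when trying to reproduce the problem with a specific cell value.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only (not HBase code).
public class Utf8Check {

    /** Returns true if value[offset, offset + length) decodes as well-formed UTF-8. */
    public static boolean isValidUtf8(byte[] value, int offset, int length) {
        // REPORT makes the decoder throw on bad input instead of silently
        // substituting the replacement character.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(value, offset, length));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] ok = "row-1".getBytes(StandardCharsets.UTF_8);
        // 0xC3 starts a two-byte sequence, but no continuation byte follows.
        byte[] bad = {(byte) 'a', (byte) 0xC3};
        System.out.println(isValidUtf8(ok, 0, ok.length));
        System.out.println(isValidUtf8(bad, 0, bad.length));
    }
}
```

HBase cell values are arbitrary byte arrays, so a regex filter can encounter input like `bad` above even when the pattern itself is valid.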
--
This message was sent by Atlassian Jira
(v8.20.10#820010)