Posted to issues@hive.apache.org by "Hive QA (JIRA)" <ji...@apache.org> on 2016/07/25 00:47:20 UTC
[jira] [Commented] (HIVE-14318) Vectorization: LIKE should use matches() instead of find(0)
[ https://issues.apache.org/jira/browse/HIVE-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391218#comment-15391218 ]
Hive QA commented on HIVE-14318:
--------------------------------
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12819757/HIVE-14318.1.patch
{color:red}ERROR:{color} -1 due to no test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10352 tests executed
*Failed tests:*
{noformat}
TestMsgBusConnection - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken
org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testDelayedLocalityNodeCommErrorImmediateAllocation
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testConnections
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorStringExpressions.testRegex
{noformat}
Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/629/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/629/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-629/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12819757 - PreCommit-HIVE-MASTER-Build
> Vectorization: LIKE should use matches() instead of find(0)
> -----------------------------------------------------------
>
> Key: HIVE-14318
> URL: https://issues.apache.org/jira/browse/HIVE-14318
> Project: Hive
> Issue Type: Bug
> Components: Vectorization
> Affects Versions: 1.3.0, 1.2.1, 2.2.0
> Reporter: Gopal V
> Assignee: Gopal V
> Attachments: HIVE-14318.1.patch
>
>
> Checking for a full match with matches() instead of calling find(0) would let the matcher exit early on a non-match, rather than going on to look for matching sub-sequences beyond the first non-match.
> In UDFLike.java, the complex pattern checker uses matches(), while the vectorized version uses find(0), which is more expensive.
> {code}
> Benchmark                            Mode  Cnt    Score    Error  Units
> RegexBench.testGreedyRegexHit        avgt    5  379.316 ± 32.444  ns/op
> RegexBench.testGreedyRegexHitCheck   avgt    5  344.895 ± 15.436  ns/op
> RegexBench.testGreedyRegexMiss       avgt    5  497.193 ± 18.168  ns/op
> RegexBench.testGreedyRegexMissCheck  avgt    5  171.872 ±  8.588  ns/op
> {code}
> On a miss, the .find(0) version is nearly 3x more expensive per row than the .matches() check version.
> When the pattern matches (the hit case), the cost is nearly the same.
> With a lazy pattern, the check version is slower when there's a hit (because matches() runs the check to the end of the input), but ~2x faster when there's a miss.
> {code}
> RegexBench.testLazyRegexHit        avgt    5   78.398 ±  6.007  ns/op
> RegexBench.testLazyRegexHitCheck   avgt    5  120.557 ±  4.396  ns/op
> RegexBench.testLazyRegexMiss       avgt    5  387.594 ± 25.672  ns/op
> RegexBench.testLazyRegexMissCheck  avgt    5  154.489 ± 13.622  ns/op
> {code}
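For readers unfamiliar with the two java.util.regex calls compared in the description, the following is a minimal sketch of how they differ. It is not the code from the patch or from UDFLike.java. The class name LikeMatchSketch, the helpers viaMatches and viaFind, and the sample regexes are hypothetical, and the mapping of a LIKE pattern such as '%foo%bar%' to the regex .*foo.*bar.* is an assumption made for illustration.

```java
import java.util.regex.Pattern;

public class LikeMatchSketch {

    // Hypothetical helper: true only if the regex matches the ENTIRE input.
    // The match is attempted once, anchored at the start of the input.
    public static boolean viaMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Hypothetical helper: resets the matcher and searches from index 0 for
    // ANY matching subsequence, retrying at later offsets after a failure.
    public static boolean viaFind(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find(0);
    }

    public static void main(String[] args) {
        // Assumed regex form of LIKE '%foo%bar%' (illustrative only).
        String wildcardBothEnds = ".*foo.*bar.*";
        // Assumed regex form of LIKE 'a_c%' (illustrative only).
        String anchoredPrefix = "a.c.*";

        String[][] cases = {
            {wildcardBothEnds, "xx foo yy bar zz"},
            {wildcardBothEnds, "xx foo yy"},
            {anchoredPrefix, "xxabc"},
        };

        for (String[] c : cases) {
            System.out.printf("regex=%-14s input=%-18s matches()=%b find(0)=%b%n",
                c[0], c[1], viaMatches(c[0], c[1]), viaFind(c[0], c[1]));
        }
    }
}
```

When the regex begins and ends with .*, both calls return the same boolean, so the difference is only in how much work a miss costs. For a regex without a leading .*, find(0) can succeed on a subsequence where matches() would not, so the two calls are not interchangeable in general.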
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)