You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/06/07 00:18:01 UTC

[jira] [Work logged] (HIVE-24762) StringValueBoundaryScanner ignores boundary which leads to incorrect results

     [ https://issues.apache.org/jira/browse/HIVE-24762?focusedWorklogId=607556&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-607556 ]

ASF GitHub Bot logged work on HIVE-24762:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Jun/21 00:17
            Start Date: 07/Jun/21 00:17
    Worklog Time Spent: 10m 
      Work Description: github-actions[bot] commented on pull request #1965:
URL: https://github.com/apache/hive/pull/1965#issuecomment-855489287


   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 607556)
    Time Spent: 20m  (was: 10m)

>  StringValueBoundaryScanner ignores boundary which leads to incorrect results
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-24762
>                 URL: https://issues.apache.org/jira/browse/HIVE-24762
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java#L901
> {code}
>   public boolean isDistanceGreater(Object v1, Object v2, int amt) {
>     ...
>     return s1 != null && s2 != null && s1.compareTo(s2) > 0;
> {code}
> Like other boundary scanners, StringValueBoundaryScanner should take amt into account, otherwise it'll result in the same range regardless of the given window size. This typically affects queries where the range is defined on a string column:
> {code}
> select p_mfgr, p_name, p_retailprice,
> count(*) over(partition by p_mfgr order by p_name range between 1 preceding and current row) as cs1,
> count(*) over(partition by p_mfgr order by p_name range between 3 preceding and current row) as cs2
> from vector_ptf_part_simple_orc;
> {code} 
> with "> 0" cs1 and cs2 will be calculated on the same window, so cs1 == cs2, but actually it should be different, this is the correct result (see "almond antique olive coral navajo"):
> {code}
> +-----------------+---------------------------------------------+------+------+
> |     p_mfgr      |                   p_name                    | cs1  | cs2  |
> +-----------------+---------------------------------------------+------+------+
> | Manufacturer#1  | almond antique burnished rose metallic      | 2    | 2    |
> | Manufacturer#1  | almond antique burnished rose metallic      | 2    | 2    |
> | Manufacturer#1  | almond antique chartreuse lavender yellow   | 6    | 6    |
> | Manufacturer#1  | almond antique chartreuse lavender yellow   | 6    | 6    |
> | Manufacturer#1  | almond antique chartreuse lavender yellow   | 6    | 6    |
> | Manufacturer#1  | almond antique chartreuse lavender yellow   | 6    | 6    |
> | Manufacturer#1  | almond antique salmon chartreuse burlywood  | 1    | 1    |
> | Manufacturer#1  | almond aquamarine burnished black steel     | 1    | 8    |
> | Manufacturer#1  | almond aquamarine pink moccasin thistle     | 4    | 4    |
> | Manufacturer#1  | almond aquamarine pink moccasin thistle     | 4    | 4    |
> | Manufacturer#1  | almond aquamarine pink moccasin thistle     | 4    | 4    |
> | Manufacturer#1  | almond aquamarine pink moccasin thistle     | 4    | 4    |
> | Manufacturer#2  | almond antique violet chocolate turquoise   | 1    | 1    |
> | Manufacturer#2  | almond antique violet turquoise frosted     | 3    | 3    |
> | Manufacturer#2  | almond antique violet turquoise frosted     | 3    | 3    |
> | Manufacturer#2  | almond antique violet turquoise frosted     | 3    | 3    |
> | Manufacturer#2  | almond aquamarine midnight light salmon     | 1    | 5    |
> | Manufacturer#2  | almond aquamarine rose maroon antique       | 2    | 2    |
> | Manufacturer#2  | almond aquamarine rose maroon antique       | 2    | 2    |
> | Manufacturer#2  | almond aquamarine sandy cyan gainsboro      | 3    | 3    |
> | Manufacturer#3  | almond antique chartreuse khaki white       | 1    | 1    |
> | Manufacturer#3  | almond antique forest lavender goldenrod    | 4    | 5    |
> | Manufacturer#3  | almond antique forest lavender goldenrod    | 4    | 5    |
> | Manufacturer#3  | almond antique forest lavender goldenrod    | 4    | 5    |
> | Manufacturer#3  | almond antique forest lavender goldenrod    | 4    | 5    |
> | Manufacturer#3  | almond antique metallic orange dim          | 1    | 1    |
> | Manufacturer#3  | almond antique misty red olive              | 1    | 1    |
> | Manufacturer#3  | almond antique olive coral navajo           | 1    | 3    |
> | Manufacturer#4  | almond antique gainsboro frosted violet     | 1    | 1    |
> | Manufacturer#4  | almond antique violet mint lemon            | 1    | 1    |
> | Manufacturer#4  | almond aquamarine floral ivory bisque       | 2    | 4    |
> | Manufacturer#4  | almond aquamarine floral ivory bisque       | 2    | 4    |
> | Manufacturer#4  | almond aquamarine yellow dodger mint        | 1    | 1    |
> | Manufacturer#4  | almond azure aquamarine papaya violet       | 1    | 1    |
> | Manufacturer#5  | almond antique blue firebrick mint          | 1    | 1    |
> | Manufacturer#5  | almond antique medium spring khaki          | 2    | 2    |
> | Manufacturer#5  | almond antique medium spring khaki          | 2    | 2    |
> | Manufacturer#5  | almond antique sky peru orange              | 1    | 1    |
> | Manufacturer#5  | almond aquamarine dodger light gainsboro    | 1    | 5    |
> | Manufacturer#5  | almond azure blanched chiffon midnight      | 1    | 1    |
> +-----------------+---------------------------------------------+------+------+
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)