Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/02/27 11:20:00 UTC
[jira] [Work logged] (HIVE-22731) Probe MapJoin hashtables for row level filtering

     [ https://issues.apache.org/jira/browse/HIVE-22731?focusedWorklogId=394100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-394100 ]

ASF GitHub Bot logged work on HIVE-22731:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Feb/20 11:19
            Start Date: 27/Feb/20 11:19
    Worklog Time Spent: 10m 
      Work Description: pgaref commented on pull request #884: HIVE-22731 Probe decode initial patch
URL: https://github.com/apache/hive/pull/884


Issue Time Tracking
-------------------

    Worklog Id:     (was: 394100)
    Time Spent: 20m  (was: 10m)

> Probe MapJoin hashtables for row level filtering
> ------------------------------------------------
>
>                 Key: HIVE-22731
>                 URL: https://issues.apache.org/jira/browse/HIVE-22731
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive, llap
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-22731.1.patch, HIVE-22731.2.patch, HIVE-22731.WIP.patch, decode_time_bars.pdf
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, RecordReaders such as ORC support filtering only at coarse-grained levels, namely: File, Stripe (64 to 256 MB), and Row group (10k rows). They filter out a set of rows only if they can guarantee that none of the rows in it can pass a filter (usually given as a searchable argument).
> However, a significant amount of time can be spent decoding rows with multiple columns that are not even used in the final result. See the figure, where "original" is what happens today and "LazyDecode" skips decoding rows that do not match the key.
> To enable finer-grained filtering in the particular case of a MapJoin, we could use the key HashTable created from the smaller table to skip deserializing the columns of larger-table rows that do not match any key, and thus save CPU time.
> This Jira investigates this direction. 
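
A minimal sketch of the idea described above, under stated assumptions: the join key of each larger-table row is already decoded, the remaining columns are still an encoded byte array, and the smaller table's keys are held in a plain java.util.HashSet. The names ProbeDecodeSketch, EncodedRow, decodePayload and probeAndDecode are hypothetical and are not Hive's MapJoin or ORC reader APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: hypothetical types, not Hive's actual classes. */
public class ProbeDecodeSketch {

    /** A larger-table row: key already decoded, other columns still encoded. */
    static final class EncodedRow {
        final long key;
        final byte[] payload;

        EncodedRow(long key, byte[] payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    /** Stand-in for the expensive step of decoding the non-key columns. */
    static String decodePayload(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_8);
    }

    /**
     * Probes the small-table key set for every row and decodes the payload
     * only when the key is present, so non-matching rows cost one lookup.
     */
    static List<String> probeAndDecode(List<EncodedRow> bigTable, Set<Long> smallTableKeys) {
        List<String> decoded = new ArrayList<>();
        for (EncodedRow row : bigTable) {
            if (!smallTableKeys.contains(row.key)) {
                continue; // no join partner: skip decoding entirely
            }
            decoded.add(row.key + ":" + decodePayload(row.payload));
        }
        return decoded;
    }

    public static void main(String[] args) {
        List<EncodedRow> bigTable = Arrays.asList(
            new EncodedRow(1L, "a".getBytes(StandardCharsets.UTF_8)),
            new EncodedRow(2L, "b".getBytes(StandardCharsets.UTF_8)),
            new EncodedRow(3L, "c".getBytes(StandardCharsets.UTF_8)),
            new EncodedRow(4L, "d".getBytes(StandardCharsets.UTF_8)));
        Set<Long> smallTableKeys = new HashSet<>(Arrays.asList(2L, 4L));

        // Only rows with keys 2 and 4 are decoded.
        System.out.println(probeAndDecode(bigTable, smallTableKeys));
    }
}
```

The sketch filters row by row for clarity; a vectorized reader would more plausibly probe a whole batch of keys and pass a selection of matching row indices to the decoder, which is a design choice this example does not attempt to model.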


