Posted to dev@phoenix.apache.org by "Saksham Gangwar (Jira)" <ji...@apache.org> on 2021/01/21 10:16:00 UTC

[jira] [Created] (PHOENIX-6334) All map tasks should operate on the same restored snapshot

Saksham Gangwar created PHOENIX-6334:
----------------------------------------

             Summary: All map tasks should operate on the same restored snapshot
                 Key: PHOENIX-6334
                 URL: https://issues.apache.org/jira/browse/PHOENIX-6334
             Project: Phoenix
          Issue Type: Bug
          Components: core
    Affects Versions: 4.14.3, 5.0.0
            Reporter: Saksham Gangwar
             Fix For: 5.1.0, 4.16.0, 4.x


Recently we switched an MR application from scanning live tables to scanning snapshots (PHOENIX-3744). We ran into a severe performance issue, which turned out to be a correctness issue caused by the generation of overlapping scan splits. After some debugging we found that this had already been fixed via PHOENIX-4997.

We also *need not restore the snapshot per map task*. The purpose of this Jira is to correct that behavior. Currently, we restore the snapshot once per map task into a temp directory. For large tables on big clusters, this creates a storm of NameNode (NN) RPCs. We can instead restore the snapshot once per job and let all the map tasks operate on the same restored snapshot. HBase already did this via HBASE-18806; we can do something similar.
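
To illustrate the difference in restore counts, here is a minimal, self-contained sketch. It is a toy model only: the class name, the restoreSnapshot helper, the directory paths, and the counter are all hypothetical and do not correspond to any Phoenix or HBase API. The counter simply stands in for the NameNode load that each restore would generate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedSnapshotRestoreSketch {
    // Counts simulated restores; a rough proxy for NameNode RPC load.
    static final AtomicInteger restoreCount = new AtomicInteger();

    // Hypothetical stand-in for a real snapshot restore.
    // Returns the directory the snapshot was "restored" into.
    static String restoreSnapshot(String snapshotName, String restoreDir) {
        restoreCount.incrementAndGet();
        return restoreDir + "/" + snapshotName;
    }

    // Behavior described as current: every map task restores its own copy.
    static List<String> runPerTaskRestore(String snapshotName, int numTasks) {
        List<String> dirsSeenByTasks = new ArrayList<>();
        for (int task = 0; task < numTasks; task++) {
            dirsSeenByTasks.add(restoreSnapshot(snapshotName, "/tmp/restore-task-" + task));
        }
        return dirsSeenByTasks;
    }

    // Proposed behavior: restore once at job setup, then hand the same
    // restored directory to every map task.
    static List<String> runPerJobRestore(String snapshotName, int numTasks) {
        String sharedDir = restoreSnapshot(snapshotName, "/tmp/restore-job");
        List<String> dirsSeenByTasks = new ArrayList<>();
        for (int task = 0; task < numTasks; task++) {
            dirsSeenByTasks.add(sharedDir);
        }
        return dirsSeenByTasks;
    }

    public static void main(String[] args) {
        restoreCount.set(0);
        runPerTaskRestore("snap", 1000);
        System.out.println("per-task restores: " + restoreCount.get());

        restoreCount.set(0);
        runPerJobRestore("snap", 1000);
        System.out.println("per-job restores: " + restoreCount.get());
    }
}
```

In this model the number of restores grows with the number of map tasks in the first case and stays at one in the second, which is the behavior this Jira proposes.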

 

All other performance suggestions are tracked here: https://issues.apache.org/jira/browse/PHOENIX-6081

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)