Posted to issues@ozone.apache.org by "Arpit Agarwal (Jira)" <ji...@apache.org> on 2023/03/27 15:52:00 UTC

[jira] [Assigned] (HDDS-8289) get splits in tpcds queries are way higher (10-30+ seconds) causing slowness on FSO bucket

     [ https://issues.apache.org/jira/browse/HDDS-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal reassigned HDDS-8289:
-----------------------------------

    Assignee: Ritesh Shukla

> get splits in tpcds queries are way higher (10-30+ seconds) causing slowness on FSO bucket
> ------------------------------------------------------------------------------------------
>
>                 Key: HDDS-8289
>                 URL: https://issues.apache.org/jira/browse/HDDS-8289
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: George Huang
>            Assignee: Ritesh Shukla
>            Priority: Critical
>
> hive:
> Hive 3.1.3000.7.1.8.11-3
> Git git://centos7-builds-9d1tl/grid/0/jenkins/workspace/workspace/CDH-parallel-centos7/SOURCES/hive -r 12b8607f399cafbfa69950daad17d927cf97629d
> Compiled by jenkins on Wed Dec 7 16:16:55 UTC 2022
> From source with checksum d5ec9df9f665c988e78e4d6aee1ff544
>
> Ozone:
> Using HDDS 1.3.0.718.1.0-b21
> Source code repository git@github.infra.cloudera.com:CDH/ozone.git -r df5938c423df684e6a1d6ccab002bec323ee7db1
> Compiled by jenkins on 2023-03-02T16:52Z
> Compiled with protoc 2.5.0, 3.19.6 and 3.7.1
> From source with checksum 1a9d4a2bf6acc652de6c29241163a63f
>  
> *Context:*
> Some queries ran much slower on Ozone than on HDFS (1 TB TPC-DS dataset, ORC format).
> *Issue:*
> getSplits in the AM is very slow, and this accounts for the query slowness. Although getSplits runs in multithreaded mode for ORC, it is still slow because of its internal "listStatus" calls. It is not yet known why the listStatus calls are slow on the Ozone side.
> AM logs are attached for later reference.
> Below is an AM log snippet for Q44, which spent more than 36 seconds in getSplits:
> Q44:
> {noformat}
> 2023-03-26 21:10:04,421 [INFO] [App Shared Pool - #25] |orc.OrcInputFormat|: getSplits finished (#splits: 44). duration: 36595 ms
> 2023-03-26 21:10:04,594 [INFO] [App Shared Pool - #24] |orc.OrcInputFormat|: getSplits finished (#splits: 1917). duration: 36727 ms
> 2023-03-26 21:10:04,698 [INFO] [App Shared Pool - #27] |orc.OrcInputFormat|: getSplits finished (#splits: 44). duration: 36816 ms
> 2023-03-26 21:10:04,855 [INFO] [App Shared Pool - #26] |orc.OrcInputFormat|: getSplits finished (#splits: 1917). duration: 36907 ms
> 2023-03-26 21:10:05,051 [INFO] [App Shared Pool - #28] |orc.OrcInputFormat|: getSplits finished (#splits: 1917). duration: 37070 ms
> {noformat}
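A small script can pull the getSplits summary lines out of the attached AM logs, which makes it easier to compare timings across queries or runs. The following is a minimal Python sketch. `summarize_get_splits` is a hypothetical helper and not part of Hive or Ozone. It assumes that every relevant line follows the `getSplits finished (#splits: N). duration: M ms` format shown in the Q44 snippet.

```python
import re
from statistics import mean

# Assumption: summary lines look like the OrcInputFormat lines in the Q44 snippet, e.g.
# "... |orc.OrcInputFormat|: getSplits finished (#splits: 44). duration: 36595 ms"
GET_SPLITS_RE = re.compile(
    r"getSplits finished \(#splits: (?P<splits>\d+)\)\. duration: (?P<ms>\d+) ms"
)


def summarize_get_splits(lines):
    """Hypothetical helper: summarize getSplits calls found in AM log lines.

    Returns a dict with the number of matching calls, the total number of
    splits, and the max and mean duration in milliseconds.
    """
    records = []
    for line in lines:
        match = GET_SPLITS_RE.search(line)
        if match:
            records.append((int(match.group("splits")), int(match.group("ms"))))

    if not records:
        # No getSplits summary lines in the input.
        return {"calls": 0}

    durations = [ms for _, ms in records]
    return {
        "calls": len(records),
        "total_splits": sum(splits for splits, _ in records),
        "max_ms": max(durations),
        "mean_ms": mean(durations),
    }
```

To run it on a full AM log, pass the open file object, which yields lines, e.g. `with open("am.log", encoding="utf-8") as f: print(summarize_get_splits(f))`. The file name here is only a placeholder. The sketch only reports what the AM logs. It does not show where the time goes inside listStatus on the Ozone side.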


