Posted to issues@impala.apache.org by "Dan Hecht (JIRA)" <ji...@apache.org> on 2018/03/06 23:52:00 UTC

[jira] [Resolved] (IMPALA-3559) Explain plan and profiles reference HDFS while query is running entirely against S3

     [ https://issues.apache.org/jira/browse/IMPALA-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dan Hecht resolved IMPALA-3559.
-------------------------------
    Resolution: Duplicate

Let's track this using IMPALA-6050, which has more discussion.

> Explain plan and profiles reference HDFS while query is running entirely against S3
> -----------------------------------------------------------------------------------
>
>                 Key: IMPALA-3559
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3559
>             Project: IMPALA
>          Issue Type: Task
>          Components: Frontend
>    Affects Versions: Impala 2.6.0
>            Reporter: Mostafa Mokhtar
>            Priority: Critical
>              Labels: s3, supportability
>
> When running queries against databases stored entirely in S3, the query profile and explain plan still reference HDFS, as in the plan and profile snippets below:
> {code}
> WRITE TO HDFS [tpch_300_parquet_partitioned.lineitem, OVERWRITE=false, PARTITION-KEYS=(L_SHIPDATE)]
> |  partitions=2526
> |  hosts=32 per-host-mem=12.53GB
> |
> 01:EXCHANGE [HASH(L_SHIPDATE)]
> |  hosts=32 per-host-mem=0B
> |  tuple-ids=0 row-size=256B cardinality=1682537668
> |
> 00:SCAN HDFS [tpch_300_text_partitioned.lineitem, RANDOM]
>    partitions=2526/2526 files=2526 size=210.05GB
>    table stats: 1682537668 rows total (74 partition(s) missing stats)
>    column stats: all
>    hosts=32 per-host-mem=960.00MB
>    tuple-ids=0 row-size=256B cardinality=1682537668
> {code}
> {code}
>         - TotalIntegrityCheckTime: 0.000ns
>          - TotalReadBlockTime: 0.000ns
>       HdfsTableSink:(Total: 45s194ms, non-child: 45s194ms, % non-child: 100.00%)
>          - BytesWritten: 231.00 B (231)
>          - CompressTimer: 5s055ms
>          - EncodeTimer: 25s434ms
>          - FilesCreated: 57 (57)
>          - HdfsWriteTimer: 0.000ns
>          - PartitionsCreated: 57 (57)
>          - PeakMemoryUsage: 1011.51 MB (1060647936)
>          - RowsInserted: 0 (0)
>          - TmpFileCreateTimer: 4s703ms
> {code}
> {code}
>       HDFS_SCAN_NODE (id=0):(Total: 714.997ms, non-child: 714.997ms, % non-child: 100.00%)
>          - AverageHdfsReadThreadConcurrency: 0.36 
>          - AverageScannerThreadConcurrency: 10.88 
>          - BytesRead: 2.43 GB (2611737608)
>          - BytesReadDataNodeCache: 0
>          - BytesReadLocal: 0
>          - BytesReadRemoteUnexpected: 0
>          - BytesReadShortCircuit: 0
>          - DecompressionTime: 0.000ns
>          - MaxCompressedTextFileLength: 0
>          - NumDisksAccessed: 0 (0)
>          - NumScannerThreadsStarted: 10 (10)
>          - PeakMemoryUsage: 337.63 MB (354027648)
>          - PerReadThreadRawHdfsThroughput: 36.42 MB/sec
>          - RemoteScanRanges: 0 (0)
>          - RowsRead: 19.48M (19483458)
>          - RowsReturned: 19.43M (19431740)
>          - RowsReturnedRate: 27.21 M/sec
>          - ScanRangesComplete: 74 (74)
>          - ScannerThreadsInvoluntaryContextSwitches: 0 (0)
>          - ScannerThreadsTotalWallClockTime: 0.000ns
>            - DelimiterParseTime: 3s596ms
>            - MaterializeTupleTime(*): 7s094ms
>            - ScannerThreadsSysTime: 0.000ns
>            - ScannerThreadsUserTime: 0.000ns
>          - ScannerThreadsVoluntaryContextSwitches: 0 (0)
>          - TotalRawHdfsReadTime(*): 1m8s
>          - TotalReadThroughput: 13.26 MB/sec
> {code}
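> A minimal reproduction sketch (the s3_demo database and the s3a:// bucket path below are hypothetical placeholders; any table whose LOCATION points at S3 shows the same labels):
> {code}
> -- Clone an existing table definition, but point its storage at S3 instead of HDFS.
> -- Bucket and path are placeholders.
> CREATE EXTERNAL TABLE s3_demo.lineitem LIKE tpch_300_text_partitioned.lineitem
>   LOCATION 's3a://my-bucket/tpch/lineitem';
>
> -- The resulting plan still labels the scan node "SCAN HDFS" and the profile
> -- still reports HDFS_SCAN_NODE counters, even though all reads go to S3.
> EXPLAIN SELECT count(*) FROM s3_demo.lineitem;
> {code}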



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)