Posted to issues@hawq.apache.org by "Zhanwei Wang (JIRA)" <ji...@apache.org> on 2015/11/20 04:42:10 UTC

[jira] [Updated] (HAWQ-42) Disk file corrupt will make HAWQ coredump when read-shortcircuit is enabled in hdfs-client.xml

     [ https://issues.apache.org/jira/browse/HAWQ-42?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhanwei Wang updated HAWQ-42:
-----------------------------
    Summary: Disk file corrupt will make HAWQ coredump when read-shortcircuit is enabled in hdfs-client.xml  (was: Query Executor Error (core dump))

> Disk file corrupt will make HAWQ coredump when read-shortcircuit is enabled in hdfs-client.xml
> ----------------------------------------------------------------------------------------------
>
>                 Key: HAWQ-42
>                 URL: https://issues.apache.org/jira/browse/HAWQ-42
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: libhdfs
>            Reporter: Xiang Sheng
>            Assignee: Zhanwei Wang
>            Priority: Critical
>
> Running the workload (tpch_row_10g_nocompression_no_partition) on a 128-node cluster, these queries (q1,q3,q4,q5,q6,q7,q8,q9,q10,q12,q14,q15,q17,q18,q19,q20,q21) failed with a query executor error and a core dump.
> {noformat}
> (gdb) bt
> #0  0x000000350b40f5db in raise () from /lib64/libpthread.so.0
> #1  0x0000000000ac77fa in SafeHandlerForSegvBusIll (processName=<value optimized out>, postgres_signal_arg=7) at elog.c:4497
> #2  <signal handler called>
> #3  0x00007f1b445690c2 in _mm_crc32_u64 (this=0x261fcd0, b=0x7f1b0d6d7000, len=512) at /opt/gcc-4.4.2/lib/gcc/x86_64-unknown-linux-gnu/4.4.2/include/smmintrin.h:716
> #4  Hdfs::Internal::HWCrc32c::update (this=0x261fcd0, b=0x7f1b0d6d7000, len=512) at /data/pulse2-agent/agents/agent1/work/LIBHDFS3-2.0-stash/rhel5_x86_64/src/common/HWCrc32c.cpp:114
> #5  0x00007f1b44549692 in Hdfs::Internal::LocalBlockReader::readAndVerify (this=0x26075a0, bufferSize=2097152) at /data/pulse2-agent/agents/agent1/work/LIBHDFS3-2.0-stash/rhel5_x86_64/src/client/LocalBlockReader.cpp:174
> #6  0x00007f1b4454996f in Hdfs::Internal::LocalBlockReader::readInternal (this=0x26075a0, buf=0x3057b20 "Pb\370\003V\246X", len=<value optimized out>)
>     at /data/pulse2-agent/agents/agent1/work/LIBHDFS3-2.0-stash/rhel5_x86_64/src/client/LocalBlockReader.cpp:227
> #7  0x00007f1b44549a13 in Hdfs::Internal::LocalBlockReader::read (this=0xffffffff, buf=0x7f1b0d6d7000 <Address 0x7f1b0d6d7000 out of bounds>, size=64)
>     at /data/pulse2-agent/agents/agent1/work/LIBHDFS3-2.0-stash/rhel5_x86_64/src/client/LocalBlockReader.cpp:240
> #8  0x00007f1b4453bc3a in Hdfs::Internal::InputStreamImpl::readOneBlock (this=0x2768f20, buf=0x3057b20 "Pb\370\003V\246X", size=65536, shouldUpdateMetadataOnFailure=<value optimized out>)
>     at /data/pulse2-agent/agents/agent1/work/LIBHDFS3-2.0-stash/rhel5_x86_64/src/client/InputStreamImpl.cpp:563
> #9  0x00007f1b4453c163 in Hdfs::Internal::InputStreamImpl::readInternal (this=0x2768f20, buf=0x3057b20 "Pb\370\003V\246X", size=65536) at /data/pulse2-agent/agents/agent1/work/LIBHDFS3-2.0-stash/rhel5_x86_64/src/client/InputStreamImpl.cpp:666
> #10 0x00007f1b4453c5bb in Hdfs::Internal::InputStreamImpl::read (this=0x2768f20, buf=0x3057b20 "Pb\370\003V\246X", size=65536) at /data/pulse2-agent/agents/agent1/work/LIBHDFS3-2.0-stash/rhel5_x86_64/src/client/InputStreamImpl.cpp:507
> #11 0x00007f1b44530e8c in hdfsRead (fs=<value optimized out>, file=<value optimized out>, buffer=0xffffffff, length=225275904) at /data/pulse2-agent/agents/agent1/work/LIBHDFS3-2.0-stash/rhel5_x86_64/src/client/Hdfs.cpp:800
> #12 0x00007f1b2138ab7d in gpfs_hdfs_read (fcinfo=<value optimized out>) at gpfshdfs.c:492
> #13 0x000000000092b48b in HdfsRead (protocol=<value optimized out>, fileSystem=<value optimized out>, file=<value optimized out>, buffer=<value optimized out>, length=<value optimized out>) at filesystem.c:533
> #14 0x000000000091c385 in HdfsFileRead (file=6, buffer=0x3057b20 "Pb\370\003V\246X", amount=65536) at fd.c:2722
> #15 FileRead (file=6, buffer=0x3057b20 "Pb\370\003V\246X", amount=65536) at fd.c:3133
> #16 0x0000000000bcc416 in BufferedReadIo (bufferedRead=0x3009f08, newMaxReadAheadLen=<value optimized out>, growBufferLen=<value optimized out>, isUseSplitLen=<value optimized out>) at cdbbufferedread.c:198
> #17 BufferedReadUseBeforeBuffer (bufferedRead=0x3009f08, newMaxReadAheadLen=<value optimized out>, growBufferLen=<value optimized out>, isUseSplitLen=<value optimized out>) at cdbbufferedread.c:317
> #18 BufferedReadGrowBuffer (bufferedRead=0x3009f08, newMaxReadAheadLen=<value optimized out>, growBufferLen=<value optimized out>, isUseSplitLen=<value optimized out>) at cdbbufferedread.c:647
> #19 0x0000000000bc6b79 in AppendOnlyStorageRead_InternalGetBuffer (storageRead=0x3009eb8, isUseSplitLen=0 '\000') at cdbappendonlystorageread.c:1223
> #20 AppendOnlyStorageRead_GetBuffer (storageRead=0x3009eb8, isUseSplitLen=0 '\000') at cdbappendonlystorageread.c:1289
> #21 0x0000000000599a1e in AppendOnlyExecutorReadBlock_GetContents (scan=0x3009d98, direction=<value optimized out>, slot=0x2fdfed8) at appendonlyam.c:628
> #22 getNextBlock (scan=0x3009d98, direction=<value optimized out>, slot=0x2fdfed8) at appendonlyam.c:1243
> #23 appendonlygettup (scan=0x3009d98, direction=<value optimized out>, slot=0x2fdfed8) at appendonlyam.c:1283
> #24 appendonly_getnext (scan=0x3009d98, direction=<value optimized out>, slot=0x2fdfed8) at appendonlyam.c:1673
> #25 0x000000000075de16 in AppendOnlyScanNext (scanState=<value optimized out>) at execAOScan.c:39
> #26 0x0000000000751f1b in ExecScan (scanState=0x2ffea70) at execScan.c:129
> #27 ExecTableScanRelation (scanState=0x2ffea70) at execScan.c:441
> #28 0x0000000000788a73 in ExecTableScan (node=0x2ffea70) at nodeTableScan.c:42
> #29 0x00000000007469dd in ExecProcNode (node=0x2ffea70) at execProcnode.c:904
> #30 0x000000000077efe6 in execMotionSender (node=0x2ffd2d0) at nodeMotion.c:348
> #31 ExecMotion (node=0x2ffd2d0) at nodeMotion.c:315
> #32 0x0000000000746b71 in ExecProcNode (node=0x2ffd2d0) at execProcnode.c:999
> #33 0x000000000073a8ac in ExecutePlan (estate=0x274bb60, planstate=<value optimized out>, operation=<value optimized out>, numberTuples=<value optimized out>, direction=<value optimized out>, dest=<value optimized out>) at execMain.c:3181
> #34 0x000000000073b1f2 in ExecutorRun (queryDesc=<value optimized out>, direction=<value optimized out>, count=<value optimized out>) at execMain.c:1166
> #35 0x0000000000976ec9 in PortalRunSelect (portal=<value optimized out>, count=0, isTopLevel=<value optimized out>, dest=<value optimized out>, altdest=<value optimized out>, completionTag=<value optimized out>) at pquery.c:1641
> #36 PortalRun (portal=<value optimized out>, count=0, isTopLevel=<value optimized out>, dest=<value optimized out>, altdest=<value optimized out>, completionTag=<value optimized out>) at pquery.c:1463
> #37 0x000000000096f488 in exec_mpp_query (argc=<value optimized out>, argv=<value optimized out>, username=<value optimized out>) at postgres.c:1378
> #38 PostgresMain (argc=<value optimized out>, argv=<value optimized out>, username=<value optimized out>) at postgres.c:4866
> #39 0x00000000008cf51b in BackendRun (port=0x260d420) at postmaster.c:5844
> #40 BackendStartup (port=0x260d420) at postmaster.c:5437
> #41 0x00000000008d4fef in ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:2139
> #42 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1431
> #43 0x00000000007d6aea in main (argc=9, argv=0x2609d20) at main.c:226
> (gdb) 
> {noformat}
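For readers following the trace: frames #3 to #5 show the crash happening while LocalBlockReader::readAndVerify checksums a 512-byte chunk with the hardware CRC32C instruction, and postgres_signal_arg=7 corresponds to SIGBUS on Linux x86_64. The sketch below is only an illustration of per-chunk CRC32C verification that reports a corrupt or truncated block as an ordinary error. It is not the libhdfs3 code; the names `crc32c`, `chunk_checksums`, `read_and_verify` and `ChecksumError` are hypothetical, and the 512-byte chunk size is assumed from the `len=512` argument in frame #4.

```python
from typing import List

# Illustrative sketch only; not the libhdfs3 implementation.
CHUNK_SIZE = 512                     # assumed from len=512 in frame #4
CRC32C_POLY_REFLECTED = 0x82F63B78   # Castagnoli polynomial, bit-reflected


class ChecksumError(Exception):
    """Raised when block data does not match its stored checksums."""


def crc32c(data: bytes) -> int:
    """Bitwise software CRC32C (slow, but needs no SSE4.2 support)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def chunk_checksums(data: bytes) -> List[int]:
    """Compute one CRC32C per CHUNK_SIZE bytes of data."""
    return [crc32c(data[i:i + CHUNK_SIZE])
            for i in range(0, len(data), CHUNK_SIZE)]


def read_and_verify(data: bytes, expected: List[int]) -> bytes:
    """Return data if every chunk matches, else raise ChecksumError."""
    actual_chunks = (len(data) + CHUNK_SIZE - 1) // CHUNK_SIZE
    if actual_chunks != len(expected):
        # A truncated or oversized block file is reported, not read past.
        raise ChecksumError(
            "expected %d chunks, found %d" % (len(expected), actual_chunks))
    for index, want in enumerate(expected):
        chunk = data[index * CHUNK_SIZE:(index + 1) * CHUNK_SIZE]
        got = crc32c(chunk)
        if got != want:
            raise ChecksumError(
                "chunk %d: checksum %08x != %08x" % (index, got, want))
    return data
```

In this sketch a caller would catch `ChecksumError` and fall back to another replica or fail the read, so a damaged local file surfaces as a read error rather than a signal in the query executor.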



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)