Posted to issues@tez.apache.org by "Tomomichi Hirano (JIRA)" <ji...@apache.org> on 2019/05/22 03:11:00 UTC
[jira] [Closed] (TEZ-4071) shuffle throws exceptions with an external table with multiple hdfs files

     [ https://issues.apache.org/jira/browse/TEZ-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tomomichi Hirano closed TEZ-4071.
---------------------------------

> shuffle throws exceptions with an external table with multiple hdfs files
> -------------------------------------------------------------------------
>
>                 Key: TEZ-4071
>                 URL: https://issues.apache.org/jira/browse/TEZ-4071
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.9.1
>            Reporter: Tomomichi Hirano
>            Priority: Major
>
> {noformat}
> 2019-05-17 10:08:10,317 [INFO] [Fetcher_B {Map_1} #1] |shuffle.Fetcher|: Failed to read data to memory for InputAttemptIdentifier [inputIdentifier=1, attemptNumber=0, pathComponent=attempt_1557383221332_0289_1_00_000001_0_10003, spillType=0, spillId=-1]. len=25, decomp=11. ExceptionMessage=Not a valid ifile header
> 2019-05-17 10:08:10,317 [WARN] [Fetcher_B {Map_1} #1] |shuffle.Fetcher|: Failed to shuffle output of InputAttemptIdentifier [inputIdentifier=1, attemptNumber=0, pathComponent=attempt_1557383221332_0289_1_00_000001_0_10003, spillType=0, spillId=-1] from XXXXX
> java.io.IOException: Not a valid ifile header
>  at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.verifyHeaderMagic(IFile.java:859)
>  at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.isCompressedFlagEnabled(IFile.java:866)
>  at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.readToMemory(IFile.java:616)
>  at org.apache.tez.runtime.library.common.shuffle.ShuffleUtils.shuffleToMemory(ShuffleUtils.java:121)
>  at org.apache.tez.runtime.library.common.shuffle.Fetcher.fetchInputs(Fetcher.java:950)
>  at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:599)
>  at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:486)
>  at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:284)
>  at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:76)
>  at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>  at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>  at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>  at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> {noformat}
> How to reproduce:
> {noformat}
> 1) Create two files: file1.csv and file2.csv
> 2) Put the following rows (two fields each) into the csv files:
> one,two
> one,two
> one,two
> 3) In Hive:
> use testdb;
> create external table test1(s1 string, s2 string) row format delimited fields terminated by ',' stored as textfile location '/user/usera/test1';
> 4) Copy the first csv file to HDFS (/user/usera/test1):
> hdfs dfs -put ./file1.csv /user/usera/test1/
> 5) select count(*) from testdb.test1;
> => works fine.
> 6) Copy the second csv file to HDFS:
> hdfs dfs -put ./file2.csv /user/usera/test1/
> 7) select * from testdb.test1;
> => Returns all rows from both HDFS files.
> 8) select count(*) from testdb.test1;
> => Throws the shuffle exception shown above.
> {noformat}
>  
> Similar ticket: https://issues.apache.org/jira/browse/TEZ-3699 
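
Steps 1 and 2 of the reproduction can be scripted so the two input files are byte-identical on every run. This is only a minimal sketch, assuming a POSIX shell and a writable working directory; the file names and row contents are taken from the ticket, and the HDFS and Hive steps (3 to 8) still have to be run by hand as listed above.

```shell
#!/bin/sh
# Steps 1-2 of the reproduction: create two identical CSV files,
# each holding three rows of two comma-separated fields.
set -eu

for f in file1.csv file2.csv; do
  printf 'one,two\none,two\none,two\n' > "$f"
done

# Sanity check: report the number of rows written to each file.
wc -l file1.csv file2.csv
```

After running it, continue with the `hdfs dfs -put` and Hive statements from the ticket, uploading file1.csv first and file2.csv only after the first count(*) has succeeded.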



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)