Posted to user@hadoop.apache.org by "Wang, Kun (Ann Arbor)" <wa...@siemens.com> on 2017/11/21 03:17:37 UTC

Load data from hdfs using pig reports "Failed to read data from" error

Hi,

I am running Pig 0.17 with Hadoop 2.8.2 (both self-installed on a Windows machine) in the default MapReduce execution mode. The following command shows that input3.csv does exist:

C:\Users\wangku>hdfs dfs -ls hdfs://0.0.0.0:19000/user/wangku/inputs/input3.csv
-rwxrwxrwx   1 wangku supergroup        329 2017-11-20 21:05 hdfs://0.0.0.0:19000/user/wangku/inputs/input3.csv
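
For reference, the file is a plain two-column CSV matching the schema in the LOAD statement below; a few illustrative rows in that format (hypothetical values, not the actual contents of input3.csv) would be:

```text
nodeA,nodeB
nodeA,nodeC
nodeB,nodeC
```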

But loading this file followed by a dump generates the following errors:

grunt> A = LOAD 'inputs/input3.csv' USING PigStorage(',') AS (source:chararray,target:chararray);
grunt> dump A;
2017-11-20 21:52:59,955 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2017-11-20 21:52:59,966 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2017-11-20 21:52:59,967 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NestedLimitOptimizer, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2017-11-20 21:52:59,968 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2017-11-20 21:52:59,969 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2017-11-20 21:52:59,970 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2017-11-20 21:52:59,977 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2017-11-20 21:52:59,978 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2017-11-20 21:52:59,979 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2017-11-20 21:52:59,980 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2017-11-20 21:53:00,052 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/D:/workdir/Hadoop/pig-0.17.0/pig-0.17.0-core-h2.jar to DistributedCache through /tmp/temp-1368998709/tmp-1186269454/pig-0.17.0-core-h2.jar
2017-11-20 21:53:00,085 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/D:/workdir/Hadoop/pig-0.17.0/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-1368998709/tmp-1094649886/automaton-1.11-8.jar
2017-11-20 21:53:00,129 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/D:/workdir/Hadoop/pig-0.17.0/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-1368998709/tmp1048851944/antlr-runtime-3.4.jar
2017-11-20 21:53:00,172 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/D:/workdir/Hadoop/pig-0.17.0/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp-1368998709/tmp-1322115459/joda-time-2.9.3.jar
2017-11-20 21:53:00,174 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2017-11-20 21:53:00,175 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2017-11-20 21:53:00,175 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2017-11-20 21:53:00,175 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2017-11-20 21:53:00,181 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2017-11-20 21:53:00,184 [JobControl] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2017-11-20 21:53:00,212 [JobControl] WARN  org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2017-11-20 21:53:00,219 [JobControl] INFO  org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2017-11-20 21:53:00,221 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input files to process : 1
2017-11-20 21:53:00,221 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2017-11-20 21:53:00,223 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2017-11-20 21:53:00,695 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2017-11-20 21:53:00,733 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1511229063638_0018
2017-11-20 21:53:00,735 [JobControl] INFO  org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2017-11-20 21:53:00,747 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1511229063638_0018
2017-11-20 21:53:00,749 [JobControl] INFO  org.apache.hadoop.mapreduce.Job - The url to track the job: http://ANI6W129.net.plm.eds.com:8088/proxy/application_1511229063638_0018/
2017-11-20 21:53:00,750 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1511229063638_0018
2017-11-20 21:53:00,750 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A
2017-11-20 21:53:00,750 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[3,4],A[-1,-1] C:  R:
2017-11-20 21:53:00,755 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2017-11-20 21:53:00,755 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1511229063638_0018]
2017-11-20 21:53:05,760 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2017-11-20 21:53:05,761 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1511229063638_0018 has failed! Stop running all dependent jobs
2017-11-20 21:53:05,761 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2017-11-20 21:53:05,762 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2017-11-20 21:53:05,778 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2017-11-20 21:53:05,793 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2017-11-20 21:53:05,794 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
2.8.2   0.17.0  wangku  2017-11-20 21:52:59     2017-11-20 21:53:05     UNKNOWN

Failed!

Failed Jobs:
JobId   Alias   Feature Message Outputs
job_1511229063638_0018  A       MAP_ONLY        Message: Job failed!    hdfs://0.0.0.0:19000/tmp/temp-1368998709/tmp575020783,

Input(s):
Failed to read data from "hdfs://0.0.0.0:19000/user/wangku/inputs/input3.csv"

Output(s):
Failed to produce result in "hdfs://0.0.0.0:19000/tmp/temp-1368998709/tmp575020783"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1511229063638_0018


2017-11-20 21:53:05,796 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2017-11-20 21:53:05,797 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias A
Details at logfile: D:\workdir\Hadoop\hadoop-2.8.2\logs\pig_1511232110551.log
grunt>

Thanks for any help!

-Kun