Posted to user@pig.apache.org by Matthew Smith <Ma...@g2-inc.com> on 2010/08/18 20:31:45 UTC

ORDER Issue

All,

 

I am running pig-0.7.0 and have hit a problem with the ORDER command. I
have tried running Pig out of the box on two separate Linux
distributions (Ubuntu 10.04 and openSUSE 11.2), and the same issue
occurs on both. I run these commands in a script file:

 

start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
dip:chararray, sport:int, dport:int, protocol:int, packets:int,
bytes:int, flags:chararray, startTime:long, endTime:long);

target = FILTER start BY sip matches '51.37.8.63';

fail = ORDER target BY bytes DESC;

not_reached = LIMIT fail 10;

dump not_reached;

 

The error is listed below. I then run:

 

start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
dip:chararray, sport:int, dport:int, protocol:int, packets:int,
bytes:int, flags:chararray, startTime:long, endTime:long);

target = FILTER start BY sip matches '51.37.8.63';

dump target;

 

This script produces a large list of records whose sip matches the
filter, so the LOAD and FILTER steps work. What am I doing wrong that
causes Pig to fail on the ORDER step? I have been wrestling with this
issue for a week now. Any help would be greatly appreciated.
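
A side note on the FILTER used in both scripts, separate from the ORDER
failure: matches takes a Java regular expression, so each '.' in
'51.37.8.63' matches any single character, not only a literal dot. If
the intent is an exact address match, a plain string comparison may be
closer to what is wanted. A minimal sketch, reusing the same input,
schema and alias names as above:

```pig
start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
    dip:chararray, sport:int, dport:int, protocol:int, packets:int,
    bytes:int, flags:chararray, startTime:long, endTime:long);

-- Exact string comparison: keeps only rows whose sip is literally
-- 51.37.8.63 (no regular-expression wildcards involved).
target = FILTER start BY sip == '51.37.8.63';

dump target;
```

This only tightens the filter; it is not expected to change the ORDER
error shown below.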

 

Best,

Matthew

 

/ERROR

 

10/08/18 11:24:15 INFO pig.Main: Logging error messages to: /home/matt/workspace/pig-0.7.0/cloudComputingPrototypes/pig_1282155855809.log

2010-08-18 11:24:16,000 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting to hadoop file system at: file:///

2010-08-18 11:24:16,338 [main] INFO
org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column
pruned for start

2010-08-18 11:24:16,338 [main] INFO
org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys
pruned for start

2010-08-18 11:24:16,396 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with
processName=JobTracker, sessionId=

2010-08-18 11:24:16,470 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(file:/tmp/temp-2049115517/tmp197746350:org.apache.pig.builtin.BinStorage) - 1-74 Operator Key: 1-74)

2010-08-18 11:24:16,567 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3

2010-08-18 11:24:16,567 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3

2010-08-18 11:24:16,577 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:16,581 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:16,581 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3

2010-08-18 11:24:17,828 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job

2010-08-18 11:24:17,858 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:17,859 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.

2010-08-18 11:24:17,863 [Thread-4] WARN
org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.

2010-08-18 11:24:18,000 [Thread-4] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:18,010 [Thread-4] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:18,027 [Thread-4] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input
paths to process : 1

2010-08-18 11:24:18,027 [Thread-4] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
input paths to process : 1

2010-08-18 11:24:18,237 [Thread-13] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:18,241 [Thread-13] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input
paths to process : 1

2010-08-18 11:24:18,241 [Thread-13] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
input paths to process : 1

2010-08-18 11:24:18,302 [Thread-13] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:18,307 [Thread-13] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:18,312 [Thread-13] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:18,317 [Thread-13] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:18,360 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001

2010-08-18 11:24:18,360 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete

2010-08-18 11:24:18,900 [Thread-13] INFO
org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0001_m_000000_0
is done. And is in the process of commiting

2010-08-18 11:24:18,902 [Thread-13] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:18,903 [Thread-13] INFO
org.apache.hadoop.mapred.LocalJobRunner - 

2010-08-18 11:24:18,903 [Thread-13] INFO
org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0001_m_000000_0
is allowed to commit now

2010-08-18 11:24:18,904 [Thread-13] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:18,906 [Thread-13] INFO
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved
output of task 'attempt_local_0001_m_000000_0' to
file:/tmp/temp-2049115517/tmp-44482827

2010-08-18 11:24:18,906 [Thread-13] INFO
org.apache.hadoop.mapred.LocalJobRunner - 

2010-08-18 11:24:18,907 [Thread-13] INFO
org.apache.hadoop.mapred.TaskRunner - Task
'attempt_local_0001_m_000000_0' done.

2010-08-18 11:24:23,370 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 33% complete

2010-08-18 11:24:23,370 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:23,371 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3

2010-08-18 11:24:24,500 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job

2010-08-18 11:24:24,526 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:24,526 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.

2010-08-18 11:24:24,527 [Thread-17] WARN
org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.

2010-08-18 11:24:24,630 [Thread-17] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:24,635 [Thread-17] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:24,640 [Thread-17] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input
paths to process : 1

2010-08-18 11:24:24,641 [Thread-17] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
input paths to process : 1

2010-08-18 11:24:24,785 [Thread-26] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:24,787 [Thread-26] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input
paths to process : 1

2010-08-18 11:24:24,787 [Thread-26] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
input paths to process : 1

2010-08-18 11:24:24,821 [Thread-26] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:24,825 [Thread-26] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:24,838 [Thread-26] INFO
org.apache.hadoop.mapred.MapTask - io.sort.mb = 100

2010-08-18 11:24:26,106 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0002

2010-08-18 11:24:26,168 [Thread-26] INFO
org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720

2010-08-18 11:24:26,168 [Thread-26] INFO
org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680

2010-08-18 11:24:26,362 [Thread-26] INFO
org.apache.hadoop.mapred.MapTask - Starting flush of map output

2010-08-18 11:24:26,657 [Thread-26] INFO
org.apache.hadoop.mapred.MapTask - Finished spill 0

2010-08-18 11:24:26,661 [Thread-26] INFO
org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_m_000000_0
is done. And is in the process of commiting

2010-08-18 11:24:26,661 [Thread-26] INFO
org.apache.hadoop.mapred.LocalJobRunner - 

2010-08-18 11:24:26,661 [Thread-26] INFO
org.apache.hadoop.mapred.TaskRunner - Task
'attempt_local_0002_m_000000_0' done.

2010-08-18 11:24:26,669 [Thread-26] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:26,674 [Thread-26] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:26,675 [Thread-26] INFO
org.apache.hadoop.mapred.LocalJobRunner - 

2010-08-18 11:24:26,681 [Thread-26] INFO
org.apache.hadoop.mapred.Merger - Merging 1 sorted segments

2010-08-18 11:24:26,746 [Thread-26] INFO
org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 1
segments left of total size: 3202 bytes

2010-08-18 11:24:26,747 [Thread-26] INFO
org.apache.hadoop.mapred.LocalJobRunner - 

2010-08-18 11:24:26,752 [Thread-26] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:26,955 [Thread-26] INFO
org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_r_000000_0
is done. And is in the process of commiting

2010-08-18 11:24:26,956 [Thread-26] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:26,957 [Thread-26] INFO
org.apache.hadoop.mapred.LocalJobRunner - 

2010-08-18 11:24:26,957 [Thread-26] INFO
org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0002_r_000000_0
is allowed to commit now

2010-08-18 11:24:26,958 [Thread-26] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:26,959 [Thread-26] INFO
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved
output of task 'attempt_local_0002_r_000000_0' to
file:/tmp/temp-2049115517/tmp-2112064820

2010-08-18 11:24:26,961 [Thread-26] INFO
org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce

2010-08-18 11:24:26,961 [Thread-26] INFO
org.apache.hadoop.mapred.TaskRunner - Task
'attempt_local_0002_r_000000_0' done.

2010-08-18 11:24:30,114 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete

2010-08-18 11:24:30,115 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:30,115 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3

2010-08-18 11:24:31,408 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job

2010-08-18 11:24:31,507 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:31,507 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.

2010-08-18 11:24:31,612 [Thread-32] WARN
org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.

2010-08-18 11:24:31,840 [Thread-32] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:31,843 [Thread-32] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:31,845 [Thread-32] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input
paths to process : 1

2010-08-18 11:24:31,845 [Thread-32] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
input paths to process : 1

2010-08-18 11:24:32,071 [Thread-41] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:32,074 [Thread-41] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input
paths to process : 1

2010-08-18 11:24:32,074 [Thread-41] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
input paths to process : 1

2010-08-18 11:24:32,075 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0003

2010-08-18 11:24:32,155 [Thread-41] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:32,160 [Thread-41] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:32,160 [Thread-41] INFO
org.apache.hadoop.mapred.MapTask - io.sort.mb = 100

2010-08-18 11:24:32,491 [Thread-41] INFO
org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720

2010-08-18 11:24:32,491 [Thread-41] INFO
org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680

2010-08-18 11:24:32,926 [Thread-41] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:32,948 [Thread-41] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0003

java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_24118161_1282155871461
                at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:135)
                at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
                at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
                at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:527)
                at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
                at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
                at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_24118161_1282155871461
                at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
                at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
                at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
                at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153)
                at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115)
                at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:108)
                ... 6 more

2010-08-18 11:24:37,089 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete

2010-08-18 11:24:37,089 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!

2010-08-18 11:24:37,090 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "file:/tmp/temp-2049115517/tmp197746350"

2010-08-18 11:24:37,090 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs

2010-08-18 11:24:37,127 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized

2010-08-18 11:24:37,140 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias not_reached

Details at logfile: /home/matt/workspace/pig-0.7.0/cloudComputingPrototypes/pig_1282155855809.log
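
For anyone reading the log: the first two jobs finish, and the third
(job_local_0003) fails inside WeightedRangePartitioner.setConf because
file:/user/matt/pigsample_24118161_1282155871461 does not exist. One
way to see how Pig compiles the ORDER into these map-reduce jobs,
without running them, might be to EXPLAIN the ordered alias. A sketch
only, assuming the same input and schema as the failing script:

```pig
start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
    dip:chararray, sport:int, dport:int, protocol:int, packets:int,
    bytes:int, flags:chararray, startTime:long, endTime:long);

target = FILTER start BY sip matches '51.37.8.63';

fail = ORDER target BY bytes DESC;

-- Print the logical, physical and map-reduce plans for the ORDER step
-- instead of executing it.
EXPLAIN fail;
```

The plan output is diagnostic only; it does not by itself fix the
missing pigsample file.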