Posted to user@pig.apache.org by Matthew Smith <Ma...@g2-inc.com> on 2010/08/05 00:07:25 UTC
LIMIT Issue
Hey,
While running Pig from Java, a LIMIT statement is not getting executed.
/code
myServer.registerQuery("flow_firstcut = FOREACH data GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
myServer.registerQuery("filtered = FILTER flow_firstcut BY sIP matches 'someIP';");
myServer.registerQuery("O = ORDER filtered BY bytes DESC;");
myServer.registerQuery("topTen = LIMIT O 10;");
myServer.store("topTen", outputFilePath);
/code
This produces a 699-line file. It should produce a 10-line file.
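As a sanity check on what the snippet above should return, the intended semantics of ORDER ... BY bytes DESC followed by LIMIT 10 (sort descending, then keep at most the first ten records) can be sketched in plain Java. Nothing here calls Pig; it is purely illustrative:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TopTen {
    // Expected behavior of ORDER ... BY bytes DESC followed by LIMIT 10:
    // sort the byte counts descending, then keep at most the first ten.
    static List<Integer> topTen(List<Integer> bytes) {
        List<Integer> sorted = new ArrayList<>(bytes);
        sorted.sort(Collections.reverseOrder());
        return sorted.subList(0, Math.min(10, sorted.size()));
    }

    public static void main(String[] args) {
        List<Integer> in = new ArrayList<>();
        for (int i = 1; i <= 15; i++) in.add(i * 100);  // 15 records: 100..1500
        System.out.println(topTen(in));                 // only the top 10 survive
    }
}
```

With 15 input records, a correct order-then-limit pipeline emits exactly 10, which is what the 699-line output above violates.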
/code
myServer.registerQuery("flow_firstcut = FOREACH data GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
myServer.registerQuery("filtered = FILTER flow_firstcut BY sIP matches '"+parameters[1]+"';");
//myServer.registerQuery("O = ORDER filtered BY bytes DESC;");
myServer.registerQuery("topTen = LIMIT filtered 10;");
myServer.store("topTen", outputFilePath);
/code
This produces a 10-line file.
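Since the failing and working versions differ only in the commented-out ORDER, the parameter-driven FILTER string itself can be checked in isolation. A minimal sketch, using the alias names from this thread and an illustrative IP, with no Pig dependency:

```java
public class FilterQuery {
    // Build the FILTER statement the same way the snippet above splices
    // in parameters[1]. The alias names match the ones in this thread;
    // the IP argument is whatever the caller passes at runtime.
    static String filterQuery(String ip) {
        return "filtered = FILTER flow_firstcut BY sIP matches '" + ip + "';";
    }

    public static void main(String[] args) {
        // Print the query string that would be handed to registerQuery().
        System.out.println(filterQuery("61.81.46.45"));
    }
}
```

If the printed string is well-formed Pig Latin, the parameter splicing is not the problem, which points the finger back at the ORDER step.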
Is there a known bug I am unaware of, or can you not ORDER and then LIMIT?
http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#LIMIT
indicates that this is a valid sequence of calls.
Help?
Matt
Re: LIMIT Issue
Posted by Ashutosh Chauhan <as...@gmail.com>.
It looks like a bug, then. Do you have a script and a small enough
dataset that reproduces the issue and that you can upload to JIRA? If
so, go ahead and create a JIRA ticket with the script and data. Are you
using local mode or mapreduce mode?
Ashutosh
On Fri, Aug 6, 2010 at 07:16, Matthew Smith <Ma...@g2-inc.com> wrote:
> B is not empty:
> (58.72.19.26, 58.72.19.26,38627,22196,6,512, FS PA)
> (58.72.19.26, 36.65.53.83,44133,10957,6,646, FS PA)
> (58.72.19.26, 68.99.24.4,43951,11023,6,364, FS PA)
> (58.72.19.26, 9.7.68.69,18644,20524,17,228, FS PA)
> (58.72.19.26, 73.77.82.19,25,1024,6,194, FS PA)
> (58.72.19.26, 36.65.53.83,56380,71718,6,1003, FS PA)
> (58.72.19.26, 58.72.19.26,10221,44938,6,277, FS PA)
> (58.72.19.26, 77.52.5.64,69247,11023,6,389, FS PA)
> (58.72.19.26, 93.6.87.73,38149,1024,6,138, FS PA)
> (58.72.19.26, 58.72.19.26,11558,24292,6,812, FS PA)
> (58.72.19.26, 58.72.19.26,65668,71318,6,175, FS PA)
> (58.72.19.26, 68.99.24.4,61923,1024,6,1598, FS PA)
> (58.72.19.26, 60.41.59.65,22421,65796,6,1402, FS PA)
> (58.72.19.26, 58.72.19.26,69740,21873,6,322, S A)
> (58.72.19.26, 95.70.58.21,11058,1024,6,1453, FS PA)
> (58.72.19.26, 42.10.50.36,44863,11023,6,251, FS PA)
> (58.72.19.26, 57.6.91.5,25857,1024,6,1546, FS PA)
> (58.72.19.26, 68.99.24.4,54756,11023,6,219, FS PA)
> (58.72.19.26, 36.65.53.83,73335,43857,6,9, FS PA)
> (58.72.19.26, 95.70.58.21,32204,11023,6,1635, S A)
> (58.72.19.26, 76.48.82.73,46483,1024,6,127, FS PA)
> (58.72.19.26, 81.88.14.14,55609,1024,6,507, FS PA)
> (58.72.19.26, 1.54.61.21,65763,1024,6,370, FS PA)
>
>
> But after I do:
>> grunt> C = ORDER B BY bytes DESC;
>> grunt> Dump C;
>
> I get the same error as before:
>
> java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
>
>
> Which would lead me to believe my ORDER is broken. Is there a conf I need to change?
>
>
> -----Original Message-----
> From: Ashutosh Chauhan [mailto:ashutosh.chauhan@gmail.com]
> Sent: Friday, August 06, 2010 2:43 AM
> To: Matthew Smith
> Cc: pig-user@hadoop.apache.org
> Subject: Re: LIMIT Issue
>
> This is most likely because B is empty. Do:
>
> grunt> dump A; -- to verify data is getting loaded as you are expecting.
> grunt> dump B; -- to verify that B is non-empty.
>
> Ashutosh
>
> On Thu, Aug 5, 2010 at 14:54, Matthew Smith <Ma...@g2-inc.com> wrote:
>> While running grunt I ran into another error. I see it is looking for another file, but I have never run into this problem with grunt before. This environment was freshly installed this morning before the grunt shell was executed.
>>
>> I also checked my PigServer() Java code on the new install, and it still produces a 699 line file which is ORDERed but not LIMITed.
>>
>> Thoughts?
>>
>>
>> grunt> A = LOAD '0' USING PigStorage('|') as (sIP:chararray,dIP:chararray,sPort:int, dPort:int,protocol:int, bytes:int, flags:chararray);
>> grunt> B = FILTER A BY sIP matches '61.81.46.45';
>> grunt> C = ORDER B BY bytes DESC;
>> grunt> D = LIMIT C 10;
>> grunt> DUMP D;
>>
>>
>>
>>
>> 2010-08-05 14:47:52,622 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for A
>> 2010-08-05 14:47:52,622 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned for A
>> 2010-08-05 14:47:52,681 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
>> 2010-08-05 14:47:52,819 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(file:/tmp/temp1184504472/tmp-1623830760:org.apache.pig.builtin.BinStorage) - 1-54 Operator Key: 1-54)
>> 2010-08-05 14:47:52,895 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
>> 2010-08-05 14:47:52,895 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
>> 2010-08-05 14:47:52,911 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:47:52,934 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:47:52,935 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
>> 2010-08-05 14:47:54,187 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
>> 2010-08-05 14:47:54,228 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:47:54,229 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
>> 2010-08-05 14:47:54,246 [Thread-5] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>> 2010-08-05 14:47:54,434 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:47:54,455 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:47:54,461 [Thread-5] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
>> 2010-08-05 14:47:54,461 [Thread-5] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
>> 2010-08-05 14:47:54,734 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
>> 2010-08-05 14:47:54,754 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:47:54,757 [Thread-14] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
>> 2010-08-05 14:47:54,757 [Thread-14] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
>> 2010-08-05 14:47:54,821 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:47:54,827 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:47:54,839 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:47:54,841 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:47:55,245 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001
>> 2010-08-05 14:47:56,352 [Thread-14] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
>> 2010-08-05 14:47:56,354 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:47:56,355 [Thread-14] INFO org.apache.hadoop.mapred.LocalJobRunner -
>> 2010-08-05 14:47:56,355 [Thread-14] INFO org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0001_m_000000_0 is allowed to commit now
>> 2010-08-05 14:47:56,355 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:47:56,358 [Thread-14] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0001_m_000000_0' to file:/tmp/temp1184504472/tmp-842564749
>> 2010-08-05 14:47:56,358 [Thread-14] INFO org.apache.hadoop.mapred.LocalJobRunner -
>> 2010-08-05 14:47:56,358 [Thread-14] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0001_m_000000_0' done.
>> 2010-08-05 14:47:59,754 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 33% complete
>> 2010-08-05 14:47:59,754 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:47:59,754 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
>> 2010-08-05 14:48:00,873 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
>> 2010-08-05 14:48:00,890 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:00,891 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
>> 2010-08-05 14:48:00,891 [Thread-18] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>> 2010-08-05 14:48:00,999 [Thread-18] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:01,003 [Thread-18] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:01,009 [Thread-18] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
>> 2010-08-05 14:48:01,009 [Thread-18] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
>> 2010-08-05 14:48:01,155 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:01,157 [Thread-27] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
>> 2010-08-05 14:48:01,157 [Thread-27] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
>> 2010-08-05 14:48:01,189 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:01,192 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:01,209 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
>> 2010-08-05 14:48:01,391 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0002
>> 2010-08-05 14:48:02,157 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
>> 2010-08-05 14:48:02,157 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
>> 2010-08-05 14:48:02,369 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
>> 2010-08-05 14:48:02,752 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
>> 2010-08-05 14:48:02,753 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
>> 2010-08-05 14:48:02,753 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_m_000000_0' done.
>> 2010-08-05 14:48:02,761 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:02,789 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:02,789 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
>> 2010-08-05 14:48:02,796 [Thread-27] INFO org.apache.hadoop.mapred.Merger - Merging 1 sorted segments
>> 2010-08-05 14:48:02,932 [Thread-27] INFO org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 0 segments left of total size: 0 bytes
>> 2010-08-05 14:48:02,932 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
>> 2010-08-05 14:48:02,935 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:03,023 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
>> 2010-08-05 14:48:03,025 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:03,026 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
>> 2010-08-05 14:48:03,026 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0002_r_000000_0 is allowed to commit now
>> 2010-08-05 14:48:03,026 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:03,029 [Thread-27] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0002_r_000000_0' to file:/tmp/temp1184504472/tmp-657784620
>> 2010-08-05 14:48:03,031 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
>> 2010-08-05 14:48:03,031 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_r_000000_0' done.
>> 2010-08-05 14:48:06,431 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete
>> 2010-08-05 14:48:06,432 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:06,432 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
>> 2010-08-05 14:48:08,062 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
>> 2010-08-05 14:48:08,194 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:08,195 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
>> 2010-08-05 14:48:08,197 [Thread-33] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>> 2010-08-05 14:48:08,475 [Thread-33] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:08,478 [Thread-33] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:08,480 [Thread-33] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
>> 2010-08-05 14:48:08,480 [Thread-33] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
>> 2010-08-05 14:48:08,792 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:08,794 [Thread-42] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
>> 2010-08-05 14:48:08,794 [Thread-42] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
>> 2010-08-05 14:48:09,024 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:09,027 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:09,028 [Thread-42] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
>> 2010-08-05 14:48:09,290 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0003
>> 2010-08-05 14:48:09,443 [Thread-42] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
>> 2010-08-05 14:48:09,443 [Thread-42] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
>> 2010-08-05 14:48:09,479 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:09,491 [Thread-42] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0003
>> java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:135)
>> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
>> at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>> at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:527)
>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>> Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
>> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
>> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
>> at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153)
>> at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115)
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:108)
>> ... 6 more
>> 2010-08-05 14:48:13,800 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
>> 2010-08-05 14:48:13,801 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
>> 2010-08-05 14:48:13,802 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "file:/tmp/temp1184504472/tmp-1623830760"
>> 2010-08-05 14:48:13,803 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
>> 2010-08-05 14:48:13,811 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:13,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias D
>> Details at logfile: /home/matt/workspace/pig-0.7.0/pig_1281044613580.log
>>
>> -----Original Message-----
>> From: Ashutosh Chauhan [mailto:ashutosh.chauhan@gmail.com]
>> Sent: Thursday, August 05, 2010 3:10 PM
>> To: pig-user@hadoop.apache.org
>> Subject: Re: LIMIT Issue
>>
>> To cut down on the problem space, can you try your query in grunt? If
>> it works there, the problem is likely something in PigServer;
>> otherwise it is related to Pig core itself.
>>
>> Ashutosh
>> On Thu, Aug 5, 2010 at 10:57, Matthew Smith <Ma...@g2-inc.com> wrote:
>>> No, I have not tried it in grunt. I want to use PigServer because of the parameter passing that Java makes possible. I am using Pig 0.7.0.
>>>
>>> -----Original Message-----
>>> From: Ashutosh Chauhan [mailto:ashutosh.chauhan@gmail.com]
>>> Sent: Thursday, August 05, 2010 12:54 PM
>>> To: pig-user@hadoop.apache.org
>>> Subject: Re: LIMIT Issue
>>>
>>> Matt,
>>>
>>> Which version are you on? What happens if you run your query through
>>> grunt instead of PigServer?
>>> I tried a load-order-limit sequence on a small dataset in grunt and
>>> got the expected results.
>>>
>>> Ashutosh
RE: LIMIT Issue
Posted by Matthew Smith <Ma...@g2-inc.com>.
B is not empty:
(58.72.19.26, 58.72.19.26,38627,22196,6,512, FS PA)
(58.72.19.26, 36.65.53.83,44133,10957,6,646, FS PA)
(58.72.19.26, 68.99.24.4,43951,11023,6,364, FS PA)
(58.72.19.26, 9.7.68.69,18644,20524,17,228, FS PA)
(58.72.19.26, 73.77.82.19,25,1024,6,194, FS PA)
(58.72.19.26, 36.65.53.83,56380,71718,6,1003, FS PA)
(58.72.19.26, 58.72.19.26,10221,44938,6,277, FS PA)
(58.72.19.26, 77.52.5.64,69247,11023,6,389, FS PA)
(58.72.19.26, 93.6.87.73,38149,1024,6,138, FS PA)
(58.72.19.26, 58.72.19.26,11558,24292,6,812, FS PA)
(58.72.19.26, 58.72.19.26,65668,71318,6,175, FS PA)
(58.72.19.26, 68.99.24.4,61923,1024,6,1598, FS PA)
(58.72.19.26, 60.41.59.65,22421,65796,6,1402, FS PA)
(58.72.19.26, 58.72.19.26,69740,21873,6,322, S A)
(58.72.19.26, 95.70.58.21,11058,1024,6,1453, FS PA)
(58.72.19.26, 42.10.50.36,44863,11023,6,251, FS PA)
(58.72.19.26, 57.6.91.5,25857,1024,6,1546, FS PA)
(58.72.19.26, 68.99.24.4,54756,11023,6,219, FS PA)
(58.72.19.26, 36.65.53.83,73335,43857,6,9, FS PA)
(58.72.19.26, 95.70.58.21,32204,11023,6,1635, S A)
(58.72.19.26, 76.48.82.73,46483,1024,6,127, FS PA)
(58.72.19.26, 81.88.14.14,55609,1024,6,507, FS PA)
(58.72.19.26, 1.54.61.21,65763,1024,6,370, FS PA)
But after I do:
> grunt> C = ORDER B BY bytes DESC;
> grunt> Dump C;
I get the same error as before:

java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
Which would lead me to believe my ORDER is broken. Is there a conf I need to change?
> 2010-08-05 14:48:01,157 [Thread-27] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
> 2010-08-05 14:48:01,157 [Thread-27] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
> 2010-08-05 14:48:01,189 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:01,192 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:01,209 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
> 2010-08-05 14:48:01,391 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0002
> 2010-08-05 14:48:02,157 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
> 2010-08-05 14:48:02,157 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
> 2010-08-05 14:48:02,369 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
> 2010-08-05 14:48:02,752 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
> 2010-08-05 14:48:02,753 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
> 2010-08-05 14:48:02,753 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_m_000000_0' done.
> 2010-08-05 14:48:02,761 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:02,789 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:02,789 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
> 2010-08-05 14:48:02,796 [Thread-27] INFO org.apache.hadoop.mapred.Merger - Merging 1 sorted segments
> 2010-08-05 14:48:02,932 [Thread-27] INFO org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 0 segments left of total size: 0 bytes
> 2010-08-05 14:48:02,932 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
> 2010-08-05 14:48:02,935 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:03,023 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
> 2010-08-05 14:48:03,025 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:03,026 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
> 2010-08-05 14:48:03,026 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0002_r_000000_0 is allowed to commit now
> 2010-08-05 14:48:03,026 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:03,029 [Thread-27] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0002_r_000000_0' to file:/tmp/temp1184504472/tmp-657784620
> 2010-08-05 14:48:03,031 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
> 2010-08-05 14:48:03,031 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_r_000000_0' done.
> 2010-08-05 14:48:06,431 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete
> 2010-08-05 14:48:06,432 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:06,432 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2010-08-05 14:48:08,062 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
> 2010-08-05 14:48:08,194 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:08,195 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
> 2010-08-05 14:48:08,197 [Thread-33] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 2010-08-05 14:48:08,475 [Thread-33] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:08,478 [Thread-33] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:08,480 [Thread-33] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
> 2010-08-05 14:48:08,480 [Thread-33] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
> 2010-08-05 14:48:08,792 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:08,794 [Thread-42] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
> 2010-08-05 14:48:08,794 [Thread-42] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
> 2010-08-05 14:48:09,024 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:09,027 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:09,028 [Thread-42] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
> 2010-08-05 14:48:09,290 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0003
> 2010-08-05 14:48:09,443 [Thread-42] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
> 2010-08-05 14:48:09,443 [Thread-42] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
> 2010-08-05 14:48:09,479 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:09,491 [Thread-42] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0003
> java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:135)
> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
> at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:527)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
> at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153)
> at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:108)
> ... 6 more
> 2010-08-05 14:48:13,800 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
> 2010-08-05 14:48:13,801 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
> 2010-08-05 14:48:13,802 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "file:/tmp/temp1184504472/tmp-1623830760"
> 2010-08-05 14:48:13,803 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
> 2010-08-05 14:48:13,811 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:13,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias D
> Details at logfile: /home/matt/workspace/pig-0.7.0/pig_1281044613580.log
>
> -----Original Message-----
> From: Ashutosh Chauhan [mailto:ashutosh.chauhan@gmail.com]
> Sent: Thursday, August 05, 2010 3:10 PM
> To: pig-user@hadoop.apache.org
> Subject: Re: LIMIT Issue
>
> To cut down on the problem space, can you try your query in grunt? If
> it works there, the problem is likely something to do with PigServer;
> otherwise it is related to Pig core itself.
>
> Ashutosh
> On Thu, Aug 5, 2010 at 10:57, Matthew Smith <Ma...@g2-inc.com> wrote:
>> No, I have not used it in grunt. I want to use PigServer because of the parameter passing that is doable through Java. I am using Pig 0.7.0.
>>
>> -----Original Message-----
>> From: Ashutosh Chauhan [mailto:ashutosh.chauhan@gmail.com]
>> Sent: Thursday, August 05, 2010 12:54 PM
>> To: pig-user@hadoop.apache.org
>> Subject: Re: LIMIT Issue
>>
>> Matt,
>>
>> Which version are you on? What happens if you run your query through
>> grunt instead of PigServer?
>> I tried a load-order-limit sequence on a small dataset in grunt and I
>> got the expected results.
>>
>> Ashutosh
>> On Wed, Aug 4, 2010 at 15:07, Matthew Smith <Ma...@g2-inc.com> wrote:
>>> Hey,
>>>
>>>
>>>
>>> While running in Java a LIMIT statement is not getting executed.
>>>
>>>
>>>
>>> /code
>>>
>>> myServer.registerQuery("flow_firstcut = FOREACH
>>> data GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
>>>
>>> myServer.registerQuery("filtered = FILTER
>>> flow_firstcut BY sIP matches 'someIP';");
>>>
>>>
>>>
>>> myServer.registerQuery("O = ORDER filtered BY
>>> bytes DESC;");
>>>
>>>
>>>
>>> myServer.registerQuery("topTen = LIMIT O 10;");
>>>
>>>
>>>
>>> myServer.store("topTen", outputFilePath);
>>>
>>>
>>>
>>> /code
>>>
>>>
>>>
>>> This produces a 699 line file. It should produce a 10 line file.
>>>
>>>
>>>
>>> /code
>>>
>>> registerQuery("flow_firstcut = FOREACH data
>>> GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
>>>
>>> myServer.registerQuery("filtered = FILTER
>>> flow_firstcut BY sIP matches '"+parameters[1]+"';");
>>>
>>>
>>>
>>> //myServer.registerQuery("O = ORDER filtered BY
>>> bytes DESC;");
>>>
>>>
>>>
>>> myServer.registerQuery("topTen = LIMIT filtered
>>> 10;");
>>>
>>>
>>>
>>> myServer.store("topTen", outputFilePath);
>>>
>>> /code
>>>
>>>
>>>
>>> This produces a 10 line file.
>>>
>>>
>>>
>>> Is there a known bug I am unaware of, or can you not order and then limit?
>>>
>>> http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#LIMIT
>>>
>>> indicates that this is a valid sequence of calls.
>>>
>>>
>>>
>>> Help?
>>>
>>>
>>>
>>> Matt
>>>
>>>
>>
>
Re: LIMIT Issue
Posted by Ashutosh Chauhan <as...@gmail.com>.
This is most likely because B is empty. Do:
grunt> dump A; -- to verify data is getting loaded as you are expecting.
grunt> dump B; -- to verify that B is non-empty.
Ashutosh
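For reference, the order-then-limit behaviour being asked about (and which the LIMIT docs describe as valid) amounts to "sort descending by bytes, keep the first 10 rows". Below is a plain-Python sketch of those semantics; the rows are invented purely for illustration, and only the column order follows the LOAD schema from Matt's grunt session:

```python
# Invented sample rows in the LOAD schema order:
# (sIP, dIP, sPort, dPort, protocol, bytes, flags)
records = [
    ("61.81.46.45", "10.0.0.1", 40000 + i, 80, 6, i * 100, "FS PA")
    for i in range(25)
]

# B = FILTER A BY sIP matches '61.81.46.45';
filtered = [r for r in records if r[0] == "61.81.46.45"]

# C = ORDER B BY bytes DESC;
ordered = sorted(filtered, key=lambda r: r[5], reverse=True)

# D = LIMIT C 10;  -- applied after the sort, so it keeps the 10 largest rows
top_ten = ordered[:10]

print(len(top_ten))   # 10
print(top_ten[0][5])  # 2400 (the largest bytes value)
```

A 699-line output file means the LIMIT step effectively never ran, even though the sort did.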
On Thu, Aug 5, 2010 at 14:54, Matthew Smith <Ma...@g2-inc.com> wrote:
> While running grunt I ran into another error. I see it is looking for another file, but I have never run into this problem with grunt before. This environment was freshly installed this morning before the grunt shell was executed.
>
> I also checked my PigServer() Java code on the new install, and it still produces a 699-line file that is ORDERed but not LIMITed.
>
> Thoughts?
>
>
> grunt> A = LOAD '0' USING PigStorage('|') as (sIP:chararray,dIP:chararray,sPort:int, dPort:int,protocol:int, bytes:int, flags:chararray);
> grunt> B = FILTER A BY sIP matches '61.81.46.45';
> grunt> C = ORDER B BY bytes DESC;
> grunt> D = LIMIT C 10;
> grunt> DUMP D;
>
>
>
>
> 2010-08-05 14:47:52,622 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for A
> 2010-08-05 14:47:52,622 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned for A
> 2010-08-05 14:47:52,681 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
> 2010-08-05 14:47:52,819 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(file:/tmp/temp1184504472/tmp-1623830760:org.apache.pig.builtin.BinStorage) - 1-54 Operator Key: 1-54)
> 2010-08-05 14:47:52,895 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
> 2010-08-05 14:47:52,895 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
> 2010-08-05 14:47:52,911 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:47:52,934 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:47:52,935 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2010-08-05 14:47:54,187 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
> 2010-08-05 14:47:54,228 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:47:54,229 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
> 2010-08-05 14:47:54,246 [Thread-5] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 2010-08-05 14:47:54,434 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:47:54,455 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:47:54,461 [Thread-5] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
> 2010-08-05 14:47:54,461 [Thread-5] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
> 2010-08-05 14:47:54,734 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
> 2010-08-05 14:47:54,754 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:47:54,757 [Thread-14] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
> [remainder of quoted message snipped; it repeats the log and thread quoted above verbatim]
>>> flow_firstcut BY sIP matches '"+parameters[1]+"';");
>>>
>>>
>>>
>>> //myServer.registerQuery("O = ORDER filtered BY
>>> bytes DESC;");
>>>
>>>
>>>
>>> myServer.registerQuery("topTen = LIMIT filtered
>>> 10;");
>>>
>>>
>>>
>>> myServer.store("topTen", outputFilePath);
>>>
>>> /code
>>>
>>>
>>>
>>> This produces a 10 line file.
>>>
>>>
>>>
>>> Is there a known bug I am unaware of or can you not order then limit?
>>>
>>> http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#LIMIT
>>>
>>> indicates that this is a valid sequence of calls.
>>>
>>>
>>>
>>> Help?
>>>
>>>
>>>
>>> Matt
>>>
>>>
>>
>
RE: LIMIT Issue
Posted by Matthew Smith <Ma...@g2-inc.com>.
While running grunt I ran into another error. I see it is looking for another file, but I have never hit this problem in grunt before. This environment was freshly installed this morning, before the grunt shell was started.
I also checked my PigServer() Java code on the new install, and it still produces a 699-line file that is ORDERed but not LIMITed.
Thoughts?
grunt> A = LOAD '0' USING PigStorage('|') as (sIP:chararray,dIP:chararray,sPort:int, dPort:int,protocol:int, bytes:int, flags:chararray);
grunt> B = FILTER A BY sIP matches '61.81.46.45';
grunt> C = ORDER B BY bytes DESC;
grunt> D = LIMIT C 10;
grunt> DUMP D;
2010-08-05 14:47:52,622 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for A
2010-08-05 14:47:52,622 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned for A
2010-08-05 14:47:52,681 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2010-08-05 14:47:52,819 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(file:/tmp/temp1184504472/tmp-1623830760:org.apache.pig.builtin.BinStorage) - 1-54 Operator Key: 1-54)
2010-08-05 14:47:52,895 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2010-08-05 14:47:52,895 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
2010-08-05 14:47:52,911 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:52,934 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:52,935 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2010-08-05 14:47:54,187 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2010-08-05 14:47:54,228 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,229 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2010-08-05 14:47:54,246 [Thread-5] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2010-08-05 14:47:54,434 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,455 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,461 [Thread-5] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-08-05 14:47:54,461 [Thread-5] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-08-05 14:47:54,734 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2010-08-05 14:47:54,754 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,757 [Thread-14] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-08-05 14:47:54,757 [Thread-14] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-08-05 14:47:54,821 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,827 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,839 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,841 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:55,245 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001
2010-08-05 14:47:56,352 [Thread-14] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
2010-08-05 14:47:56,354 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:56,355 [Thread-14] INFO org.apache.hadoop.mapred.LocalJobRunner -
2010-08-05 14:47:56,355 [Thread-14] INFO org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0001_m_000000_0 is allowed to commit now
2010-08-05 14:47:56,355 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:56,358 [Thread-14] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0001_m_000000_0' to file:/tmp/temp1184504472/tmp-842564749
2010-08-05 14:47:56,358 [Thread-14] INFO org.apache.hadoop.mapred.LocalJobRunner -
2010-08-05 14:47:56,358 [Thread-14] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0001_m_000000_0' done.
2010-08-05 14:47:59,754 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 33% complete
2010-08-05 14:47:59,754 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:59,754 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2010-08-05 14:48:00,873 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2010-08-05 14:48:00,890 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:00,891 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2010-08-05 14:48:00,891 [Thread-18] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2010-08-05 14:48:00,999 [Thread-18] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:01,003 [Thread-18] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:01,009 [Thread-18] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-08-05 14:48:01,009 [Thread-18] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-08-05 14:48:01,155 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:01,157 [Thread-27] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-08-05 14:48:01,157 [Thread-27] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-08-05 14:48:01,189 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:01,192 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:01,209 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2010-08-05 14:48:01,391 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0002
2010-08-05 14:48:02,157 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2010-08-05 14:48:02,157 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2010-08-05 14:48:02,369 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
2010-08-05 14:48:02,752 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
2010-08-05 14:48:02,753 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
2010-08-05 14:48:02,753 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_m_000000_0' done.
2010-08-05 14:48:02,761 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:02,789 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:02,789 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
2010-08-05 14:48:02,796 [Thread-27] INFO org.apache.hadoop.mapred.Merger - Merging 1 sorted segments
2010-08-05 14:48:02,932 [Thread-27] INFO org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 0 segments left of total size: 0 bytes
2010-08-05 14:48:02,932 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
2010-08-05 14:48:02,935 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:03,023 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
2010-08-05 14:48:03,025 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:03,026 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
2010-08-05 14:48:03,026 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0002_r_000000_0 is allowed to commit now
2010-08-05 14:48:03,026 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:03,029 [Thread-27] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0002_r_000000_0' to file:/tmp/temp1184504472/tmp-657784620
2010-08-05 14:48:03,031 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
2010-08-05 14:48:03,031 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_r_000000_0' done.
2010-08-05 14:48:06,431 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete
2010-08-05 14:48:06,432 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:06,432 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2010-08-05 14:48:08,062 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2010-08-05 14:48:08,194 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:08,195 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2010-08-05 14:48:08,197 [Thread-33] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2010-08-05 14:48:08,475 [Thread-33] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:08,478 [Thread-33] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:08,480 [Thread-33] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-08-05 14:48:08,480 [Thread-33] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-08-05 14:48:08,792 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:08,794 [Thread-42] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-08-05 14:48:08,794 [Thread-42] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-08-05 14:48:09,024 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:09,027 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:09,028 [Thread-42] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2010-08-05 14:48:09,290 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0003
2010-08-05 14:48:09,443 [Thread-42] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2010-08-05 14:48:09,443 [Thread-42] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2010-08-05 14:48:09,479 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:09,491 [Thread-42] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0003
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:135)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:527)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153)
at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:108)
... 6 more
2010-08-05 14:48:13,800 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2010-08-05 14:48:13,801 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
2010-08-05 14:48:13,802 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "file:/tmp/temp1184504472/tmp-1623830760"
2010-08-05 14:48:13,803 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
2010-08-05 14:48:13,811 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:13,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias D
Details at logfile: /home/matt/workspace/pig-0.7.0/pig_1281044613580.log
-----Original Message-----
From: Ashutosh Chauhan [mailto:ashutosh.chauhan@gmail.com]
Sent: Thursday, August 05, 2010 3:10 PM
To: pig-user@hadoop.apache.org
Subject: Re: LIMIT Issue
To cut down on the problem space, can you try your query on grunt. If
it works there, problem would be something to do with PigServer, else
its related to Pig core itself.
Ashutosh
On Thu, Aug 5, 2010 at 10:57, Matthew Smith <Ma...@g2-inc.com> wrote:
> No I have not used it in grunt. I am looking to use the pigServer because of the parameter passing that is doable through Java. I am using Pig 0.7.0.
>
> -----Original Message-----
> From: Ashutosh Chauhan [mailto:ashutosh.chauhan@gmail.com]
> Sent: Thursday, August 05, 2010 12:54 PM
> To: pig-user@hadoop.apache.org
> Subject: Re: LIMIT Issue
>
> Matt,
>
> Which version you are on? What happens if you run your query through
> grunt instead of PigServer?
> I tried load-order-limit sequence on a small dataset on grunt and I
> got expected results.
>
> Ashutosh
> On Wed, Aug 4, 2010 at 15:07, Matthew Smith <Ma...@g2-inc.com> wrote:
>> Hey,
>>
>>
>>
>> While running in Java a LIMIT statement is not getting executed.
>>
>>
>>
>> /code
>>
>> myServer.registerQuery("flow_firstcut = FOREACH
>> data GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
>>
>> myServer.registerQuery("filtered = FILTER
>> flow_firstcut BY sIP matches 'someIP';");
>>
>>
>>
>> myServer.registerQuery("O = ORDER filtered BY
>> bytes DESC;");
>>
>>
>>
>> myServer.registerQuery("topTen = LIMIT O 10;");
>>
>>
>>
>> myServer.store("topTen", outputFilePath);
>>
>>
>>
>> /code
>>
>>
>>
>> This produces a 699 line file. It should produce a 10 line file.
>>
>>
>>
>> /code
>>
>> registerQuery("flow_firstcut = FOREACH data
>> GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
>>
>> myServer.registerQuery("filtered = FILTER
>> flow_firstcut BY sIP matches '"+parameters[1]+"';");
>>
>>
>>
>> //myServer.registerQuery("O = ORDER filtered BY
>> bytes DESC;");
>>
>>
>>
>> myServer.registerQuery("topTen = LIMIT filtered
>> 10;");
>>
>>
>>
>> myServer.store("topTen", outputFilePath);
>>
>> /code
>>
>>
>>
>> This produces a 10 line file.
>>
>>
>>
>> Is there a known bug I am unaware of or can you not order then limit?
>>
>> http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#LIMIT
>>
>> indicates that this is a valid sequence of calls.
>>
>>
>>
>> Help?
>>
>>
>>
>> Matt
>>
>>
>
Re: LIMIT Issue
Posted by Ashutosh Chauhan <as...@gmail.com>.
To cut down on the problem space, can you try your query in grunt? If
it works there, the problem is likely something in PigServer; otherwise
it is in Pig core itself.
Ashutosh
On Thu, Aug 5, 2010 at 10:57, Matthew Smith <Ma...@g2-inc.com> wrote:
> No I have not used it in grunt. I am looking to use the pigServer because of the parameter passing that is doable through Java. I am using Pig 0.7.0.
>
> -----Original Message-----
> From: Ashutosh Chauhan [mailto:ashutosh.chauhan@gmail.com]
> Sent: Thursday, August 05, 2010 12:54 PM
> To: pig-user@hadoop.apache.org
> Subject: Re: LIMIT Issue
>
> Matt,
>
> Which version you are on? What happens if you run your query through
> grunt instead of PigServer?
> I tried load-order-limit sequence on a small dataset on grunt and I
> got expected results.
>
> Ashutosh
> On Wed, Aug 4, 2010 at 15:07, Matthew Smith <Ma...@g2-inc.com> wrote:
>> Hey,
>>
>>
>>
>> While running in Java a LIMIT statement is not getting executed.
>>
>>
>>
>> /code
>>
>> myServer.registerQuery("flow_firstcut = FOREACH
>> data GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
>>
>> myServer.registerQuery("filtered = FILTER
>> flow_firstcut BY sIP matches 'someIP';");
>>
>>
>>
>> myServer.registerQuery("O = ORDER filtered BY
>> bytes DESC;");
>>
>>
>>
>> myServer.registerQuery("topTen = LIMIT O 10;");
>>
>>
>>
>> myServer.store("topTen", outputFilePath);
>>
>>
>>
>> /code
>>
>>
>>
>> This produces a 699 line file. It should produce a 10 line file.
>>
>>
>>
>> /code
>>
>> registerQuery("flow_firstcut = FOREACH data
>> GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
>>
>> myServer.registerQuery("filtered = FILTER
>> flow_firstcut BY sIP matches '"+parameters[1]+"';");
>>
>>
>>
>> //myServer.registerQuery("O = ORDER filtered BY
>> bytes DESC;");
>>
>>
>>
>> myServer.registerQuery("topTen = LIMIT filtered
>> 10;");
>>
>>
>>
>> myServer.store("topTen", outputFilePath);
>>
>> /code
>>
>>
>>
>> This produces a 10 line file.
>>
>>
>>
>> Is there a known bug I am unaware of or can you not order then limit?
>>
>> http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#LIMIT
>>
>> indicates that this is a valid sequence of calls.
>>
>>
>>
>> Help?
>>
>>
>>
>> Matt
>>
>>
>
RE: LIMIT Issue
Posted by Matthew Smith <Ma...@g2-inc.com>.
No, I have not used it in grunt. I want to use PigServer because of the parameter passing that is possible through Java. I am using Pig 0.7.0.
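For reference, a minimal sketch of what that parameter passing can look like. The relation names match the script earlier in the thread, but the helper class, the sample IP value, and the commented-out PigServer calls are illustrative, not the actual code from the post:

```java
// Sketch: building a parameterized Pig Latin statement before handing it to
// PigServer.registerQuery(). Only the string building runs standalone; the
// PigServer calls (commented out) require the Pig jars on the classpath.
public class QuerySketch {

    // Substitute a caller-supplied IP into the FILTER statement.
    static String filterQuery(String ip) {
        return "filtered = FILTER flow_firstcut BY sIP matches '" + ip + "';";
    }

    public static void main(String[] args) {
        String q = filterQuery("61.81.46.45"); // sample value, not real data
        System.out.println(q);
        // PigServer myServer = new PigServer(ExecType.LOCAL);
        // myServer.registerQuery(q);
        // myServer.registerQuery("O = ORDER filtered BY bytes DESC;");
        // myServer.registerQuery("topTen = LIMIT O 10;");
        // myServer.store("topTen", outputFilePath);
    }
}
```

Building the full statement as a string before the registerQuery() call also makes it easy to log exactly what Pig will parse, which helps when debugging quoting mistakes in dynamically assembled queries.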
-----Original Message-----
From: Ashutosh Chauhan [mailto:ashutosh.chauhan@gmail.com]
Sent: Thursday, August 05, 2010 12:54 PM
To: pig-user@hadoop.apache.org
Subject: Re: LIMIT Issue
Matt,
Which version you are on? What happens if you run your query through
grunt instead of PigServer?
I tried load-order-limit sequence on a small dataset on grunt and I
got expected results.
Ashutosh
On Wed, Aug 4, 2010 at 15:07, Matthew Smith <Ma...@g2-inc.com> wrote:
> Hey,
>
>
>
> While running in Java a LIMIT statement is not getting executed.
>
>
>
> /code
>
> myServer.registerQuery("flow_firstcut = FOREACH
> data GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
>
> myServer.registerQuery("filtered = FILTER
> flow_firstcut BY sIP matches 'someIP';");
>
>
>
> myServer.registerQuery("O = ORDER filtered BY
> bytes DESC;");
>
>
>
> myServer.registerQuery("topTen = LIMIT O 10;");
>
>
>
> myServer.store("topTen", outputFilePath);
>
>
>
> /code
>
>
>
> This produces a 699 line file. It should produce a 10 line file.
>
>
>
> /code
>
> registerQuery("flow_firstcut = FOREACH data
> GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
>
> myServer.registerQuery("filtered = FILTER
> flow_firstcut BY sIP matches '"+parameters[1]+"';");
>
>
>
> //myServer.registerQuery("O = ORDER filtered BY
> bytes DESC;");
>
>
>
> myServer.registerQuery("topTen = LIMIT filtered
> 10;");
>
>
>
> myServer.store("topTen", outputFilePath);
>
> /code
>
>
>
> This produces a 10 line file.
>
>
>
> Is there a known bug I am unaware of or can you not order then limit?
>
> http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#LIMIT
>
> indicates that this is a valid sequence of calls.
>
>
>
> Help?
>
>
>
> Matt
>
>
Re: LIMIT Issue
Posted by Ashutosh Chauhan <as...@gmail.com>.
Matt,
Which version are you on? What happens if you run your query through
grunt instead of PigServer?
I tried a load-order-limit sequence on a small dataset in grunt and
got the expected results.
Ashutosh
On Wed, Aug 4, 2010 at 15:07, Matthew Smith <Ma...@g2-inc.com> wrote:
> Hey,
>
>
>
> While running in Java a LIMIT statement is not getting executed.
>
>
>
> /code
>
> myServer.registerQuery("flow_firstcut = FOREACH
> data GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
>
> myServer.registerQuery("filtered = FILTER
> flow_firstcut BY sIP matches 'someIP';");
>
>
>
> myServer.registerQuery("O = ORDER filtered BY
> bytes DESC;");
>
>
>
> myServer.registerQuery("topTen = LIMIT O 10;");
>
>
>
> myServer.store("topTen", outputFilePath);
>
>
>
> /code
>
>
>
> This produces a 699 line file. It should produce a 10 line file.
>
>
>
> /code
>
> registerQuery("flow_firstcut = FOREACH data
> GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
>
> myServer.registerQuery("filtered = FILTER
> flow_firstcut BY sIP matches '"+parameters[1]+"';");
>
>
>
> //myServer.registerQuery("O = ORDER filtered BY
> bytes DESC;");
>
>
>
> myServer.registerQuery("topTen = LIMIT filtered
> 10;");
>
>
>
> myServer.store("topTen", outputFilePath);
>
> /code
>
>
>
> This produces a 10 line file.
>
>
>
> Is there a known bug I am unaware of or can you not order then limit?
>
> http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#LIMIT
>
> indicates that this is a valid sequence of calls.
>
>
>
> Help?
>
>
>
> Matt
>
>