Posted to user@pig.apache.org by Matthew Smith <Ma...@g2-inc.com> on 2010/08/05 00:07:25 UTC
LIMIT Issue
Hey,
While running Pig from Java, a LIMIT statement is not getting executed.
/code
myServer.registerQuery("flow_firstcut = FOREACH data GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
myServer.registerQuery("filtered = FILTER flow_firstcut BY sIP matches 'someIP';");
myServer.registerQuery("O = ORDER filtered BY bytes DESC;");
myServer.registerQuery("topTen = LIMIT O 10;");
myServer.store("topTen", outputFilePath);
/code
This produces a 699-line file. It should produce a 10-line file.
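As a sanity check on what the snippet above should return, the intended semantics of ORDER ... BY bytes DESC followed by LIMIT 10 (sort descending, then keep at most the first ten records) can be sketched in plain Java. Nothing here calls Pig; it is purely illustrative:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TopTen {
    // Expected behavior of ORDER ... BY bytes DESC followed by LIMIT 10:
    // sort the byte counts descending, then keep at most the first ten.
    static List<Integer> topTen(List<Integer> bytes) {
        List<Integer> sorted = new ArrayList<>(bytes);
        sorted.sort(Collections.reverseOrder());
        return sorted.subList(0, Math.min(10, sorted.size()));
    }

    public static void main(String[] args) {
        List<Integer> in = new ArrayList<>();
        for (int i = 1; i <= 15; i++) in.add(i * 100);  // 15 records: 100..1500
        System.out.println(topTen(in));                 // only the top 10 survive
    }
}
```

With 15 input records, a correct order-then-limit pipeline emits exactly 10, which is what the 699-line output above violates.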
/code
myServer.registerQuery("flow_firstcut = FOREACH data GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
myServer.registerQuery("filtered = FILTER flow_firstcut BY sIP matches '"+parameters[1]+"';");
//myServer.registerQuery("O = ORDER filtered BY bytes DESC;");
myServer.registerQuery("topTen = LIMIT filtered 10;");
myServer.store("topTen", outputFilePath);
/code
This produces a 10-line file.
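Since the failing and working versions differ only in the commented-out ORDER, the parameter-driven FILTER string itself can be checked in isolation. A minimal sketch, using the alias names from this thread and an illustrative IP, with no Pig dependency:

```java
public class FilterQuery {
    // Build the FILTER statement the same way the snippet above splices
    // in parameters[1]. The alias names match the ones in this thread;
    // the IP argument is whatever the caller passes at runtime.
    static String filterQuery(String ip) {
        return "filtered = FILTER flow_firstcut BY sIP matches '" + ip + "';";
    }

    public static void main(String[] args) {
        // Print the query string that would be handed to registerQuery().
        System.out.println(filterQuery("61.81.46.45"));
    }
}
```

If the printed string is well-formed Pig Latin, the parameter splicing is not the problem, which points the finger back at the ORDER step.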
Is there a known bug I am unaware of, or can you not ORDER and then LIMIT?
http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#LIMIT
indicates that this is a valid sequence of calls.
Help?
Matt
Re: LIMIT Issue
Posted by Ashutosh Chauhan <as...@gmail.com>.
It looks like a bug, then. Do you have a script and a small enough
dataset that reproduces the issue and that you can upload to JIRA? If
so, go ahead and create a JIRA ticket with the script and data. Are you
using local mode or mapreduce mode?
Ashutosh
On Fri, Aug 6, 2010 at 07:16, Matthew Smith <Ma...@g2-inc.com> wrote:
> B is not empty:
> (58.72.19.26, 58.72.19.26,38627,22196,6,512, FS PA)
> (58.72.19.26, 36.65.53.83,44133,10957,6,646, FS PA)
> (58.72.19.26, 68.99.24.4,43951,11023,6,364, FS PA)
> (58.72.19.26, 9.7.68.69,18644,20524,17,228, FS PA)
> (58.72.19.26, 73.77.82.19,25,1024,6,194, FS PA)
> (58.72.19.26, 36.65.53.83,56380,71718,6,1003, FS PA)
> (58.72.19.26, 58.72.19.26,10221,44938,6,277, FS PA)
> (58.72.19.26, 77.52.5.64,69247,11023,6,389, FS PA)
> (58.72.19.26, 93.6.87.73,38149,1024,6,138, FS PA)
> (58.72.19.26, 58.72.19.26,11558,24292,6,812, FS PA)
> (58.72.19.26, 58.72.19.26,65668,71318,6,175, FS PA)
> (58.72.19.26, 68.99.24.4,61923,1024,6,1598, FS PA)
> (58.72.19.26, 60.41.59.65,22421,65796,6,1402, FS PA)
> (58.72.19.26, 58.72.19.26,69740,21873,6,322, S A)
> (58.72.19.26, 95.70.58.21,11058,1024,6,1453, FS PA)
> (58.72.19.26, 42.10.50.36,44863,11023,6,251, FS PA)
> (58.72.19.26, 57.6.91.5,25857,1024,6,1546, FS PA)
> (58.72.19.26, 68.99.24.4,54756,11023,6,219, FS PA)
> (58.72.19.26, 36.65.53.83,73335,43857,6,9, FS PA)
> (58.72.19.26, 95.70.58.21,32204,11023,6,1635, S A)
> (58.72.19.26, 76.48.82.73,46483,1024,6,127, FS PA)
> (58.72.19.26, 81.88.14.14,55609,1024,6,507, FS PA)
> (58.72.19.26, 1.54.61.21,65763,1024,6,370, FS PA)
>
>
> But after I do:
>> grunt> C = ORDER B BY bytes DESC;
>> grunt> Dump C;
>
> I get the same error as before:
>
> java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
>
>
> Which would lead me to believe my ORDER is broken. Is there a conf I need to change?
>
>
> -----Original Message-----
> From: Ashutosh Chauhan [mailto:ashutosh.chauhan@gmail.com]
> Sent: Friday, August 06, 2010 2:43 AM
> To: Matthew Smith
> Cc: pig-user@hadoop.apache.org
> Subject: Re: LIMIT Issue
>
> This is most likely because B is empty. Do:
>
> grunt> dump A; -- to verify data is getting loaded as you are expecting.
> grunt> dump B; -- to verify that B is non-empty.
>
> Ashutosh
>
> On Thu, Aug 5, 2010 at 14:54, Matthew Smith <Ma...@g2-inc.com> wrote:
>> While running grunt I ran into another error. I see it is looking for another file, but I have never run into this problem with grunt before. This environment was freshly installed this morning before the grunt shell was executed.
>>
>> I also checked my PigServer() Java code on the new install, and it still produces a 699 line file which is ORDERed but not LIMITed.
>>
>> Thoughts?
>>
>>
>> grunt> A = LOAD '0' USING PigStorage('|') as (sIP:chararray,dIP:chararray,sPort:int, dPort:int,protocol:int, bytes:int, flags:chararray);
>> grunt> B = FILTER A BY sIP matches '61.81.46.45';
>> grunt> C = ORDER B BY bytes DESC;
>> grunt> D = LIMIT C 10;
>> grunt> DUMP D;
>>
>>
>>
>>
>> 2010-08-05 14:47:52,622 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for A
>> 2010-08-05 14:47:52,622 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned for A
>> 2010-08-05 14:47:52,681 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
>> 2010-08-05 14:47:52,819 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(file:/tmp/temp1184504472/tmp-1623830760:org.apache.pig.builtin.BinStorage) - 1-54 Operator Key: 1-54)
>> 2010-08-05 14:47:52,895 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
>> 2010-08-05 14:47:52,895 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
>> 2010-08-05 14:47:52,911 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:47:52,934 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:47:52,935 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
>> 2010-08-05 14:47:54,187 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
>> 2010-08-05 14:47:54,228 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:47:54,229 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
>> 2010-08-05 14:47:54,246 [Thread-5] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>> 2010-08-05 14:47:54,434 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:47:54,455 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:47:54,461 [Thread-5] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
>> 2010-08-05 14:47:54,461 [Thread-5] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
>> 2010-08-05 14:47:54,734 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
>> 2010-08-05 14:47:54,754 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:47:54,757 [Thread-14] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
>> 2010-08-05 14:47:54,757 [Thread-14] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
>> 2010-08-05 14:47:54,821 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:47:54,827 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:47:54,839 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:47:54,841 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:47:55,245 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001
>> 2010-08-05 14:47:56,352 [Thread-14] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
>> 2010-08-05 14:47:56,354 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:47:56,355 [Thread-14] INFO org.apache.hadoop.mapred.LocalJobRunner -
>> 2010-08-05 14:47:56,355 [Thread-14] INFO org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0001_m_000000_0 is allowed to commit now
>> 2010-08-05 14:47:56,355 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:47:56,358 [Thread-14] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0001_m_000000_0' to file:/tmp/temp1184504472/tmp-842564749
>> 2010-08-05 14:47:56,358 [Thread-14] INFO org.apache.hadoop.mapred.LocalJobRunner -
>> 2010-08-05 14:47:56,358 [Thread-14] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0001_m_000000_0' done.
>> 2010-08-05 14:47:59,754 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 33% complete
>> 2010-08-05 14:47:59,754 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:47:59,754 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
>> 2010-08-05 14:48:00,873 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
>> 2010-08-05 14:48:00,890 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:00,891 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
>> 2010-08-05 14:48:00,891 [Thread-18] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>> 2010-08-05 14:48:00,999 [Thread-18] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:01,003 [Thread-18] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:01,009 [Thread-18] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
>> 2010-08-05 14:48:01,009 [Thread-18] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
>> 2010-08-05 14:48:01,155 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:01,157 [Thread-27] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
>> 2010-08-05 14:48:01,157 [Thread-27] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
>> 2010-08-05 14:48:01,189 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:01,192 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:01,209 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
>> 2010-08-05 14:48:01,391 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0002
>> 2010-08-05 14:48:02,157 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
>> 2010-08-05 14:48:02,157 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
>> 2010-08-05 14:48:02,369 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
>> 2010-08-05 14:48:02,752 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
>> 2010-08-05 14:48:02,753 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
>> 2010-08-05 14:48:02,753 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_m_000000_0' done.
>> 2010-08-05 14:48:02,761 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:02,789 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:02,789 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
>> 2010-08-05 14:48:02,796 [Thread-27] INFO org.apache.hadoop.mapred.Merger - Merging 1 sorted segments
>> 2010-08-05 14:48:02,932 [Thread-27] INFO org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 0 segments left of total size: 0 bytes
>> 2010-08-05 14:48:02,932 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
>> 2010-08-05 14:48:02,935 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:03,023 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
>> 2010-08-05 14:48:03,025 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:03,026 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
>> 2010-08-05 14:48:03,026 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0002_r_000000_0 is allowed to commit now
>> 2010-08-05 14:48:03,026 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:03,029 [Thread-27] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0002_r_000000_0' to file:/tmp/temp1184504472/tmp-657784620
>> 2010-08-05 14:48:03,031 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
>> 2010-08-05 14:48:03,031 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_r_000000_0' done.
>> 2010-08-05 14:48:06,431 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete
>> 2010-08-05 14:48:06,432 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:06,432 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
>> 2010-08-05 14:48:08,062 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
>> 2010-08-05 14:48:08,194 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:08,195 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
>> 2010-08-05 14:48:08,197 [Thread-33] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>> 2010-08-05 14:48:08,475 [Thread-33] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:08,478 [Thread-33] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:08,480 [Thread-33] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
>> 2010-08-05 14:48:08,480 [Thread-33] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
>> 2010-08-05 14:48:08,792 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:08,794 [Thread-42] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
>> 2010-08-05 14:48:08,794 [Thread-42] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
>> 2010-08-05 14:48:09,024 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:09,027 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:09,028 [Thread-42] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
>> 2010-08-05 14:48:09,290 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0003
>> 2010-08-05 14:48:09,443 [Thread-42] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
>> 2010-08-05 14:48:09,443 [Thread-42] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
>> 2010-08-05 14:48:09,479 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:09,491 [Thread-42] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0003
>> java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:135)
>> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
>> at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>> at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:527)
>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>> Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
>> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
>> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
>> at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153)
>> at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115)
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:108)
>> ... 6 more
>> 2010-08-05 14:48:13,800 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
>> 2010-08-05 14:48:13,801 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
>> 2010-08-05 14:48:13,802 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "file:/tmp/temp1184504472/tmp-1623830760"
>> 2010-08-05 14:48:13,803 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
>> 2010-08-05 14:48:13,811 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 2010-08-05 14:48:13,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias D
>> Details at logfile: /home/matt/workspace/pig-0.7.0/pig_1281044613580.log
>>
>> -----Original Message-----
>> From: Ashutosh Chauhan [mailto:ashutosh.chauhan@gmail.com]
>> Sent: Thursday, August 05, 2010 3:10 PM
>> To: pig-user@hadoop.apache.org
>> Subject: Re: LIMIT Issue
>>
>> To cut down on the problem space, can you try your query in grunt? If
>> it works there, the problem is likely something in PigServer;
>> otherwise it is related to Pig core itself.
>>
>> Ashutosh
>> On Thu, Aug 5, 2010 at 10:57, Matthew Smith <Ma...@g2-inc.com> wrote:
>>> No, I have not tried it in grunt. I want to use PigServer because of the parameter passing that Java makes possible. I am using Pig 0.7.0.
>>>
>>> -----Original Message-----
>>> From: Ashutosh Chauhan [mailto:ashutosh.chauhan@gmail.com]
>>> Sent: Thursday, August 05, 2010 12:54 PM
>>> To: pig-user@hadoop.apache.org
>>> Subject: Re: LIMIT Issue
>>>
>>> Matt,
>>>
>>> Which version are you on? What happens if you run your query through
>>> grunt instead of PigServer?
>>> I tried a load-order-limit sequence on a small dataset in grunt and
>>> got the expected results.
>>>
>>> Ashutosh
RE: LIMIT Issue
Posted by Matthew Smith <Ma...@g2-inc.com>.
B is not empty:
(58.72.19.26, 58.72.19.26,38627,22196,6,512, FS PA)
(58.72.19.26, 36.65.53.83,44133,10957,6,646, FS PA)
(58.72.19.26, 68.99.24.4,43951,11023,6,364, FS PA)
(58.72.19.26, 9.7.68.69,18644,20524,17,228, FS PA)
(58.72.19.26, 73.77.82.19,25,1024,6,194, FS PA)
(58.72.19.26, 36.65.53.83,56380,71718,6,1003, FS PA)
(58.72.19.26, 58.72.19.26,10221,44938,6,277, FS PA)
(58.72.19.26, 77.52.5.64,69247,11023,6,389, FS PA)
(58.72.19.26, 93.6.87.73,38149,1024,6,138, FS PA)
(58.72.19.26, 58.72.19.26,11558,24292,6,812, FS PA)
(58.72.19.26, 58.72.19.26,65668,71318,6,175, FS PA)
(58.72.19.26, 68.99.24.4,61923,1024,6,1598, FS PA)
(58.72.19.26, 60.41.59.65,22421,65796,6,1402, FS PA)
(58.72.19.26, 58.72.19.26,69740,21873,6,322, S A)
(58.72.19.26, 95.70.58.21,11058,1024,6,1453, FS PA)
(58.72.19.26, 42.10.50.36,44863,11023,6,251, FS PA)
(58.72.19.26, 57.6.91.5,25857,1024,6,1546, FS PA)
(58.72.19.26, 68.99.24.4,54756,11023,6,219, FS PA)
(58.72.19.26, 36.65.53.83,73335,43857,6,9, FS PA)
(58.72.19.26, 95.70.58.21,32204,11023,6,1635, S A)
(58.72.19.26, 76.48.82.73,46483,1024,6,127, FS PA)
(58.72.19.26, 81.88.14.14,55609,1024,6,507, FS PA)
(58.72.19.26, 1.54.61.21,65763,1024,6,370, FS PA)
But after I do:
> grunt> C = ORDER B BY bytes DESC;
> grunt> Dump C;
I get the same error as before:

java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
Which would lead me to believe my ORDER is broken. Is there a conf I need to change?
> 2010-08-05 14:48:01,157 [Thread-27] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
> 2010-08-05 14:48:01,157 [Thread-27] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
> 2010-08-05 14:48:01,189 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:01,192 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:01,209 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
> 2010-08-05 14:48:01,391 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0002
> 2010-08-05 14:48:02,157 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
> 2010-08-05 14:48:02,157 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
> 2010-08-05 14:48:02,369 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
> 2010-08-05 14:48:02,752 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
> 2010-08-05 14:48:02,753 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
> 2010-08-05 14:48:02,753 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_m_000000_0' done.
> 2010-08-05 14:48:02,761 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:02,789 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:02,789 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
> 2010-08-05 14:48:02,796 [Thread-27] INFO org.apache.hadoop.mapred.Merger - Merging 1 sorted segments
> 2010-08-05 14:48:02,932 [Thread-27] INFO org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 0 segments left of total size: 0 bytes
> 2010-08-05 14:48:02,932 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
> 2010-08-05 14:48:02,935 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:03,023 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
> 2010-08-05 14:48:03,025 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:03,026 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
> 2010-08-05 14:48:03,026 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0002_r_000000_0 is allowed to commit now
> 2010-08-05 14:48:03,026 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:03,029 [Thread-27] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0002_r_000000_0' to file:/tmp/temp1184504472/tmp-657784620
> 2010-08-05 14:48:03,031 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
> 2010-08-05 14:48:03,031 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_r_000000_0' done.
> 2010-08-05 14:48:06,431 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete
> 2010-08-05 14:48:06,432 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:06,432 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2010-08-05 14:48:08,062 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
> 2010-08-05 14:48:08,194 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:08,195 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
> 2010-08-05 14:48:08,197 [Thread-33] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 2010-08-05 14:48:08,475 [Thread-33] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:08,478 [Thread-33] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:08,480 [Thread-33] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
> 2010-08-05 14:48:08,480 [Thread-33] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
> 2010-08-05 14:48:08,792 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:08,794 [Thread-42] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
> 2010-08-05 14:48:08,794 [Thread-42] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
> 2010-08-05 14:48:09,024 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:09,027 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:09,028 [Thread-42] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
> 2010-08-05 14:48:09,290 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0003
> 2010-08-05 14:48:09,443 [Thread-42] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
> 2010-08-05 14:48:09,443 [Thread-42] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
> 2010-08-05 14:48:09,479 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:09,491 [Thread-42] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0003
> java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:135)
> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
> at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:527)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
> at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153)
> at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:108)
> ... 6 more
> 2010-08-05 14:48:13,800 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
> 2010-08-05 14:48:13,801 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
> 2010-08-05 14:48:13,802 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "file:/tmp/temp1184504472/tmp-1623830760"
> 2010-08-05 14:48:13,803 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
> 2010-08-05 14:48:13,811 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:48:13,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias D
> Details at logfile: /home/matt/workspace/pig-0.7.0/pig_1281044613580.log
>
> -----Original Message-----
> From: Ashutosh Chauhan [mailto:ashutosh.chauhan@gmail.com]
> Sent: Thursday, August 05, 2010 3:10 PM
> To: pig-user@hadoop.apache.org
> Subject: Re: LIMIT Issue
>
> To cut down on the problem space, can you try your query in grunt? If
> it works there, the problem is likely something to do with PigServer;
> otherwise it is related to Pig core itself.
>
> Ashutosh
> On Thu, Aug 5, 2010 at 10:57, Matthew Smith <Ma...@g2-inc.com> wrote:
>> No, I have not used it in grunt. I want to use PigServer because of the parameter passing that is doable through Java. I am using Pig 0.7.0.
>>
>> -----Original Message-----
>> From: Ashutosh Chauhan [mailto:ashutosh.chauhan@gmail.com]
>> Sent: Thursday, August 05, 2010 12:54 PM
>> To: pig-user@hadoop.apache.org
>> Subject: Re: LIMIT Issue
>>
>> Matt,
>>
>> Which version are you on? What happens if you run your query through
>> grunt instead of PigServer?
>> I tried a load-order-limit sequence on a small dataset in grunt and I
>> got the expected results.
>>
>> Ashutosh
>> On Wed, Aug 4, 2010 at 15:07, Matthew Smith <Ma...@g2-inc.com> wrote:
>>> Hey,
>>>
>>>
>>>
>>> While running in Java a LIMIT statement is not getting executed.
>>>
>>>
>>>
>>> /code
>>>
>>> myServer.registerQuery("flow_firstcut = FOREACH
>>> data GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
>>>
>>> myServer.registerQuery("filtered = FILTER
>>> flow_firstcut BY sIP matches 'someIP';");
>>>
>>>
>>>
>>> myServer.registerQuery("O = ORDER filtered BY
>>> bytes DESC;");
>>>
>>>
>>>
>>> myServer.registerQuery("topTen = LIMIT O 10;");
>>>
>>>
>>>
>>> myServer.store("topTen", outputFilePath);
>>>
>>>
>>>
>>> /code
>>>
>>>
>>>
>>> This produces a 699 line file. It should produce a 10 line file.
>>>
>>>
>>>
>>> /code
>>>
>>> registerQuery("flow_firstcut = FOREACH data
>>> GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
>>>
>>> myServer.registerQuery("filtered = FILTER
>>> flow_firstcut BY sIP matches '"+parameters[1]+"';");
>>>
>>>
>>>
>>> //myServer.registerQuery("O = ORDER filtered BY
>>> bytes DESC;");
>>>
>>>
>>>
>>> myServer.registerQuery("topTen = LIMIT filtered
>>> 10;");
>>>
>>>
>>>
>>> myServer.store("topTen", outputFilePath);
>>>
>>> /code
>>>
>>>
>>>
>>> This produces a 10 line file.
>>>
>>>
>>>
>>> Is there a known bug I am unaware of, or can you not order and then limit?
>>>
>>> http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#LIMIT
>>>
>>> indicates that this is a valid sequence of calls.
>>>
>>>
>>>
>>> Help?
>>>
>>>
>>>
>>> Matt
>>>
>>>
>>
>
Re: LIMIT Issue
Posted by Ashutosh Chauhan <as...@gmail.com>.
This is most likely because B is empty. Do:
grunt> dump A; -- to verify data is getting loaded as you are expecting.
grunt> dump B; -- to verify that B is non-empty.
Ashutosh
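For reference, the order-then-limit behaviour being asked about (and which the LIMIT docs describe as valid) amounts to "sort descending by bytes, keep the first 10 rows". Below is a plain-Python sketch of those semantics; the rows are invented purely for illustration, and only the column order follows the LOAD schema from Matt's grunt session:

```python
# Invented sample rows in the LOAD schema order:
# (sIP, dIP, sPort, dPort, protocol, bytes, flags)
records = [
    ("61.81.46.45", "10.0.0.1", 40000 + i, 80, 6, i * 100, "FS PA")
    for i in range(25)
]

# B = FILTER A BY sIP matches '61.81.46.45';
filtered = [r for r in records if r[0] == "61.81.46.45"]

# C = ORDER B BY bytes DESC;
ordered = sorted(filtered, key=lambda r: r[5], reverse=True)

# D = LIMIT C 10;  -- applied after the sort, so it keeps the 10 largest rows
top_ten = ordered[:10]

print(len(top_ten))   # 10
print(top_ten[0][5])  # 2400 (the largest bytes value)
```

A 699-line output file means the LIMIT step effectively never ran, even though the sort did.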
On Thu, Aug 5, 2010 at 14:54, Matthew Smith <Ma...@g2-inc.com> wrote:
> While running grunt I ran into another error. I see it is looking for another file, but I have never run into this problem with grunt before. This environment was freshly installed this morning before the grunt shell was executed.
>
> I also checked my PigServer() Java code on the new install, and it still produces a 699-line file that is ORDERed but not LIMITed.
>
> Thoughts?
>
>
> grunt> A = LOAD '0' USING PigStorage('|') as (sIP:chararray,dIP:chararray,sPort:int, dPort:int,protocol:int, bytes:int, flags:chararray);
> grunt> B = FILTER A BY sIP matches '61.81.46.45';
> grunt> C = ORDER B BY bytes DESC;
> grunt> D = LIMIT C 10;
> grunt> DUMP D;
>
>
>
>
> 2010-08-05 14:47:52,622 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for A
> 2010-08-05 14:47:52,622 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned for A
> 2010-08-05 14:47:52,681 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
> 2010-08-05 14:47:52,819 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(file:/tmp/temp1184504472/tmp-1623830760:org.apache.pig.builtin.BinStorage) - 1-54 Operator Key: 1-54)
> 2010-08-05 14:47:52,895 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
> 2010-08-05 14:47:52,895 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
> 2010-08-05 14:47:52,911 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:47:52,934 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:47:52,935 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2010-08-05 14:47:54,187 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
> 2010-08-05 14:47:54,228 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:47:54,229 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
> 2010-08-05 14:47:54,246 [Thread-5] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 2010-08-05 14:47:54,434 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:47:54,455 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:47:54,461 [Thread-5] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
> 2010-08-05 14:47:54,461 [Thread-5] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
> 2010-08-05 14:47:54,734 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
> 2010-08-05 14:47:54,754 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2010-08-05 14:47:54,757 [Thread-14] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
> [remainder of quoted message snipped; it repeats the log and thread quoted above verbatim]
>>> flow_firstcut BY sIP matches '"+parameters[1]+"';");
>>>
>>>
>>>
>>> //myServer.registerQuery("O = ORDER filtered BY
>>> bytes DESC;");
>>>
>>>
>>>
>>> myServer.registerQuery("topTen = LIMIT filtered
>>> 10;");
>>>
>>>
>>>
>>> myServer.store("topTen", outputFilePath);
>>>
>>> /code
>>>
>>>
>>>
>>> This produces a 10 line file.
>>>
>>>
>>>
>>> Is there a known bug I am unaware of or can you not order then limit?
>>>
>>> http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#LIMIT
>>>
>>> indicates that this is a valid sequence of calls.
>>>
>>>
>>>
>>> Help?
>>>
>>>
>>>
>>> Matt
>>>
>>>
>>
>
RE: LIMIT Issue
Posted by Matthew Smith <Ma...@g2-inc.com>.
While running grunt I ran into another error. I see it is looking for another file, but I have never hit this problem in grunt before. This environment was freshly installed this morning, before the grunt shell was started.
I also checked my PigServer() Java code on the new install, and it still produces a 699-line file that is ORDERed but not LIMITed.
Thoughts?
grunt> A = LOAD '0' USING PigStorage('|') as (sIP:chararray,dIP:chararray,sPort:int, dPort:int,protocol:int, bytes:int, flags:chararray);
grunt> B = FILTER A BY sIP matches '61.81.46.45';
grunt> C = ORDER B BY bytes DESC;
grunt> D = LIMIT C 10;
grunt> DUMP D;
2010-08-05 14:47:52,622 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for A
2010-08-05 14:47:52,622 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned for A
2010-08-05 14:47:52,681 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2010-08-05 14:47:52,819 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(file:/tmp/temp1184504472/tmp-1623830760:org.apache.pig.builtin.BinStorage) - 1-54 Operator Key: 1-54)
2010-08-05 14:47:52,895 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2010-08-05 14:47:52,895 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
2010-08-05 14:47:52,911 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:52,934 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:52,935 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2010-08-05 14:47:54,187 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2010-08-05 14:47:54,228 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,229 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2010-08-05 14:47:54,246 [Thread-5] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2010-08-05 14:47:54,434 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,455 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,461 [Thread-5] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-08-05 14:47:54,461 [Thread-5] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-08-05 14:47:54,734 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2010-08-05 14:47:54,754 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,757 [Thread-14] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-08-05 14:47:54,757 [Thread-14] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-08-05 14:47:54,821 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,827 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,839 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,841 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:55,245 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001
2010-08-05 14:47:56,352 [Thread-14] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
2010-08-05 14:47:56,354 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:56,355 [Thread-14] INFO org.apache.hadoop.mapred.LocalJobRunner -
2010-08-05 14:47:56,355 [Thread-14] INFO org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0001_m_000000_0 is allowed to commit now
2010-08-05 14:47:56,355 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:56,358 [Thread-14] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0001_m_000000_0' to file:/tmp/temp1184504472/tmp-842564749
2010-08-05 14:47:56,358 [Thread-14] INFO org.apache.hadoop.mapred.LocalJobRunner -
2010-08-05 14:47:56,358 [Thread-14] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0001_m_000000_0' done.
2010-08-05 14:47:59,754 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 33% complete
2010-08-05 14:47:59,754 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:59,754 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2010-08-05 14:48:00,873 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2010-08-05 14:48:00,890 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:00,891 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2010-08-05 14:48:00,891 [Thread-18] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2010-08-05 14:48:00,999 [Thread-18] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:01,003 [Thread-18] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:01,009 [Thread-18] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-08-05 14:48:01,009 [Thread-18] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-08-05 14:48:01,155 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:01,157 [Thread-27] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-08-05 14:48:01,157 [Thread-27] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-08-05 14:48:01,189 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:01,192 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:01,209 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2010-08-05 14:48:01,391 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0002
2010-08-05 14:48:02,157 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2010-08-05 14:48:02,157 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2010-08-05 14:48:02,369 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
2010-08-05 14:48:02,752 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
2010-08-05 14:48:02,753 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
2010-08-05 14:48:02,753 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_m_000000_0' done.
2010-08-05 14:48:02,761 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:02,789 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:02,789 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
2010-08-05 14:48:02,796 [Thread-27] INFO org.apache.hadoop.mapred.Merger - Merging 1 sorted segments
2010-08-05 14:48:02,932 [Thread-27] INFO org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 0 segments left of total size: 0 bytes
2010-08-05 14:48:02,932 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
2010-08-05 14:48:02,935 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:03,023 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
2010-08-05 14:48:03,025 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:03,026 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
2010-08-05 14:48:03,026 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0002_r_000000_0 is allowed to commit now
2010-08-05 14:48:03,026 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:03,029 [Thread-27] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0002_r_000000_0' to file:/tmp/temp1184504472/tmp-657784620
2010-08-05 14:48:03,031 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
2010-08-05 14:48:03,031 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_r_000000_0' done.
2010-08-05 14:48:06,431 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete
2010-08-05 14:48:06,432 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:06,432 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2010-08-05 14:48:08,062 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2010-08-05 14:48:08,194 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:08,195 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2010-08-05 14:48:08,197 [Thread-33] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2010-08-05 14:48:08,475 [Thread-33] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:08,478 [Thread-33] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:08,480 [Thread-33] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-08-05 14:48:08,480 [Thread-33] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-08-05 14:48:08,792 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:08,794 [Thread-42] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-08-05 14:48:08,794 [Thread-42] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-08-05 14:48:09,024 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:09,027 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:09,028 [Thread-42] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2010-08-05 14:48:09,290 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0003
2010-08-05 14:48:09,443 [Thread-42] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2010-08-05 14:48:09,443 [Thread-42] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2010-08-05 14:48:09,479 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:09,491 [Thread-42] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0003
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:135)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:527)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153)
at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:108)
... 6 more
2010-08-05 14:48:13,800 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2010-08-05 14:48:13,801 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
2010-08-05 14:48:13,802 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "file:/tmp/temp1184504472/tmp-1623830760"
2010-08-05 14:48:13,803 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
2010-08-05 14:48:13,811 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:13,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias D
Details at logfile: /home/matt/workspace/pig-0.7.0/pig_1281044613580.log
-----Original Message-----
From: Ashutosh Chauhan [mailto:ashutosh.chauhan@gmail.com]
Sent: Thursday, August 05, 2010 3:10 PM
To: pig-user@hadoop.apache.org
Subject: Re: LIMIT Issue
To cut down on the problem space, can you try your query on grunt. If
it works there, problem would be something to do with PigServer, else
its related to Pig core itself.
Ashutosh
On Thu, Aug 5, 2010 at 10:57, Matthew Smith <Ma...@g2-inc.com> wrote:
> No I have not used it in grunt. I am looking to use the pigServer because of the parameter passing that is doable through Java. I am using Pig 0.7.0.
>
> -----Original Message-----
> From: Ashutosh Chauhan [mailto:ashutosh.chauhan@gmail.com]
> Sent: Thursday, August 05, 2010 12:54 PM
> To: pig-user@hadoop.apache.org
> Subject: Re: LIMIT Issue
>
> Matt,
>
> Which version you are on? What happens if you run your query through
> grunt instead of PigServer?
> I tried load-order-limit sequence on a small dataset on grunt and I
> got expected results.
>
> Ashutosh
> On Wed, Aug 4, 2010 at 15:07, Matthew Smith <Ma...@g2-inc.com> wrote:
>> Hey,
>>
>>
>>
>> While running in Java a LIMIT statement is not getting executed.
>>
>>
>>
>> /code
>>
>> myServer.registerQuery("flow_firstcut = FOREACH
>> data GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
>>
>> myServer.registerQuery("filtered = FILTER
>> flow_firstcut BY sIP matches 'someIP';");
>>
>>
>>
>> myServer.registerQuery("O = ORDER filtered BY
>> bytes DESC;");
>>
>>
>>
>> myServer.registerQuery("topTen = LIMIT O 10;");
>>
>>
>>
>> myServer.store("topTen", outputFilePath);
>>
>>
>>
>> /code
>>
>>
>>
>> This produces a 699 line file. It should produce a 10 line file.
>>
>>
>>
>> /code
>>
>> registerQuery("flow_firstcut = FOREACH data
>> GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
>>
>> myServer.registerQuery("filtered = FILTER
>> flow_firstcut BY sIP matches '"+parameters[1]+"';");
>>
>>
>>
>> //myServer.registerQuery("O = ORDER filtered BY
>> bytes DESC;");
>>
>>
>>
>> myServer.registerQuery("topTen = LIMIT filtered
>> 10;");
>>
>>
>>
>> myServer.store("topTen", outputFilePath);
>>
>> /code
>>
>>
>>
>> This produces a 10 line file.
>>
>>
>>
>> Is there a known bug I am unaware of or can you not order then limit?
>>
>> http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#LIMIT
>>
>> indicates that this is a valid sequence of calls.
>>
>>
>>
>> Help?
>>
>>
>>
>> Matt
>>
>>
>
Re: LIMIT Issue
Posted by Ashutosh Chauhan <as...@gmail.com>.
To cut down on the problem space, can you try your query in grunt? If
it works there, the problem is likely something in PigServer; otherwise
it is in Pig core itself.
Ashutosh
On Thu, Aug 5, 2010 at 10:57, Matthew Smith <Ma...@g2-inc.com> wrote:
> No I have not used it in grunt. I am looking to use the pigServer because of the parameter passing that is doable through Java. I am using Pig 0.7.0.
>
> -----Original Message-----
> From: Ashutosh Chauhan [mailto:ashutosh.chauhan@gmail.com]
> Sent: Thursday, August 05, 2010 12:54 PM
> To: pig-user@hadoop.apache.org
> Subject: Re: LIMIT Issue
>
> Matt,
>
> Which version you are on? What happens if you run your query through
> grunt instead of PigServer?
> I tried load-order-limit sequence on a small dataset on grunt and I
> got expected results.
>
> Ashutosh
> On Wed, Aug 4, 2010 at 15:07, Matthew Smith <Ma...@g2-inc.com> wrote:
>> Hey,
>>
>>
>>
>> While running in Java a LIMIT statement is not getting executed.
>>
>>
>>
>> /code
>>
>> myServer.registerQuery("flow_firstcut = FOREACH
>> data GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
>>
>> myServer.registerQuery("filtered = FILTER
>> flow_firstcut BY sIP matches 'someIP';");
>>
>>
>>
>> myServer.registerQuery("O = ORDER filtered BY
>> bytes DESC;");
>>
>>
>>
>> myServer.registerQuery("topTen = LIMIT O 10;");
>>
>>
>>
>> myServer.store("topTen", outputFilePath);
>>
>>
>>
>> /code
>>
>>
>>
>> This produces a 699 line file. It should produce a 10 line file.
>>
>>
>>
>> /code
>>
>> registerQuery("flow_firstcut = FOREACH data
>> GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
>>
>> myServer.registerQuery("filtered = FILTER
>> flow_firstcut BY sIP matches '"+parameters[1]+"';");
>>
>>
>>
>> //myServer.registerQuery("O = ORDER filtered BY
>> bytes DESC;");
>>
>>
>>
>> myServer.registerQuery("topTen = LIMIT filtered
>> 10;");
>>
>>
>>
>> myServer.store("topTen", outputFilePath);
>>
>> /code
>>
>>
>>
>> This produces a 10 line file.
>>
>>
>>
>> Is there a known bug I am unaware of or can you not order then limit?
>>
>> http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#LIMIT
>>
>> indicates that this is a valid sequence of calls.
>>
>>
>>
>> Help?
>>
>>
>>
>> Matt
>>
>>
>
RE: LIMIT Issue
Posted by Matthew Smith <Ma...@g2-inc.com>.
No, I have not used it in grunt. I want to use PigServer because of the parameter passing that is possible through Java. I am using Pig 0.7.0.
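For reference, a minimal sketch of what that parameter passing can look like. The relation names match the script earlier in the thread, but the helper class, the sample IP value, and the commented-out PigServer calls are illustrative, not the actual code from the post:

```java
// Sketch: building a parameterized Pig Latin statement before handing it to
// PigServer.registerQuery(). Only the string building runs standalone; the
// PigServer calls (commented out) require the Pig jars on the classpath.
public class QuerySketch {

    // Substitute a caller-supplied IP into the FILTER statement.
    static String filterQuery(String ip) {
        return "filtered = FILTER flow_firstcut BY sIP matches '" + ip + "';";
    }

    public static void main(String[] args) {
        String q = filterQuery("61.81.46.45"); // sample value, not real data
        System.out.println(q);
        // PigServer myServer = new PigServer(ExecType.LOCAL);
        // myServer.registerQuery(q);
        // myServer.registerQuery("O = ORDER filtered BY bytes DESC;");
        // myServer.registerQuery("topTen = LIMIT O 10;");
        // myServer.store("topTen", outputFilePath);
    }
}
```

Building the full statement as a string before the registerQuery() call also makes it easy to log exactly what Pig will parse, which helps when debugging quoting mistakes in dynamically assembled queries.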
-----Original Message-----
From: Ashutosh Chauhan [mailto:ashutosh.chauhan@gmail.com]
Sent: Thursday, August 05, 2010 12:54 PM
To: pig-user@hadoop.apache.org
Subject: Re: LIMIT Issue
Matt,
Which version you are on? What happens if you run your query through
grunt instead of PigServer?
I tried load-order-limit sequence on a small dataset on grunt and I
got expected results.
Ashutosh
On Wed, Aug 4, 2010 at 15:07, Matthew Smith <Ma...@g2-inc.com> wrote:
> Hey,
>
>
>
> While running in Java a LIMIT statement is not getting executed.
>
>
>
> /code
>
> myServer.registerQuery("flow_firstcut = FOREACH
> data GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
>
> myServer.registerQuery("filtered = FILTER
> flow_firstcut BY sIP matches 'someIP';");
>
>
>
> myServer.registerQuery("O = ORDER filtered BY
> bytes DESC;");
>
>
>
> myServer.registerQuery("topTen = LIMIT O 10;");
>
>
>
> myServer.store("topTen", outputFilePath);
>
>
>
> /code
>
>
>
> This produces a 699 line file. It should produce a 10 line file.
>
>
>
> /code
>
> registerQuery("flow_firstcut = FOREACH data
> GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
>
> myServer.registerQuery("filtered = FILTER
> flow_firstcut BY sIP matches '"+parameters[1]+"';");
>
>
>
> //myServer.registerQuery("O = ORDER filtered BY
> bytes DESC;");
>
>
>
> myServer.registerQuery("topTen = LIMIT filtered
> 10;");
>
>
>
> myServer.store("topTen", outputFilePath);
>
> /code
>
>
>
> This produces a 10 line file.
>
>
>
> Is there a known bug I am unaware of or can you not order then limit?
>
> http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#LIMIT
>
> indicates that this is a valid sequence of calls.
>
>
>
> Help?
>
>
>
> Matt
>
>
Re: LIMIT Issue
Posted by Ashutosh Chauhan <as...@gmail.com>.
Matt,
Which version are you on? What happens if you run your query through
grunt instead of PigServer?
I tried a load-order-limit sequence on a small dataset in grunt and
got the expected results.
Ashutosh
On Wed, Aug 4, 2010 at 15:07, Matthew Smith <Ma...@g2-inc.com> wrote:
> Hey,
>
>
>
> While running in Java a LIMIT statement is not getting executed.
>
>
>
> /code
>
> myServer.registerQuery("flow_firstcut = FOREACH
> data GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
>
> myServer.registerQuery("filtered = FILTER
> flow_firstcut BY sIP matches 'someIP';");
>
>
>
> myServer.registerQuery("O = ORDER filtered BY
> bytes DESC;");
>
>
>
> myServer.registerQuery("topTen = LIMIT O 10;");
>
>
>
> myServer.store("topTen", outputFilePath);
>
>
>
> /code
>
>
>
> This produces a 699 line file. It should produce a 10 line file.
>
>
>
> /code
>
> registerQuery("flow_firstcut = FOREACH data
> GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
>
> myServer.registerQuery("filtered = FILTER
> flow_firstcut BY sIP matches '"+parameters[1]+"';");
>
>
>
> //myServer.registerQuery("O = ORDER filtered BY
> bytes DESC;");
>
>
>
> myServer.registerQuery("topTen = LIMIT filtered
> 10;");
>
>
>
> myServer.store("topTen", outputFilePath);
>
> /code
>
>
>
> This produces a 10 line file.
>
>
>
> Is there a known bug I am unaware of or can you not order then limit?
>
> http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#LIMIT
>
> indicates that this is a valid sequence of calls.
>
>
>
> Help?
>
>
>
> Matt
>
>