Posted to user@pig.apache.org by Matthew Smith <Ma...@g2-inc.com> on 2010/08/19 20:35:57 UTC
ORDER Issue (repost to avoid spam filters)
All,
I am running pig-0.7.0 and have been running into an issue with the
ORDER command. I have tried running Pig out of the box on two separate
Linux distributions (Ubuntu 10.04 and openSUSE 11.2), and the same issue
occurs on both. I run these commands in a script file:
start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
dip:chararray, sport:int, dport:int, protocol:int, packets:int,
bytes:int, flags:chararray, startTime:long, endTime:long);
target = FILTER start BY sip matches '51.37.8.63';
fail = ORDER target BY bytes DESC;
not_reached = LIMIT fail 10;
dump not_reached;
The error is listed below. I then run:
start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
dip:chararray, sport:int, dport:int, protocol:int, packets:int,
bytes:int, flags:chararray, startTime:long, endTime:long);
target = FILTER start BY sip matches '51.37.8.63';
dump target;
This script produces a large list of sips matching the filter. What am
I doing wrong that causes pig to not want to ORDER these properly? I
have been wrestling with this issue for a week now. Any help would be
greatly appreciated.
Best,
Matthew
/ERROR
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_24118161_1282155871461
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:135)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:527)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_24118161_1282155871461
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
    at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153)
    at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:108)
    ... 6 more
Re: ORDER Issue (repost to avoid spam filters)
Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
Are you using Pig in local mode?
If yes, does this work when you run it against Hadoop instead?
Regards,
Mridul
Re: ORDER Issue (repost to avoid spam filters)
Posted by Thejas M Nair <te...@yahoo-inc.com>.
Can you check whether the initial MR jobs in the order-by query failed
because of some other error (specifically the sampling MR job that is part
of order-by)? Maybe, for some reason (a bug?), Pig did not capture/log that
error.
-Thejas
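Editorial note on the sampling step mentioned above: an order-by is commonly planned as a sampling job that writes a small file of sort-key samples, followed by a sort job whose partitioner reads that file to choose key-range boundaries. The stack trace in this thread is consistent with that picture, since WeightedRangePartitioner.setConf fails while opening a pigsample_* path. The following is a simplified Python sketch of the general idea, not Pig's actual implementation; the function names (sample_keys, range_boundaries, partition_for) and the sampling scheme are assumptions made purely for illustration.

```python
import bisect
import random


def sample_keys(keys, sample_size, seed=0):
    """Stage 1 (stand-in for the sampling job): take a small random sample of sort keys."""
    rng = random.Random(seed)
    if len(keys) <= sample_size:
        return sorted(keys)
    return sorted(rng.sample(keys, sample_size))


def range_boundaries(sorted_sample, num_partitions):
    """Pick num_partitions - 1 cut points from the sorted sample (approximate quantiles)."""
    if not sorted_sample:
        # Loosely analogous to the error in this thread: the second stage
        # cannot proceed when the sample output is missing.
        raise ValueError("sample output is missing or empty")
    step = len(sorted_sample) / num_partitions
    return [sorted_sample[int(step * i)] for i in range(1, num_partitions)]


def partition_for(key, boundaries):
    """Stage 2 (stand-in for the range partitioner): map a key to a partition index."""
    return bisect.bisect_right(boundaries, key)


if __name__ == "__main__":
    byte_counts = [40, 1500, 52, 9000, 64, 300, 1200, 76, 4096, 512]
    cuts = range_boundaries(sample_keys(byte_counts, sample_size=6), num_partitions=3)
    buckets = {}
    for value in byte_counts:
        buckets.setdefault(partition_for(value, cuts), []).append(value)
    # Sorting each bucket, then concatenating buckets in index order,
    # gives a total order because partition indices never decrease with the key.
    ordered = [v for idx in sorted(buckets) for v in sorted(buckets[idx])]
    print(ordered == sorted(byte_counts))
```

The point of the sketch is only that the second stage depends on the output of the first: if the sample is never written, or is written somewhere the partitioner does not look, the sort job fails before it processes any records.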
On 8/23/10 11:13 AM, "Matthew Smith" <Ma...@g2-inc.com> wrote:
> Update:
> After downloading and installing pig-0.6.0, I ran the script again over
> the same data set. It produced the desired results. I don't know what I
> am doing wrong in 0.7.0, but will be reverting back to 0.6.0 until I can
> sort out what went wrong in 0.7.0. Thoughts are still welcome and wanted
> :D
>
> Thanks,
> Matt
>
> -----Original Message-----
> From: Matthew Smith [mailto:Matthew.Smith@g2-inc.com]
> Sent: Monday, August 23, 2010 11:39 AM
> To: Thejas M Nair; pig-user@hadoop.apache.org
> Subject: RE: ORDER Issue (repost to avoid spam filters)
>
> Changed the script to:
> start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
> dip:chararray, sport:int, dport:int, protocol:int, packets:int,
> bytes:int, flags:chararray, startTime:long, endTime:long);
> target = FILTER start BY sip matches '51.37.8.63';
> not_null_bytes = FILTER target BY bytes is not null;
> dump not_null_bytes;
>
> and dumped the expected tuples. There were plenty of records that were
> valid. I will attempt to revert everything to pig-0.6.0 and re-run the
> scripts to determine if the issue is in pig-0.7.0.
>
> Matt
>
> -----Original Message-----
> From: Thejas M Nair [mailto:tejas@yahoo-inc.com]
> Sent: Friday, August 20, 2010 5:23 PM
> To: pig-user@hadoop.apache.org; Matthew Smith
> Subject: Re: ORDER Issue (repost to avoid spam filters)
>
> I was wondering if the bytes column has all null values (probably
> because the input has formatting issues).
>
> Can you check whether the following query gives any output?
>
> start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
> dip:chararray, sport:int, dport:int, protocol:int, packets:int,
> bytes:int, flags:chararray, startTime:long, endTime:long);
> target = FILTER start BY sip matches '51.37.8.63';
> non_null_bytes = FILTER target BY bytes is not null;
> dump non_null_bytes;
>
> -Thejas
>
> On 8/20/10 1:56 PM, "Matthew Smith" <Ma...@g2-inc.com> wrote:
>
>> UPDATE: I attempted my code in the amazon cloud (aws.amazon.com) and the
>> script worked as intended over the data set. This leads me to believe
>> that the issue is with pig-0.7.0 or my configuration. I would however
>> like to not pay for something that is free :D. Any other ideas would be
>> most welcome
>>
>> @Thejas
>>
>> I changed the Script to:
>>
>> start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
>> dip:chararray, sport:int, dport:int, protocol:int, packets:int,
>> bytes:int, flags:chararray, startTime:long, endTime:long);
>> target = FILTER start BY sip matches '51.37.8.63';
>> just_bytes= FOREACH target GENERATE bytes;
>> fail = ORDER just_bytes BY bytes DESC;
>> not_reached = LIMIT fail 10;
>> dump not_reached;
>>
>> and received the same error as before. I then changed the script to:
>>
>> start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
>> dip:chararray, sport:int, dport:int, protocol:int, packets:int,
>> bytes:int, flags:chararray, startTime:long, endTime:long);
>> target = FILTER start BY sip matches '51.37.8.63';
>> stored = STORE target INTO 'myoutput';
>> second_start = LOAD 'myoutput/part-m-00000' USING PigStorage('\t') AS
>> (sip:chararray, dip:chararray, sport:int, dport:int, protocol:int,
>> packets:int, bytes:int, flags:chararray, startTime:long, endTime:long);
>> fail = ORDER second_start BY bytes DESC;
>> not_reached = LIMIT fail 10;
>> dump not_reached;
>>
>> and received the same error.
>>
>> @Mridul
>>
>> I am using local mode at the moment. I don't understand the second
>> question.
>>
>> Thanks,
>> Matt
>>
>> From: Thejas M Nair [mailto:tejas@yahoo-inc.com]
>> Sent: Thursday, August 19, 2010 5:34 PM
>> To: pig-user@hadoop.apache.org; Matthew Smith
>> Subject: Re: ORDER Issue (repost to avoid spam filters)
>>
>> I think 0.7 had an issue where order-by used to fail if the input was
>> empty. But that does not seem to be the case here.
>> I am wondering if there is a parsing/data-format issue that is causing
>> the bytes column to be empty, though I am not aware of an empty/null
>> value in the sort column causing issues.
>> Can you try dumping just the bytes column?
>> Another thing you can try is to store the output of the filter and load
>> the data again before doing the order-by.
>>
>> Please let us know what you find.
>>
>> Thanks,
>> Thejas
RE: ORDER Issue (repost to avoid spam filters)
Posted by Matthew Smith <Ma...@g2-inc.com>.
Update:
After downloading and installing pig-0.6.0, I ran the script again over
the same data set. It produced the desired results. I don't know what I
am doing wrong in 0.7.0, but will be reverting back to 0.6.0 until I can
sort out what went wrong in 0.7.0. Thoughts are still welcome and wanted
:D
Thanks,
Matt
RE: ORDER Issue (repost to avoid spam filters)
Posted by Matthew Smith <Ma...@g2-inc.com>.
Changed the script to:
start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
dip:chararray, sport:int, dport:int, protocol:int, packets:int,
bytes:int, flags:chararray, startTime:long, endTime:long);
target = FILTER start BY sip matches '51.37.8.63';
not_null_bytes = FILTER target BY bytes is not null;
dump not_null_bytes;
and dumped the expected tuples. There were plenty of records that were
valid. I will attempt to revert everything to pig-0.6.0 and re-run the
scripts to determine if the issue is in pig-0.7.0.
Matt
Re: ORDER Issue (repost to avoid spam filters)
Posted by Thejas M Nair <te...@yahoo-inc.com>.
I was wondering if the bytes column has all null values (probably
because the input has formatting issues).
Can you check whether the following query gives any output?
start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
dip:chararray, sport:int, dport:int, protocol:int, packets:int,
bytes:int, flags:chararray, startTime:long, endTime:long);
target = FILTER start BY sip matches '51.37.8.63';
non_null_bytes = FILTER target BY bytes is not null;
dump non_null_bytes;
-Thejas
On 8/20/10 1:56 PM, "Matthew Smith" <Ma...@g2-inc.com> wrote:
> UPDATE: I attempted my code in the amazon cloud (aws.amazon.com) and the
> script worked as intended over the data set. This leads me to believe
> that the issue is with pig-0.7.0 or my configuration. I would however
> like to not pay for something that is free :D. Any other ideas would be
> most welcome
>
>
>
> @Thejas
>
> I changed the Script to:
>
> start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
> dip:chararray, sport:int, dport:int, protocol:int, packets:int,
> bytes:int, flags:chararray, startTime:long, endTime:long);
>
> target = FILTER start BY sip matches '51.37.8.63';
>
> just_bytes= FOREACH target GENERATE bytes;
>
> fail = ORDER just_bytes BY bytes DESC;
>
> not_reached = LIMIT fail 10;
>
> dump not_reached;
>
>
>
> and received the same error as before. I then changed the script to:
>
>
>
> start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
> dip:chararray, sport:int, dport:int, protocol:int, packets:int,
> bytes:int, flags:chararray, startTime:long, endTime:long);
>
> target = FILTER start BY sip matches '51.37.8.63';
>
> stored = STORE target INTO 'myoutput';
>
> second_start = LOAD 'myoutput/part-m-00000' USING PigStorage('\t') AS
> (sip:chararray, dip:chararray, sport:int, dport:int, protocol:int,
> packets:int, bytes:int, flags:chararray, startTime:long, endTime:long);
>
> fail = ORDER second_start BY bytes DESC;
>
> not_reached = LIMIT fail 10;
>
> dump not_reached;
>
>
>
> and received the same error.
>
>
>
> @Mridul
>
> I am using local mode at the moment. I don't understand the second
> question.
>
>
>
> Thanks,
>
> Matt
RE: ORDER Issue (repost to avoid spam filters)
Posted by Matthew Smith <Ma...@g2-inc.com>.
UPDATE: I ran my script in the Amazon cloud (aws.amazon.com) and it
worked as intended over the data set. This leads me to believe that the
issue is with pig-0.7.0 or my configuration. I would, however, like to
not pay for something that is free :D. Any other ideas would be most
welcome.
@Thejas
I changed the script to:
start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
dip:chararray, sport:int, dport:int, protocol:int, packets:int,
bytes:int, flags:chararray, startTime:long, endTime:long);
target = FILTER start BY sip matches '51.37.8.63';
just_bytes= FOREACH target GENERATE bytes;
fail = ORDER just_bytes BY bytes DESC;
not_reached = LIMIT fail 10;
dump not_reached;
and received the same error as before. I then changed the script to:
start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
dip:chararray, sport:int, dport:int, protocol:int, packets:int,
bytes:int, flags:chararray, startTime:long, endTime:long);
target = FILTER start BY sip matches '51.37.8.63';
stored = STORE target INTO 'myoutput';
second_start = LOAD 'myoutput/part-m-00000' USING PigStorage('\t') AS
(sip:chararray, dip:chararray, sport:int, dport:int, protocol:int,
packets:int, bytes:int, flags:chararray, startTime:long, endTime:long);
fail = ORDER second_start BY bytes DESC;
not_reached = LIMIT fail 10;
dump not_reached;
and received the same error.
@Mridul
I am using local mode at the moment. I don't understand the second
question.
Thanks,
Matt
Re: ORDER Issue (repost to avoid spam filters)
Posted by Thejas M Nair <te...@yahoo-inc.com>.
I think 0.7 had an issue where order-by failed if the input was empty, but that does not seem to be the case here.
I am wondering if a parsing/data-format issue is causing the bytes column to be empty, though I am not aware of an empty/null value in the sort column causing problems.
Can you try dumping just the bytes column?
Another thing you can try is to store the output of the filter and load the data again before doing the order-by.
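For the first check, a minimal sketch (not tested against 0.7.0): Pig generally turns a field that cannot be cast to its declared type into null, so filtering on a null bytes value should show any rows where that column failed to parse. The LOAD and FILTER lines are copied from the original script; the alias bad_bytes is made up for illustration.

```pig
start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
dip:chararray, sport:int, dport:int, protocol:int, packets:int,
bytes:int, flags:chararray, startTime:long, endTime:long);
target = FILTER start BY sip matches '51.37.8.63';
-- keep only the rows whose bytes field is null (hypothetical alias)
bad_bytes = FILTER target BY bytes is null;
dump bad_bytes;
```

If this dumps nothing, the bytes column is probably parsing cleanly and the data format is unlikely to be the cause.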
Please let us know what you find.
Thanks,
Thejas
On 8/19/10 11:35 AM, "Matthew Smith" <Ma...@g2-inc.com> wrote:
All,
I am running pig-0.7.0 and I have been running into an issue running the
ORDER command. I have attempted to run pig out of the box on 2 separate
LINUX OS (Ubuntu 10.4 and OpenSuse 11.2) and the same issue has
occurred. I run these commands in a script file:
start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
dip:chararray, sport:int, dport:int, protocol:int, packets:int,
bytes:int, flags:chararray, startTime:long, endTime:long);
target = FILTER start BY sip matches '51.37.8.63';
fail = ORDER target BY bytes DESC;
not_reached = LIMIT fail 10;
dump not_reached;
The error is listed below. I then run:
start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
dip:chararray, sport:int, dport:int, protocol:int, packets:int,
bytes:int, flags:chararray, startTime:long, endTime:long);
target = FILTER start BY sip matches '51.37.8.63';
dump target;
This script produces a large list of sips matching the filter. What am
I doing wrong that causes pig to not want to ORDER these properly? I
have been wrestling with this issue for a week now. Any help would be
greatly appreciated.
Best,
Matthew
/ERROR
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_24118161_1282155871461
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:135)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:527)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_24118161_1282155871461
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
    at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153)
    at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:108)
    ... 6 more