Posted to dev@pig.apache.org by "Cheolsoo Park (JIRA)" <ji...@apache.org> on 2014/06/22 08:53:24 UTC
[jira] [Created] (PIG-4032) BloomFilter fails with s3 path in Hadoop 2.4
Cheolsoo Park created PIG-4032:
----------------------------------
Summary: BloomFilter fails with s3 path in Hadoop 2.4
Key: PIG-4032
URL: https://issues.apache.org/jira/browse/PIG-4032
Project: Pig
Issue Type: Bug
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
Fix For: 0.14.0
BloomFilter is broken when given an s3 path on Hadoop 2. Here is a simple example:
{code}
DEFINE bloomtest Bloom('s3n://foo/bar/bloom');
a = LOAD 's3n://foo/bar/test.txt' using PigStorage('\t') as (k:int, v:int) ;
split a into yes if bloomtest(k,v), no otherwise;
dump yes;
{code}
This query fails with the following error:
{code}
14/06/22 06:28:58 INFO jobcontrol.ControlledJob: PigLatin:test.pig got an error while submitting
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: s3n:__foo_bar_bloom
at org.apache.hadoop.fs.Path.initialize(Path.java:206)
at org.apache.hadoop.fs.Path.<init>(Path.java:172)
at org.apache.hadoop.mapreduce.v2.util.MRApps.parseDistributedCacheArtifacts(MRApps.java
{code}
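The problem is that the distributed cache file name {{s3n:__foo_bar_bloom}} causes a URI syntax error: the {{s3n:}} prefix is parsed as a URI scheme, which leaves {{__foo_bar_bloom}} as a relative path inside an absolute URI.
In fact, this is a regression introduced by HADOOP-8562, which includes the following change: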
{code:title=Path.java}
- this.uri = new URI(scheme, authority, normalizePath(path), null, fragment)
+ this.uri = new URI(scheme, authority, normalizePath(scheme, path), null, fragment)
{code}
Because the scheme was ignored in Hadoop 1, s3 paths used to work by accident. In Hadoop 2, they fail.
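The exception in the stack trace can be reproduced outside Hadoop. The following is a minimal sketch that uses only java.net.URI, not Hadoop's Path class; the class and method names (RelativePathDemo, describe) are made up for illustration. It calls the same five-argument URI constructor that appears in the Path.java change, once with the cache file name from the error and once with an ordinary absolute s3n path.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; not part of Pig or Hadoop.
final class RelativePathDemo {

    // Builds a URI with the five-argument constructor and reports the outcome
    // as a string: the URI on success, the exception message on failure.
    static String describe(String scheme, String authority, String path) {
        try {
            return new URI(scheme, authority, path, null, null).toString();
        } catch (URISyntaxException e) {
            return "URISyntaxException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A scheme combined with a path that does not start with '/' is rejected.
        // Prints: URISyntaxException: Relative path in absolute URI: s3n:__foo_bar_bloom
        System.out.println(describe("s3n", null, "__foo_bar_bloom"));

        // A scheme, an authority, and an absolute path are accepted.
        // Prints: s3n://foo/bar/bloom
        System.out.println(describe("s3n", "foo", "/bar/bloom"));
    }
}
```

This only shows why a name of the form {{s3n:__foo_bar_bloom}} cannot be turned into a URI once {{s3n}} is treated as its scheme. It does not model how Pig derives the distributed cache file name or how Hadoop 1 handled it.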