You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by "Cheolsoo Park (JIRA)" <ji...@apache.org> on 2014/06/22 08:53:24 UTC

[jira] [Created] (PIG-4032) BloomFilter fails with s3 path in Hadoop 2.4

Cheolsoo Park created PIG-4032:
----------------------------------

             Summary: BloomFilter fails with s3 path in Hadoop 2.4
                 Key: PIG-4032
                 URL: https://issues.apache.org/jira/browse/PIG-4032
             Project: Pig
          Issue Type: Bug
            Reporter: Cheolsoo Park
            Assignee: Cheolsoo Park
             Fix For: 0.14.0


BloomFilter is broken with s3 path in Hadoop 2. Here is a simple example-
{code}
DEFINE bloomtest Bloom('s3n://foo/bar/bloom');
a = LOAD 's3n://foo/bar/test.txt' using PigStorage('\t') as (k:int, v:int) ;
split a into yes if bloomtest(k,v), no otherwise;
dump yes;
{code}
This query fails with the following error-
{code}
14/06/22 06:28:58 INFO jobcontrol.ControlledJob: PigLatin:test.pig got an error while submitting
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: s3n:__foo_bar_bloom
	at org.apache.hadoop.fs.Path.initialize(Path.java:206)
	at org.apache.hadoop.fs.Path.<init>(Path.java:172)
	at org.apache.hadoop.mapreduce.v2.util.MRApps.parseDistributedCacheArtifacts(MRApps.java
{code}
The problem is that the distributed cache file name {{s3n:__foo_bar_bloom}} causes a uri syntax error because of the s3n prefix.

In fact, this is a regression of HADOOP-8562 that includes the following change-
{code:title=Path.java}
-      this.uri = new URI(scheme, authority, normalizePath(path), null, fragment)
+      this.uri = new URI(scheme, authority, normalizePath(scheme, path), null, fragment)
{code}
Since the scheme was ignored in Hadoop 1, s3 path used to work accidentally. But in Hadoop 2, it starts failing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)