Posted to dev@pig.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2014/06/22 21:56:24 UTC

[jira] [Commented] (PIG-4032) BloomFilter fails with s3 path in Hadoop 2.4

    [ https://issues.apache.org/jira/browse/PIG-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040224#comment-14040224 ] 

Daniel Dai commented on PIG-4032:
---------------------------------

+1

> BloomFilter fails with s3 path in Hadoop 2.4
> --------------------------------------------
>
>                 Key: PIG-4032
>                 URL: https://issues.apache.org/jira/browse/PIG-4032
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.14.0
>
>         Attachments: PIG-4032-1.patch
>
>
> BloomFilter is broken with s3 paths in Hadoop 2. Here is a simple example:
> {code}
> DEFINE bloomtest Bloom('s3n://foo/bar/bloom');
> a = LOAD 's3n://foo/bar/test.txt' using PigStorage('\t') as (k:int, v:int) ;
> split a into yes if bloomtest(k,v), no otherwise;
> dump yes;
> {code}
> This query fails with the following error:
> {code}
> 14/06/22 06:28:58 INFO jobcontrol.ControlledJob: PigLatin:test.pig got an error while submitting
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: s3n:__foo_bar_bloom
> 	at org.apache.hadoop.fs.Path.initialize(Path.java:206)
> 	at org.apache.hadoop.fs.Path.<init>(Path.java:172)
> 	at org.apache.hadoop.mapreduce.v2.util.MRApps.parseDistributedCacheArtifacts(MRApps.java
> {code}
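> The problem is that the distributed cache file name {{s3n:__foo_bar_bloom}} causes a URI syntax error: the {{s3n:}} prefix is treated as a URI scheme, and the remainder, {{__foo_bar_bloom}}, is a relative path, which is not allowed in an absolute URI.
> In fact, this is a regression introduced by HADOOP-8562, which includes the following change: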
> {code:title=Path.java}
> -      this.uri = new URI(scheme, authority, normalizePath(path), null, fragment)
> +      this.uri = new URI(scheme, authority, normalizePath(scheme, path), null, fragment)
> {code}
> Because the scheme was ignored in Hadoop 1, s3 paths worked only by accident. In Hadoop 2, they fail.
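
The exception in the stack trace above can be reproduced with the JDK alone, without Hadoop or Pig. The following is a minimal sketch, not the Hadoop code path itself: the class name and the tryBuild helper are made up for illustration, and it assumes that the cache file name is split into the scheme "s3n" and the path "__foo_bar_bloom" before the java.net.URI constructor is called.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; not part of Pig or Hadoop.
public class RelativePathInAbsoluteUri {

    // Builds a URI from a scheme and a path with the five-argument
    // java.net.URI constructor. Returns the URI as a string, or the
    // rejection reason if the constructor throws.
    static String tryBuild(String scheme, String path) {
        try {
            return new URI(scheme, null, path, null, null).toString();
        } catch (URISyntaxException e) {
            return "error: " + e.getReason();
        }
    }

    public static void main(String[] args) {
        // Assumed split of the cache file name "s3n:__foo_bar_bloom":
        // a scheme plus a path with no leading slash is rejected.
        System.out.println(tryBuild("s3n", "__foo_bar_bloom"));

        // The same scheme with an absolute path is accepted.
        System.out.println(tryBuild("s3n", "/foo/bar/bloom"));

        // Without a scheme, the relative name is accepted as well.
        System.out.println(tryBuild(null, "__foo_bar_bloom"));
    }
}
```

Only the first call fails, with the reason "Relative path in absolute URI", which suggests that any distributed cache file name containing a scheme-like prefix and no leading slash would hit the same error.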


