Posted to dev@pig.apache.org by "Cheolsoo Park (JIRA)" <ji...@apache.org> on 2014/03/27 17:59:22 UTC

[jira] [Commented] (PIG-3830) HiveColumnarLoader throwing FileNotFoundException on Hadoop 2

    [ https://issues.apache.org/jira/browse/PIG-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949583#comment-13949583 ] 

Cheolsoo Park commented on PIG-3830:
------------------------------------

[~jarcec], is this ready for review? If so, please mark it as patch available.

Regarding the formatting, feel free to clean up the whitespace. As long as the patch is uploaded to the RB, it's not hard to review (at least for me).

> HiveColumnarLoader throwing FileNotFoundException on Hadoop 2
> -------------------------------------------------------------
>
>                 Key: PIG-3830
>                 URL: https://issues.apache.org/jira/browse/PIG-3830
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Jarek Jarcec Cecho
>             Fix For: 0.13.0
>
>         Attachments: PIG-3830.patch
>
>
> I've noticed that {{HiveColumnarLoader}} throws {{java.io.FileNotFoundException}} when used with a glob path on Hadoop 2.0. It runs just fine on Hadoop 1.0:
> {code}
> Failed to parse: java.io.FileNotFoundException: File /home/jarcec/cloudera/repos/pig/contrib/piggybank/java/simpleDataDir1395623312698/*.txt does not exist
> 	at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:198)
> 	at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1676)
> 	at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1623)
> 	at org.apache.pig.PigServer.registerQuery(PigServer.java:575)
> 	at org.apache.pig.PigServer.registerQuery(PigServer.java:588)
> 	at org.apache.pig.piggybank.test.storage.TestHiveColumnarLoader.testHdfdsGlobbing(TestHiveColumnarLoader.java:220)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:601)
> 	at junit.framework.TestCase.runTest(TestCase.java:176)
> 	at junit.framework.TestCase.runBare(TestCase.java:141)
> 	at junit.framework.TestResult$1.protect(TestResult.java:122)
> 	at junit.framework.TestResult.runProtected(TestResult.java:142)
> 	at junit.framework.TestResult.run(TestResult.java:125)
> 	at junit.framework.TestCase.run(TestCase.java:129)
> 	at junit.framework.TestSuite.runTest(TestSuite.java:255)
> 	at junit.framework.TestSuite.run(TestSuite.java:250)
> 	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518)
> 	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052)
> 	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906)
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: File /home/jarcec/cloudera/repos/pig/contrib/piggybank/java/simpleDataDir1395623312698/*.txt does not exist
> 	at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:362)
> 	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1484)
> 	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1524)
> 	at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:564)
> 	at org.apache.pig.piggybank.storage.partition.PathPartitioner.getPartitionKeys(PathPartitioner.java:105)
> 	at org.apache.pig.piggybank.storage.partition.PathPartitionHelper.getPartitionKeys(PathPartitionHelper.java:101)
> 	at org.apache.pig.piggybank.storage.HiveColumnarLoader.getPartitionColumns(HiveColumnarLoader.java:576)
> 	at org.apache.pig.piggybank.storage.HiveColumnarLoader.getSchema(HiveColumnarLoader.java:646)
> 	at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:175)
> 	at org.apache.pig.newplan.logical.relational.LOLoad.<init>(LOLoad.java:89)
> 	at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:853)
> 	at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3479)
> 	at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1536)
> 	at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1013)
> 	at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:553)
> 	at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
> 	at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188)
> 	... 20 more
> Caused by: java.io.FileNotFoundException: File /home/jarcec/cloudera/repos/pig/contrib/piggybank/java/simpleDataDir1395623312698/*.txt does not exist
> 	... 37 more
> {code}
> I dug into the problem and found a difference between the Hadoop implementations of {{DistributedFileSystem}}. For a nonexistent directory, the {{listStatus}} method returns {{null}} in [Hadoop 1|https://github.com/apache/hadoop-common/blob/branch-1/src/hdfs/org/apache/hadoop/hdfs/DistributedFileSystem.java#L316]:
> {code}
>     if (thisListing == null) { // the directory does not exist
>       return null;
>     }
> {code}
> But it throws an exception in [Hadoop 2|https://github.com/apache/hadoop-common/blob/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java#L653]:
> {code}
>     if (thisListing == null) { // the directory does not exist
>       throw new FileNotFoundException("File " + p + " does not exist.");
>     }
> {code}
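
One way a caller could tolerate both conventions quoted in the description is to treat a null result and a FileNotFoundException the same way. The following is only a minimal, self-contained sketch of that idea. The Lister interface, the two stand-in implementations, and listOrEmpty are hypothetical names invented for illustration and are not Hadoop or Pig APIs. Whether the attached PIG-3830.patch takes this approach is not stated in this message.

```java
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ListStatusCompat {

    /** Hypothetical stand-in for a directory listing call; not a Hadoop interface. */
    interface Lister {
        String[] listStatus(String path) throws FileNotFoundException;
    }

    /** Models the Hadoop 1 snippet quoted in the description: null for a missing directory. */
    static final Lister NULL_ON_MISSING = path -> null;

    /** Models the Hadoop 2 snippet quoted in the description: an exception for a missing directory. */
    static final Lister THROW_ON_MISSING = path -> {
        throw new FileNotFoundException("File " + path + " does not exist.");
    };

    /** Normalizes both conventions: a missing directory always yields an empty list. */
    static List<String> listOrEmpty(Lister lister, String path) {
        try {
            String[] entries = lister.listStatus(path);
            if (entries == null) {
                // Null-returning convention: nothing to list.
                return Collections.emptyList();
            }
            return Arrays.asList(entries);
        } catch (FileNotFoundException e) {
            // Throwing convention: treat it the same as the null case.
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        String missing = "/no/such/dir/*.txt";
        System.out.println(listOrEmpty(NULL_ON_MISSING, missing).isEmpty());
        System.out.println(listOrEmpty(THROW_ON_MISSING, missing).isEmpty());
    }
}
```

With a helper of this shape, the calling code no longer depends on which of the two behaviors the underlying file system exhibits. Swallowing the exception is a design choice that suits a "list if present" call site; code that must distinguish a missing path from an empty directory would need to keep the two cases apart.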



--
This message was sent by Atlassian JIRA
(v6.2#6252)