Posted to common-dev@hadoop.apache.org by "Waldemar Hummer (JIRA)" <ji...@apache.org> on 2016/04/05 05:10:25 UTC

[jira] [Resolved] (HADOOP-12999) NPE when accessing (meta-)data via Hive query from S3 bucket

     [ https://issues.apache.org/jira/browse/HADOOP-12999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Waldemar Hummer resolved HADOOP-12999.
--------------------------------------
    Resolution: Invalid

> NPE when accessing (meta-)data via Hive query from S3 bucket
> ------------------------------------------------------------
>
>                 Key: HADOOP-12999
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12999
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.8.0, 2.7.2
>         Environment: JDK8, Hive 2.0.0, Hadoop 2.7.2, also happens with Hadoop 2.8.0-SNAPSHOT (git revision ab67b50543e2e9dc48f2dcc00de18c2e2c6b4647)
>            Reporter: Waldemar Hummer
>
> Querying data stored in S3 via Hive 2.0.0 causes an NPE. The exception occurs when the Hive Metastore uses hadoop-aws to query the bucket structure in S3.
> Example Hive query:
> {code}
> create external table if not exists test_table_2 (
>     id STRING, name STRING
> )
> LOCATION 's3://my-bucket/test/test_insert2/';
> {code}
> The required bucket folder exists in S3. (There is also a $folder$ marker object at the same directory level, a commonly used workaround for S3 tools that cannot handle empty folders):
> {code}
> $ s3cmd ls s3://my-bucket/test/test_insert2
>                        DIR   s3://my-bucket/test/test_insert2/
> 2016-04-04 10:44         0   s3://my-bucket/test/test_insert2_$folder$
> {code}
> The following is an excerpt of the stack trace from the Hive console log:
> {code}
> exec.DDLTask (DDLTask.java:failed(541)) - org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.NullPointerException)
> 	at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:783)
> 	at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4032)
> 	at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:322)
> ...
> Caused by: java.lang.NullPointerException
> 	at org.apache.hadoop.fs.s3.S3FileSystem.makeAbsolute(S3FileSystem.java:132)
> 	at org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:342)
> 	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426)
> 	at org.apache.hadoop.hive.common.FileUtils.mkdir(FileUtils.java:518)
> 	at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:201)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1317)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1369)
> 	... 39 more
> {code}
> After digging into the code, it appears that the root cause of this issue is not in Hive but in the way hadoop-aws queries the bucket information from S3. We have verified that the Path parameter passed into org.apache.hadoop.hive.common.FileUtils.mkdir(...) is indeed NOT null.
> The issue can be easily reproduced using the following standalone piece of code:
> {code}
> import java.net.URI;
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> // Instantiate the block-based S3FileSystem (s3:// scheme) directly
> FileSystem fs = new org.apache.hadoop.fs.s3.S3FileSystem();
> Configuration conf = new Configuration();
> conf.set("fs.s3.awsAccessKeyId", "...");
> conf.set("fs.s3.awsSecretAccessKey", "...");
> String url = "s3://my-bucket/test/test_insert2/";
> fs.initialize(new URI(url), conf);
> Path f1 = new Path(url);
> boolean inheritPerms = true;
> // Throws java.lang.NullPointerException in S3FileSystem.makeAbsolute(...)
> org.apache.hadoop.hive.common.FileUtils.mkdir(fs, f1, inheritPerms, conf);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)