Posted to common-dev@hadoop.apache.org by "W.H. (JIRA)" <ji...@apache.org> on 2016/04/04 13:37:25 UTC

[jira] [Created] (HADOOP-12999) NPE when accessing (meta-)data via Hive query from S3 bucket

W.H. created HADOOP-12999:
-----------------------------

             Summary: NPE when accessing (meta-)data via Hive query from S3 bucket
                 Key: HADOOP-12999
                 URL: https://issues.apache.org/jira/browse/HADOOP-12999
             Project: Hadoop Common
          Issue Type: Bug
          Components: tools
    Affects Versions: 2.7.2, 2.8.0
         Environment: JDK8, Hive 2.0.0, Hadoop 2.7.2, also happens with Hadoop 2.8.0-SNAPSHOT (git revision ab67b50543e2e9dc48f2dcc00de18c2e2c6b4647)
            Reporter: W.H.


Querying data stored in S3 via Hive 2.0.0 causes an NPE. The exception occurs when the Hive Metastore uses the hadoop-aws tools to query the bucket structure in S3.

Example Hive query:
{code}
create external table if not exists test_table_2 (
    id STRING, name STRING
)
LOCATION 's3://my-bucket/test/test_insert2/';
{code}

The required bucket folder exists in S3. (There is also a $folder$ entry at the same directory level, an often-used workaround for S3 tools that cannot handle empty folders):
{code}
$ s3cmd ls s3://my-bucket/test/test_insert2
                       DIR   s3://my-bucket/test/test_insert2/
2016-04-04 10:44         0   s3://my-bucket/test/test_insert2_$folder$
{code}

The following is an excerpt of the stack trace in Hive console log:
{code}
exec.DDLTask (DDLTask.java:failed(541)) - org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.NullPointerException)
	at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:783)
	at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4032)
	at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:322)

...
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.fs.s3.S3FileSystem.makeAbsolute(S3FileSystem.java:132)
	at org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:342)
	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426)
	at org.apache.hadoop.hive.common.FileUtils.mkdir(FileUtils.java:518)
	at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:201)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1317)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1369)
	... 39 more
{code}

After digging into the code, it appears that the root cause of this issue lies not in Hive but in the way hadoop-aws queries the bucket information from S3. We have verified that the Path parameter passed into org.apache.hadoop.hive.common.FileUtils.mkdir(...) is indeed NOT null.
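
The stack trace bottoms out in S3FileSystem.makeAbsolute(...), which resolves relative paths against the filesystem's working directory. The following is a minimal, hypothetical Java sketch (names and logic are illustrative only, NOT the actual Hadoop source) of how such a helper can throw an NPE even though the Path argument itself is non-null, namely when the working directory was never set:
{code}
// Illustrative sketch only -- not the actual S3FileSystem code.
public class MakeAbsoluteSketch {
    // Stands in for the filesystem's working directory;
    // remains null if initialization never set it.
    static String workingDir = null;

    // Absolute paths pass through unchanged; relative paths are
    // resolved against workingDir -- which NPEs if workingDir is null,
    // even though the 'path' argument itself is non-null.
    static String makeAbsolute(String path) {
        if (path.startsWith("/")) {
            return path;
        }
        return workingDir.concat("/").concat(path);
    }

    public static void main(String[] args) {
        // Fine: the path is already absolute.
        System.out.println(makeAbsolute("/my-bucket/test/test_insert2"));
        try {
            // NPE: relative path, but workingDir was never initialized.
            makeAbsolute("relative/path");
        } catch (NullPointerException e) {
            System.out.println("NPE, as in the stack trace above");
        }
    }
}
{code}
This only illustrates the failure pattern; whether the real S3FileSystem leaves its working directory unset on this code path is exactly what needs to be confirmed in hadoop-aws.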

The issue can be easily reproduced using the following standalone piece of code:
{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Instantiate the s3:// (block-based) filesystem directly and initialize it.
FileSystem fs = new org.apache.hadoop.fs.s3.S3FileSystem();
Configuration conf = new Configuration();
conf.set("fs.s3.awsAccessKeyId", "...");
conf.set("fs.s3.awsSecretAccessKey", "...");
String url = "s3://my-bucket/test/test_insert2/";
fs.initialize(new URI(url), conf);

// f1 is non-null, yet the call below fails with an NPE
// inside S3FileSystem.makeAbsolute().
Path f1 = new Path(url);
boolean inheritPerms = true;
org.apache.hadoop.hive.common.FileUtils.mkdir(fs, f1, inheritPerms, conf);
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)