Posted to user@hive.apache.org by Mahender Sarangam <ma...@outlook.com> on 2018/04/27 08:28:12 UTC

Hive External Table with Zero Bytes files

Hi,

Has anyone faced an issue while fetching data from an external table? We are copying data from an upstream system into our S3 storage. As part of the copy, directories along with zero-byte files are being copied. The source files are in JSON format. Below is the folder hierarchy:


 DATE                      --> folder
     DAY=201803250         --> folder
         1.json.gz         --> file
         2.json.gz         --> file
     day=201803250         --> empty zero-byte file

We are trying to create an external table with the JSON SerDe:

ADD JAR wasb://jsonserde@XYZ.blob.core.windows.net/json/json-serde-1.3.9.jar;

-- Let Hive read files in subdirectories below the table location
SET hive.mapred.supports.subdirectories=TRUE;
SET mapred.input.dir.recursive=TRUE;
-- Small-file merge settings (these act on the output files of write jobs)
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;
SET hive.merge.tezfiles=true;

DROP TABLE IF EXISTS Ext_STG1;
CREATE EXTERNAL TABLE Ext_STG1 (Col1 STRING, Col2 STRING, Col3 STRING)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("case.insensitive" = "true", "ignore.malformed.json" = "true")
STORED AS TEXTFILE
LOCATION 'wasb://container1@xyz.blob.core.windows.net/date/day=201803250/'
TBLPROPERTIES ('serialization.null.format' = '');

select * from Ext_STG1 limit 100;


The above query returns empty results.


When I delete the zero-byte files, the SELECT on the external table returns data. Is this expected behaviour? Is there any setting that makes a Hive external table ignore zero-byte files?


-Mahens
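Since the data only becomes visible after the zero-byte files are deleted, one workaround is to locate those objects before creating the table. The following is a minimal sketch, not part of Hive or any storage SDK: the helper `zero_byte_keys` and the example listing (keys and sizes) are hypothetical. It assumes you already have a listing of (key, size) pairs from the object store and simply filters the empty objects under a prefix.

```python
from typing import Iterable, List, Tuple


def zero_byte_keys(listing: Iterable[Tuple[str, int]], prefix: str = "") -> List[str]:
    """Return the sorted keys under `prefix` whose size is 0 bytes.

    Hypothetical helper: `listing` is any iterable of (key, size_in_bytes)
    pairs, e.g. built from an object-store listing.
    """
    return sorted(key for key, size in listing if key.startswith(prefix) and size == 0)


# Hypothetical listing mirroring the folder layout described in the question
# (sizes are made up for illustration).
example_listing = [
    ("date/day=201803250/1.json.gz", 1024),
    ("date/day=201803250/2.json.gz", 2048),
    ("date/day=201803250", 0),  # zero-byte object next to the folder
]

print(zero_byte_keys(example_listing, prefix="date/"))
# prints ['date/day=201803250']
```

Keeping the filter independent of the storage client makes it easy to feed from either store: with boto3 the pairs could come from `list_objects_v2` pages as `(obj["Key"], obj["Size"])`, and with azure-storage-blob from `ContainerClient.list_blobs(name_starts_with=...)` as `(blob.name, blob.size)`.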

Re: Hive External Table with Zero Bytes files

Posted by Gopal Vijayaraghavan <go...@apache.org>.
> We are copying data from an upstream system into our S3 storage. As part of the copy, directories along with zero-byte files are being copied.

Is this exactly the same issue as the previous thread or a different one?

<http://mail-archives.apache.org/mod_mbox/hive-user/201701.mbox/%3CSG2PR0601MB1817CEAF7C1F1B4A9777741992670@SG2PR0601MB1817.apcprd06.prod.outlook.com%3E>

Cheers,
Gopal