You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Mahender Sarangam <ma...@outlook.com> on 2018/05/01 10:57:26 UTC

Re: Hive External Table with Zero Bytes files

Thanks Thai. I have mentioned wrongly Folder Name, it 's same DAY=20180325 (Folder) and same has Filename. actually in our upstream, our source table is partitioned by Date. Whenever a table is partitioned, we see Zero Byte. Now when we create external table with partitioned by columns and fire select query no data is returned. . If I delete manually those files (Zero Bytes), we were able to read.


/Mahender

On 4/28/2018 6:36 AM, Thai Bui wrote:
Your external table is referencing the .../day=201803250 location which is empty. Point your table to the capital .../DAY=201803250 and you should be able to read the data there.

Also, it looks like you want external partitioned table. You’ll need to create an external table with a partition clause, then alter the table and add partition for each of the ../DAY=someday path that you have.

On Sat, Apr 28, 2018 at 4:05 AM Mahender Sarangam <ma...@outlook.com>> wrote:

Gentle Ping. Please help me on below issue. Has any one faced same issue

On 4/27/2018 1:28 AM, Mahender Sarangam wrote:

Hi,

Can any one faced issue while fetching data from external table. We are copying data from upstream system into our storage S3. As part of copy, directories along with Zero bytes files are been copied. Source File Format is in JSON format.  Below is Folder Hierarchy Structure


 DATE  -->  <Folder>

       <DAY=201803250> ---> Folder

                     1.json.gz  --> File

                      2.json.gz

        <day=201803250> ---> Empty Zero Bytes Files.

Please find below screenshot

[cid:part2.8F1EE022.4DF8F8CE@outlook.com]

We are trying to create external table with JSON Serde.

ADD JAR wasb://jsonserde@XYZ.blob.core.windows.net/json/json-serde-1.3.9.jar<mailto:wasb://jsonserde@XYZ.blob.core.windows.net/json/json-serde-1.3.9.jar>;
 SET hive.mapred.supports.subdirectories=TRUE;
 SET mapred.input.dir.recursive=TRUE;
SET hive.merge.mapfiles = true;
SET hive.merge.mapredfiles = true;
SET hive.merge.tezfiles = true;


 DROP TABLE IF EXISTS Ext_STG1;
 CREATE EXTERNAL TABLE Ext_STG1(Col1 String, Col2 String, Col3 String) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ("case.insensitive" = "true", "ignore.malformed.json" = "true")
STORED AS TEXTFILE LOCATION 'wasb://container1@xyz.blob.core.windows.net/date/day=201803250/<mailto:wasb://container1@xyz.blob.core.windows.net/date/day=201803250/>' TBLPROPERTIES ('serialization.null.format' = '');

select * from Ext_STG1 limit 100;


Above Query shows Empty Results.


When I delete Zero bytes files, then i could see data from select external table. Is this expected behaviour. Is there any setting for ignoring Zero bytes files in hive external table


-Mahens

--
Thai


Re: Hive External Table with Zero Bytes files

Posted by Mahender Sarangam <ma...@outlook.com>.
Thanks Nishanth. We are also cleaning up files of size Zero byes




On 5/2/2018 8:53 AM, Nishanth S wrote:
I have run into similar issue with avro files . The solution was to fix upstream jobs  that were writing data to those directories . In our case the  writers were not flushed/closed correctly during certain  events which caused the   issue . Fixing those prevented these 0 sized files.

-NS

On Wed, May 2, 2018 at 1:52 AM, Mahender Sarangam <ma...@outlook.com>> wrote:

ping..

On 5/1/2018 3:57 AM, Mahender Sarangam wrote:

Thanks Thai. I have mentioned wrongly Folder Name, it 's same DAY=20180325 (Folder) and same has Filename. actually in our upstream, our source table is partitioned by Date. Whenever a table is partitioned, we see Zero Byte. Now when we create external table with partitioned by columns and fire select query no data is returned. . If I delete manually those files (Zero Bytes), we were able to read.


/Mahender

On 4/28/2018 6:36 AM, Thai Bui wrote:
Your external table is referencing the .../day=201803250 location which is empty. Point your table to the capital .../DAY=201803250 and you should be able to read the data there.

Also, it looks like you want external partitioned table. You’ll need to create an external table with a partition clause, then alter the table and add partition for each of the ../DAY=someday path that you have.

On Sat, Apr 28, 2018 at 4:05 AM Mahender Sarangam <ma...@outlook.com>> wrote:

Gentle Ping. Please help me on below issue. Has any one faced same issue

On 4/27/2018 1:28 AM, Mahender Sarangam wrote:

Hi,

Can any one faced issue while fetching data from external table. We are copying data from upstream system into our storage S3. As part of copy, directories along with Zero bytes files are been copied. Source File Format is in JSON format.  Below is Folder Hierarchy Structure


 DATE  -->  <Folder>

       <DAY=201803250> ---> Folder

                     1.json.gz  --> File

                      2.json.gz

        <day=201803250> ---> Empty Zero Bytes Files.

Please find below screenshot

[cid:part3.EBF43959.0CBADC68@outlook.com]

We are trying to create external table with JSON Serde.

ADD JAR wasb://jsonserde@XYZ.blob.core.windows.net/json/json-serde-1.3.9.jar<mailto:wasb://jsonserde@XYZ.blob.core.windows.net/json/json-serde-1.3.9.jar>;
 SET hive.mapred.supports.subdirectories=TRUE;
 SET mapred.input.dir.recursive=TRUE;
SET hive.merge.mapfiles = true;
SET hive.merge.mapredfiles = true;
SET hive.merge.tezfiles = true;


 DROP TABLE IF EXISTS Ext_STG1;
 CREATE EXTERNAL TABLE Ext_STG1(Col1 String, Col2 String, Col3 String) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ("case.insensitive" = "true", "ignore.malformed.json" = "true")
STORED AS TEXTFILE LOCATION 'wasb://container1@xyz.blob.core.windows.net/date/day=201803250/<mailto:wasb://container1@xyz.blob.core.windows.net/date/day=201803250/>' TBLPROPERTIES ('serialization.null.format' = '');

select * from Ext_STG1 limit 100;


Above Query shows Empty Results.


When I delete Zero bytes files, then i could see data from select external table. Is this expected behaviour. Is there any setting for ignoring Zero bytes files in hive external table


-Mahens

--
Thai





Re: Hive External Table with Zero Bytes files

Posted by Nishanth S <ni...@gmail.com>.
I have run into similar issue with avro files . The solution was to fix
upstream jobs  that were writing data to those directories . In our case
the  writers were not flushed/closed correctly during certain  events which
caused the   issue . Fixing those prevented these 0 sized files.

-NS

On Wed, May 2, 2018 at 1:52 AM, Mahender Sarangam <
mahender.bigdata@outlook.com> wrote:

> ping..
>
> On 5/1/2018 3:57 AM, Mahender Sarangam wrote:
>
> Thanks Thai. I have mentioned wrongly Folder Name, it 's same DAY=20180325
> (Folder) and same has Filename. actually in our upstream, our source table
> is partitioned by Date. Whenever a table is partitioned, we see Zero Byte.
> Now when we create external table with partitioned by columns and fire
> select query no data is returned. . If I delete manually those files (Zero
> Bytes), we were able to read.
>
>
> /Mahender
>
> On 4/28/2018 6:36 AM, Thai Bui wrote:
>
> Your external table is referencing the .../day=201803250 location which is
> empty. Point your table to the capital .../DAY=201803250 and you should be
> able to read the data there.
>
> Also, it looks like you want external partitioned table. You’ll need to
> create an external table with a partition clause, then alter the table and
> add partition for each of the ../DAY=someday path that you have.
>
> On Sat, Apr 28, 2018 at 4:05 AM Mahender Sarangam <
> mahender.bigdata@outlook.com> wrote:
>
>> Gentle Ping. Please help me on below issue. Has any one faced same issue
>>
>> On 4/27/2018 1:28 AM, Mahender Sarangam wrote:
>>
>> Hi,
>>
>> Can any one faced issue while fetching data from external table. We are
>> copying data from upstream system into our storage S3. As part of copy,
>> directories along with Zero bytes files are been copied. Source File Format
>> is in JSON format.  Below is Folder Hierarchy Structure
>>
>>
>>  DATE  -->  <Folder>
>>
>>        <DAY=201803250> ---> Folder
>>
>>                      1.json.gz  --> File
>>
>>                       2.json.gz
>>
>>         <day=201803250> ---> Empty Zero Bytes Files.
>>
>> Please find below screenshot
>>
>> We are trying to create external table with JSON Serde.
>>
>> ADD JAR wasb://jsonserde@XYZ.blob.core.windows.net/json/json-
>> serde-1.3.9.jar;
>>  SET hive.mapred.supports.subdirectories=TRUE;
>>  SET mapred.input.dir.recursive=TRUE;
>> SET hive.merge.mapfiles = true;
>> SET hive.merge.mapredfiles = true;
>> SET hive.merge.tezfiles = true;
>>
>>
>>  DROP TABLE IF EXISTS Ext_STG1;
>>  CREATE EXTERNAL TABLE Ext_STG1(Col1 String, Col2 String, Col3 String)
>> ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH
>> SERDEPROPERTIES ("case.insensitive" = "true", "ignore.malformed.json" =
>> "true")
>> STORED AS TEXTFILE LOCATION 'wasb://container1@xyz.blob.
>> core.windows.net/date/day=201803250/' TBLPROPERTIES
>> ('serialization.null.format' = '');
>>
>> select * from Ext_STG1 limit 100;
>>
>>
>> Above Query shows Empty Results.
>>
>>
>> When I delete Zero bytes files, then i could see data from select
>> external table. Is this expected behaviour. Is there any setting for
>> ignoring Zero bytes files in hive external table
>>
>>
>> -Mahens
>>
>>
>> --
> Thai
>
>
>
>

Re: Hive External Table with Zero Bytes files

Posted by Mahender Sarangam <ma...@outlook.com>.
ping..

On 5/1/2018 3:57 AM, Mahender Sarangam wrote:

Thanks Thai. I have mentioned wrongly Folder Name, it 's same DAY=20180325 (Folder) and same has Filename. actually in our upstream, our source table is partitioned by Date. Whenever a table is partitioned, we see Zero Byte. Now when we create external table with partitioned by columns and fire select query no data is returned. . If I delete manually those files (Zero Bytes), we were able to read.


/Mahender

On 4/28/2018 6:36 AM, Thai Bui wrote:
Your external table is referencing the .../day=201803250 location which is empty. Point your table to the capital .../DAY=201803250 and you should be able to read the data there.

Also, it looks like you want external partitioned table. You’ll need to create an external table with a partition clause, then alter the table and add partition for each of the ../DAY=someday path that you have.

On Sat, Apr 28, 2018 at 4:05 AM Mahender Sarangam <ma...@outlook.com>> wrote:

Gentle Ping. Please help me on below issue. Has any one faced same issue

On 4/27/2018 1:28 AM, Mahender Sarangam wrote:

Hi,

Can any one faced issue while fetching data from external table. We are copying data from upstream system into our storage S3. As part of copy, directories along with Zero bytes files are been copied. Source File Format is in JSON format.  Below is Folder Hierarchy Structure


 DATE  -->  <Folder>

       <DAY=201803250> ---> Folder

                     1.json.gz  --> File

                      2.json.gz

        <day=201803250> ---> Empty Zero Bytes Files.

Please find below screenshot

[cid:part2.03F0F4D8.6DB0963A@outlook.com]

We are trying to create external table with JSON Serde.

ADD JAR wasb://jsonserde@XYZ.blob.core.windows.net/json/json-serde-1.3.9.jar<mailto:wasb://jsonserde@XYZ.blob.core.windows.net/json/json-serde-1.3.9.jar>;
 SET hive.mapred.supports.subdirectories=TRUE;
 SET mapred.input.dir.recursive=TRUE;
SET hive.merge.mapfiles = true;
SET hive.merge.mapredfiles = true;
SET hive.merge.tezfiles = true;


 DROP TABLE IF EXISTS Ext_STG1;
 CREATE EXTERNAL TABLE Ext_STG1(Col1 String, Col2 String, Col3 String) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ("case.insensitive" = "true", "ignore.malformed.json" = "true")
STORED AS TEXTFILE LOCATION 'wasb://container1@xyz.blob.core.windows.net/date/day=201803250/<mailto:wasb://container1@xyz.blob.core.windows.net/date/day=201803250/>' TBLPROPERTIES ('serialization.null.format' = '');

select * from Ext_STG1 limit 100;


Above Query shows Empty Results.


When I delete Zero bytes files, then i could see data from select external table. Is this expected behaviour. Is there any setting for ignoring Zero bytes files in hive external table


-Mahens

--
Thai