Posted to issues@impala.apache.org by "sathishkumar paramasivam (JIRA)" <ji...@apache.org> on 2018/04/10 00:12:00 UTC

[jira] [Created] (IMPALA-6829) how to get compressed hdfs file using impala or hive

sathishkumar paramasivam created IMPALA-6829:
------------------------------------------------

             Summary: how to get compressed hdfs file using impala or hive
                 Key: IMPALA-6829
                 URL: https://issues.apache.org/jira/browse/IMPALA-6829
             Project: IMPALA
          Issue Type: Question
            Reporter: sathishkumar paramasivam


Hi,

I am teaching myself Impala and am trying to enable compression for a table, but the files in HDFS never get a compressed extension (.gz, .bz2, .snappy).

I am referring to

[https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_txtfile.html]

but I am not sure how the final compressed files are created.

When I use Sqoop, I do get compressed files. Please advise. The example I am following:
create table csv_compressed (a string, b string, c string)
  row format delimited fields terminated by ",";

insert into csv_compressed values
  ('one - uncompressed', 'two - uncompressed', 'three - uncompressed'),
  ('abc - uncompressed', 'xyz - uncompressed', '123 - uncompressed');
...make equivalent .gz, .bz2, and .snappy files and load them into the same table directory...

select * from csv_compressed;
+--------------------+--------------------+----------------------+
| a                  | b                  | c                    |
+--------------------+--------------------+----------------------+
| one - snappy       | two - snappy       | three - snappy       |
| one - uncompressed | two - uncompressed | three - uncompressed |
| abc - uncompressed | xyz - uncompressed | 123 - uncompressed   |
| one - bz2          | two - bz2          | three - bz2          |
| abc - bz2          | xyz - bz2          | 123 - bz2            |
| one - gzip         | two - gzip         | three - gzip         |
| abc - gzip         | xyz - gzip         | 123 - gzip           |
+--------------------+--------------------+----------------------+

$ hdfs dfs -ls 'hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/';
...truncated for readability...
75 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed.snappy
79 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed_bz2.csv.bz2
80 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed_gzip.csv.gz
116 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/dd414df64d67d49b_data.0.
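
The listing shows the pattern: the file written by Impala's own INSERT (dd414df64d67d49b_data.0.) has no codec extension, while the compressed files were produced outside Impala and copied into the table directory. My understanding of the Impala docs from this period is that INSERT into a text table writes only uncompressed text, and that compressed text has to come in through Hive, LOAD DATA, or a manual copy into HDFS. Below is a minimal sketch of the elided "make equivalent .gz, .bz2 ... files" step. It assumes a POSIX shell with gzip, uses bzip2 only if it is installed, and reuses the table directory from the listing. The sample rows are illustrative. I left Snappy out because I am not aware of a standard command-line tool that writes .snappy files in the framing Hadoop uses.

```shell
#!/bin/sh
# Sketch of the elided "make equivalent .gz / .bz2 files" step.
# File names mirror the hdfs dfs -ls listing; TABLE_DIR is copied from it.
set -eu

TABLE_DIR='hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/'

# Plain comma-delimited rows, matching the table's
# "row format delimited fields terminated by ','" definition.
printf '%s\n' 'one - gzip,two - gzip,three - gzip' \
              'abc - gzip,xyz - gzip,123 - gzip' > csv_compressed_gzip.csv
printf '%s\n' 'one - bz2,two - bz2,three - bz2' \
              'abc - bz2,xyz - bz2,123 - bz2' > csv_compressed_bz2.csv

# Write compressed copies. The extension (.gz, .bz2) is what Impala uses
# to pick the codec when it reads a text file.
gzip -c csv_compressed_gzip.csv > csv_compressed_gzip.csv.gz
UPLOAD='csv_compressed_gzip.csv.gz'
if command -v bzip2 >/dev/null 2>&1; then
  bzip2 -c csv_compressed_bz2.csv > csv_compressed_bz2.csv.bz2
  UPLOAD="$UPLOAD csv_compressed_bz2.csv.bz2"
fi

# Copy only the compressed files into the table directory, and only when
# an HDFS client is available on this machine.
if command -v hdfs >/dev/null 2>&1; then
  # $UPLOAD is intentionally unquoted so it splits into separate file names.
  hdfs dfs -put -f $UPLOAD "$TABLE_DIR"
fi
```

Because these files are copied in outside Impala, running REFRESH csv_compressed; in impala-shell afterwards should make them visible to queries. If Hive does the writing instead, setting hive.exec.compress.output=true and mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec before a Hive INSERT should produce .gz files directly. Check this against your Hive version.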


