Posted to user@impala.apache.org by Fawze Abujaber <fa...@gmail.com> on 2019/02/27 17:20:58 UTC
advise for dim table
Hi Community,
I have a process that imports a file from Vertica into Hadoop, and I create
an Impala table on top of it. I have started to see weird behaviour where
rows with an INT value of 0 are missing from the table; when I create the
table with the column changed to STRING, I see the record.
*** The column is defined as INT in Vertica.
Any quick insight?
hdfs dfs -cat /fawze/dim_agent_status_v2p/temp_data_8240.txt
0 Login
1 Logout
2 Online
4 Away
3 Back Soon
CREATE TABLE default.analytics_dim_agent_status_v2p (
state_id INT,
state_name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
WITH SERDEPROPERTIES ('field.delim'='\t', 'serialization.format'='\t')
STORED AS TEXTFILE
LOCATION '/fawze/dim_agent_status_v2p'
select * from analytics_dim_agent_status_v2p
1 Logout
2 Online
4 Away
3 Back Soon
Fetched 4 row(s) in 0.01s
===============
CREATE TABLE default.analytics_dim_agent_status_v3p (
state_id STRING,
state_name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
WITH SERDEPROPERTIES ('field.delim'='\t', 'serialization.format'='\t')
STORED AS TEXTFILE
LOCATION '/fawze//analytics_dim_agent_status_v2p'
select * from analytics_dim_agent_status_v3p
0 Login
1 Logout
2 Online
4 Away
3 Back Soon
Fetched 5 row(s) in 0.11s
--
Take Care
Fawze Abujaber
Re: advise for dim table
Posted by Tim Armstrong <ta...@cloudera.com>.
I can't think of a reason why 0 would be parsed differently from 1, 2, 3,
or 4. I can think of reasons why the first line would be dropped:
- You have a Unicode BOM at the start of the file and Impala is hitting
a parse error. You could try setting abort_on_error=True and seeing if the
query fails. https://issues.apache.org/jira/browse/IMPALA-3478
- You have skip.header.line.count set on the table so the initial rows
in each file are skipped
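
One rough way to check the BOM theory is to look at the first three bytes of the file. The sketch below is only an illustration: the helper name has_utf8_bom is hypothetical (it is not part of Impala or Hadoop), and it assumes the BOM in question is the UTF-8 one (bytes EF BB BF) rather than a UTF-16 variant.

```shell
# Hypothetical helper (not from Impala or Hadoop): succeeds if the data on
# stdin begins with the UTF-8 byte order mark, i.e. the bytes EF BB BF.
has_utf8_bom() {
  # Take the first 3 bytes, dump them as hex, and strip spaces/newlines.
  first_bytes=$(head -c 3 | od -An -tx1 | tr -d ' \n')
  [ "$first_bytes" = "efbbbf" ]
}

# Usage against the file from the original post (needs an HDFS client):
#   hdfs dfs -cat /fawze/dim_agent_status_v2p/temp_data_8240.txt \
#     | has_utf8_bom && echo "file starts with a UTF-8 BOM"
```

If the check succeeds, the first field of the first line is really "<BOM>0" rather than "0", which would explain why it reads fine as STRING but not as INT.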