Posted to user@impala.apache.org by Fawze Abujaber <fa...@gmail.com> on 2019/02/27 17:20:58 UTC

advise for dim table

Hi Community,

I have a process that imports a file from Vertica into Hadoop, and I
create an Impala table on top of it. I have started seeing odd behaviour
where the row whose INT value is 0 is missing from the table. When I
create the table with the column changed to STRING, I see the record.

*** The column is defined as INT in Vertica

Any quick insight?



hdfs dfs -cat /fawze/dim_agent_status_v2p/temp_data_8240.txt
0       Login
1       Logout
2       Online
4       Away
3       Back Soon


CREATE TABLE default.analytics_dim_agent_status_v2p (
  state_id INT,
  state_name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
WITH SERDEPROPERTIES ('field.delim'='\t', 'serialization.format'='\t')
STORED AS TEXTFILE
LOCATION '/fawze/dim_agent_status_v2p'


select * from analytics_dim_agent_status_v2p

1       Logout
2       Online
4       Away
3       Back Soon
Fetched 4 row(s) in 0.01s

===============

CREATE TABLE default.analytics_dim_agent_status_v3p (
  state_id STRING,
  state_name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
WITH SERDEPROPERTIES ('field.delim'='\t', 'serialization.format'='\t')
STORED AS TEXTFILE
LOCATION '/fawze//analytics_dim_agent_status_v2p'


 select * from analytics_dim_agent_status_v3p

0       Login
1       Logout
2       Online
4       Away
3       Back Soon
Fetched 5 row(s) in 0.11s

-- 
Take Care
Fawze Abujaber

Re: advise for dim table

Posted by Tim Armstrong <ta...@cloudera.com>.
I can't think of a reason why 0 would be parsed differently from 1, 2, 3,
or 4. I can think of two reasons why the first line would be dropped:

   - You have a Unicode BOM at the start of the file and Impala is hitting
   a parse error. You could try setting abort_on_error=True and seeing if the
   query fails. https://issues.apache.org/jira/browse/IMPALA-3478
   - You have skip.header.line.count set on the table, so the initial rows
   in each file are skipped.
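
One way to check the BOM theory outside Impala is to look at the first
bytes of the data file. The following is a minimal Python sketch, not
Impala's own parsing logic. It assumes the file (or its first few bytes)
has been copied out of HDFS, and the helper names has_utf8_bom and
parse_state_id are hypothetical, introduced only for illustration. It
shows that a UTF-8 BOM in front of "0" makes a strict integer parse
reject the first field, while the same bytes are still readable as a
string.

```python
import codecs


def has_utf8_bom(first_bytes: bytes) -> bool:
    """Return True if the data starts with the UTF-8 byte order mark (EF BB BF)."""
    return first_bytes.startswith(codecs.BOM_UTF8)


def parse_state_id(field: bytes):
    """Strictly parse a field as an integer; return None if it is not a plain integer.

    This only illustrates a strict parse. It is an assumption that the
    engine treats the field in a comparable way.
    """
    try:
        return int(field.decode("utf-8"))
    except ValueError:
        return None


# Hypothetical sample: the first two rows of the file, preceded by a UTF-8 BOM.
sample = codecs.BOM_UTF8 + b"0\tLogin\n1\tLogout\n"

# First field of the first line, which carries the BOM bytes in front of "0".
first_field = sample.split(b"\n")[0].split(b"\t")[0]

print("BOM present:", has_utf8_bom(sample))
print("first field as INT:", parse_state_id(first_field))
print("first field as raw bytes:", first_field)
```

If the check reports a BOM on the real file, removing it at export time
is one option to try. If it does not, the skip.header.line.count
property is the next thing to look at.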

