Posted to issues@impala.apache.org by "Venu Yanamandra (JIRA)" <ji...@apache.org> on 2017/08/04 19:18:00 UTC

[jira] [Created] (IMPALA-5766) Automatic String to Timestamp conversion could honor various date separators apart from hyphen (-)

Venu Yanamandra created IMPALA-5766:
---------------------------------------

             Summary: Automatic String to Timestamp conversion could honor various date separators apart from hyphen (-)
                 Key: IMPALA-5766
                 URL: https://issues.apache.org/jira/browse/IMPALA-5766
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
         Environment: impala 2.2
            Reporter: Venu Yanamandra
            Priority: Minor


Could we support more than one date separator when automatically parsing strings into timestamp columns? If possible, could we also add a way to specify a common timestamp format for a table?
In detail:
------------
  External table:
  CREATE EXTERNAL TABLE videowatchactivity(
    id int,
    user string,
    activitydttm timestamp,
    videoid string,
    activity string) 
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ',' 
      LOCATION '/user/videologger/activity';

If the HDFS file "/user/videologger/activity/activity_aug_4_2017.csv" contains -->
Venu,2017-08-04 11:23:00,video_id_0,start
Venu,2017-08-04 11:25:00,video_id_0,stop
Then,
"select activitydttm from videowatchactivity;"  outputs 2 valid rows.

If the HDFS file "/user/videologger/activity/activity_aug_4_2017.csv" instead contains -->
Venu,2017/08/04 11:23:00,video_id_0,start
Venu,2017/08/04 11:25:00,video_id_0,stop
Then,
"select activitydttm from videowatchactivity;"  outputs 2 NULL rows.

Could the create table statement accept a date separator that would then be used when automatically parsing strings into timestamp columns?
E.g.,
CREATE EXTERNAL TABLE videowatchactivity(
    id int,
    user string,
    activitydttm timestamp,
    videoid string,
    activity string) 
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ',' 
      LOCATION '/user/videologger/activity'
      DATESEPARATOR '/';
Please note the new "DATESEPARATOR '/'" clause, which would let us define any separator for the date field.
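
To make the requested behavior concrete, here is a minimal Python sketch of how a configurable date separator might be applied during parsing. It is only an illustration of the proposal, not Impala's implementation: the function name parse_timestamp and its date_separator parameter are hypothetical, and the default layout is assumed to be "YYYY-MM-DD HH:MM:SS" as in the example rows above. A value that does not match returns None, mirroring the NULL seen in the query output.

```python
from datetime import datetime
from typing import Optional

# Assumed default layout: hyphen-separated date, colon-separated time.
DEFAULT_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_timestamp(text: str, date_separator: str = "-") -> Optional[datetime]:
    """Parse 'YYYY<sep>MM<sep>DD HH:MM:SS' using a configurable date separator.

    Hypothetical helper: returns None (standing in for SQL NULL) when the
    text does not match the expected layout.
    """
    # Swap the hyphens of the default layout for the configured separator.
    fmt = DEFAULT_FORMAT.replace("-", date_separator)
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None


if __name__ == "__main__":
    # Default separator: the slash-separated value is rejected.
    print(parse_timestamp("2017/08/04 11:23:00"))
    # Separator declared as '/': the same value now parses.
    print(parse_timestamp("2017/08/04 11:23:00", date_separator="/"))
```

With the default separator the slash-separated rows fail to parse, which matches the NULL output described earlier; passing '/' is the analogue of declaring DATESEPARATOR '/' on the table.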

If this can be extended further, a new DATEFMT table property could specify a complete format, such as MM-dd-YYYY HH:mm:ss, to use instead of the default format.
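
As a rough sketch of what a table-level DATEFMT could mean, the Python below translates a pattern written in the style used above into a strptime format and parses a value with it. Everything here is an assumption made for illustration: the helper names to_strptime and parse_with_format are hypothetical, only the handful of tokens shown are recognized, and YYYY is treated the same as yyyy (a four-digit calendar year).

```python
import re
from datetime import datetime
from typing import Optional

# Assumed token set; a real implementation would likely support more.
_TOKENS = {
    "yyyy": "%Y", "YYYY": "%Y",  # four-digit year
    "MM": "%m",                  # month
    "dd": "%d",                  # day of month
    "HH": "%H",                  # hour, 24-hour clock
    "mm": "%M",                  # minute
    "ss": "%S",                  # second
}
_TOKEN_RE = re.compile("|".join(sorted(_TOKENS, key=len, reverse=True)))


def to_strptime(pattern: str) -> str:
    """Translate a pattern such as 'MM-dd-YYYY HH:mm:ss' to strptime syntax."""
    return _TOKEN_RE.sub(lambda m: _TOKENS[m.group(0)], pattern)


def parse_with_format(text: str, pattern: str) -> Optional[datetime]:
    """Parse text with the given pattern; None stands in for SQL NULL."""
    try:
        return datetime.strptime(text, to_strptime(pattern))
    except ValueError:
        return None


if __name__ == "__main__":
    print(to_strptime("MM-dd-YYYY HH:mm:ss"))
    print(parse_with_format("08-04-2017 11:23:00", "MM-dd-YYYY HH:mm:ss"))
```

A full format property would subsume the separator option, since the separator is simply part of the pattern.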

Thanks,






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)