Posted to dev@flume.apache.org by "Peter Leckie (JIRA)" <ji...@apache.org> on 2015/01/07 10:33:34 UTC

[jira] [Assigned] (FLUME-2570) Add option to not pad date fields

     [ https://issues.apache.org/jira/browse/FLUME-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Leckie reassigned FLUME-2570:
-----------------------------------

    Assignee: Peter Leckie

> Add option to not pad date fields
> ---------------------------------
>
>                 Key: FLUME-2570
>                 URL: https://issues.apache.org/jira/browse/FLUME-2570
>             Project: Flume
>          Issue Type: New Feature
>          Components: Configuration
>    Affects Versions: v1.5.1
>            Reporter: Peter Leckie
>            Assignee: Peter Leckie
>
> Flume currently zero-pads date components. It would be valuable if Flume could also format them as plain integers, i.e. without padding.
> For example, using the %Y, %m or %d alias to create output directories that reference today's date, like the following:
> /output/2014/3/5/
> This would be helpful when importing the data into either Hive or Impala.
> First, Impala cannot pad partition values, so currently the only way to do this is to import the data with Hive and then use Impala to access it (short of writing custom code).
> Second, padding partitions in Hive or Impala causes issues; for example, pruning of padded partitions is not possible.
> The following is an example of a typical workflow:
> Data is imported into HDFS using Flume with a sink configured as follows:
> agent.sinks.snk_avro_snappy.hdfs.path = hdfs://hdfs/avro/year=%Y/month=%m/day=%d
> Impala reads the data as follows:
> create external table TestAvro (.....)
> partitioned by (Year int, Month int, Day int) stored as avro
> location '/avro';
> alter table TestAvro add if not exists partition(Year=cast(year(to_date(now())) as int), Month=cast(month(to_date(now())) as int), Day=cast(day(to_date(now())) as int));
> Flume saves the output as:
> hdfs://hdfs/avro/year=2014/month=12/day=01
> And Impala reads it as:
> hdfs://hdfs/avro/year=2014/month=12/day=1
> So this feature request is to add the ability for Flume to write data into a directory named using today's date, with no padding on the day or month field.
> Implementation details are not important; for example, Flume could add a macro that simply removes padding, instead of futzing with the date aliases.
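
The mismatch described in the issue can be illustrated with a small Java sketch. It is not Flume code and does not use Flume's escape-sequence handling; the class name PartitionPathSketch and its two helper methods are hypothetical, and java.time patterns stand in for the %Y/%m/%d aliases. It only shows how a padded and an unpadded rendering of the same date produce different partition directories.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration only: not part of Flume.
public class PartitionPathSketch {
    // Zero-padded month and day, like the output reported for %Y/%m/%d.
    static final DateTimeFormatter PADDED =
        DateTimeFormatter.ofPattern("'year='yyyy'/month='MM'/day='dd");

    // Single pattern letters print the minimum number of digits (no padding).
    static final DateTimeFormatter UNPADDED =
        DateTimeFormatter.ofPattern("'year='yyyy'/month='M'/day='d");

    static String paddedPath(String base, LocalDate date) {
        return base + "/" + PADDED.format(date);
    }

    static String unpaddedPath(String base, LocalDate date) {
        return base + "/" + UNPADDED.format(date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2014, 12, 1);
        // prints hdfs://hdfs/avro/year=2014/month=12/day=01
        System.out.println(paddedPath("hdfs://hdfs/avro", date));
        // prints hdfs://hdfs/avro/year=2014/month=12/day=1
        System.out.println(unpaddedPath("hdfs://hdfs/avro", date));
    }
}
```

The second path is the one that lines up with integer partition columns such as Day=1, which is the form the request asks Flume to be able to write.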



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)