Posted to issues@camel.apache.org by "Ben O'Day (JIRA)" <ji...@apache.org> on 2013/10/16 20:41:47 UTC

[jira] [Updated] (CAMEL-6867) camel-hdfs - HdfsProducer filename collisions when Producer instance recreated

     [ https://issues.apache.org/jira/browse/CAMEL-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben O'Day updated CAMEL-6867:
-----------------------------

    Description: 
The HdfsProducer uses an instance variable (long splitNum) that is incremented to create unique output filenames in a given directory (seg0, seg1, etc).  

If the Producer instance is recreated (producer cache limit exceeded, server restart, etc), the splitNum variable is reset to 0.  This results in files being overwritten when using overwrite=true mode or throwing "The file already exists" errors when using overwrite=false mode.

We should switch to using a timestamp or some other unique generator to prevent filename collisions regardless of the Producer instance lifecycle for the same hdfs directory URL...



  was:
The HdfsProducer uses an instance variable (long splitNum) that is incremented to create unique output filenames in a given directory (seq0, seq1, etc).  

If the Producer instance is recreated (producer cache limit exceeded, server restart, etc), the splitNum variable is reset to 0.  This results in files being overwritten when using overwrite=true mode or throwing "The file already exists" errors when using overwrite=false mode.

We should switch to using a timestamp or some other unique generator to prevent filename collisions regardless of the Producer instance lifecycle for the same hdfs directory URL...




> camel-hdfs - HdfsProducer filename collisions when Producer instance recreated
> ------------------------------------------------------------------------------
>
>                 Key: CAMEL-6867
>                 URL: https://issues.apache.org/jira/browse/CAMEL-6867
>             Project: Camel
>          Issue Type: Bug
>          Components: camel-hdfs
>            Reporter: Ben O'Day
>            Assignee: Ben O'Day
>             Fix For: Future
>
>
> The HdfsProducer uses an instance variable (long splitNum) that is incremented to create unique output filenames in a given directory (seg0, seg1, etc).  
> If the Producer instance is recreated (producer cache limit exceeded, server restart, etc), the splitNum variable is reset to 0.  This results in files being overwritten when using overwrite=true mode or throwing "The file already exists" errors when using overwrite=false mode.
> We should switch to using a timestamp or some other unique generator to prevent filename collisions regardless of the Producer instance lifecycle for the same hdfs directory URL...
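
The collision described above, and one possible way around it, can be sketched roughly as follows. This is not the actual HdfsProducer code: the class names CounterNamer and UniqueNamer are hypothetical, and an in-memory set stands in for the HDFS directory listing. The timestamp-plus-UUID suffix is only one assumed choice of "unique generator"; the issue does not prescribe a specific one.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

public class SplitNameSketch {

    /** Mimics the behaviour described in the issue: a per-instance counter starting at 0. */
    static class CounterNamer {
        private long splitNum = 0;

        String next(String prefix) {
            return prefix + splitNum++;
        }
    }

    /** Hypothetical alternative: the suffix does not depend on any instance state. */
    static class UniqueNamer {
        String next(String prefix) {
            // A timestamp alone can repeat within one millisecond, so a random UUID is appended.
            return prefix + "-" + System.currentTimeMillis() + "-" + UUID.randomUUID();
        }
    }

    public static void main(String[] args) {
        // Stand-in for the files already present in the target directory.
        Set<String> existing = new HashSet<>();

        // The first producer instance writes seg0 and seg1.
        CounterNamer first = new CounterNamer();
        existing.add(first.next("seg"));
        existing.add(first.next("seg"));

        // A recreated instance starts again at 0, so its first name is already taken.
        CounterNamer recreated = new CounterNamer();
        String collided = recreated.next("seg");
        System.out.println(collided + " already exists: " + existing.contains(collided));

        // Two independent instances of the alternative namer produce different names.
        UniqueNamer a = new UniqueNamer();
        UniqueNamer b = new UniqueNamer();
        System.out.println("unique names differ: " + !a.next("seg").equals(b.next("seg")));
    }
}
```

With a suffix of this kind, file names no longer sort by creation order within the same millisecond, which may matter to consumers that rely on the seg0, seg1 sequence.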



--
This message was sent by Atlassian JIRA
(v6.1#6144)