Posted to dev@flume.apache.org by "darkz (JIRA)" <ji...@apache.org> on 2017/08/29 03:30:00 UTC

[jira] [Comment Edited] (FLUME-1702) HDFSEventSink should write to a hidden file as opposed to a .tmp file

    [ https://issues.apache.org/jira/browse/FLUME-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943005#comment-15943005 ] 

darkz edited comment on FLUME-1702 at 8/29/17 3:29 AM:
-------------------------------------------------------

I selected the .tmp data in Hive, and it hit an error:

Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Illegal character ((CTRL-CHAR, code 1)): only regular white space (\r, \n, \t) is allowed between tokens
 at [Source: java.io.ByteArrayInputStream@7730ef88; line: 1, column: 2]

I think the compressed file with the '.tmp' suffix is still in use and is not yet a complete compressed file, so the codec in Hadoop could not recognize its content.

After all: yes, I use the "." prefix to skip the ".tmp" file, but the Flume documentation does not mention it...
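
The "." prefix in this comment presumably refers to the HDFS sink's hdfs.inUsePrefix property. A minimal sketch of such a setup follows; the agent name (a1), sink name (k1), path, and codec are assumptions made for illustration and do not come from the comment:

```properties
# Hypothetical agent (a1) and sink (k1); path and codec are illustrative only.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events
a1.sinks.k1.hdfs.fileType = CompressedStream
a1.sinks.k1.hdfs.codeC = gzip
# Prefix files that are still being written with "." so that Hive and MR,
# which skip names starting with "." or "_", treat them as hidden.
a1.sinks.k1.hdfs.inUsePrefix = .
# Suffix for files that are still being written; .tmp is the default.
a1.sinks.k1.hdfs.inUseSuffix = .tmp
```

With these settings an open file would be named something like .FlumeData.<counter>.gz.tmp and would be renamed to FlumeData.<counter>.gz once it is closed.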



was (Author: darkz):
I selected the .tmp data in Hive, and it threw an error:

Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Illegal character ((CTRL-CHAR, code 1)): only regular white space (\r, \n, \t) is allowed between tokens
 at [Source: java.io.ByteArrayInputStream@7730ef88; line: 1, column: 2]

I think the compressed file with the '.tmp' suffix is still in use and is not yet a complete compressed file, so the codec in Hadoop could not recognize its content.

After all: yes, I use the "." prefix to skip the ".tmp" file, but the Flume documentation does not mention it...


> HDFSEventSink should write to a hidden file as opposed to a .tmp file
> ---------------------------------------------------------------------
>
>                 Key: FLUME-1702
>                 URL: https://issues.apache.org/jira/browse/FLUME-1702
>             Project: Flume
>          Issue Type: Improvement
>            Reporter: Brock Noland
>            Assignee: Jarek Jarcec Cecho
>             Fix For: 1.4.0
>
>         Attachments: bugFLUME-1702.patch, bugFLUME-1702.patch
>
>
> Currently we write to a .tmp file. The problem is that if MR jobs are run on the directory we are writing to, it is common for an MR job to list the directory, pick up a .tmp file, and then fail at run time because the .tmp file was renamed in the meantime.
> With Java MR you can use a PathFilter to avoid this; however, a custom solution is required for Pig, Hive, etc.
> Perhaps we should write to a hidden file so that MR never tries to process data in flight.
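
The PathFilter approach mentioned in the description could look roughly like the sketch below. To stay self-contained it works on plain file-name strings instead of Hadoop Path objects; in a real job the same check would presumably sit inside an implementation of org.apache.hadoop.fs.PathFilter (its accept(Path) method) and be registered with FileInputFormat.setInputPathFilter. The class name, method names, and sample file names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch only: mirrors the decision a Hadoop PathFilter would make,
// but on plain file names so it runs without Hadoop on the classpath.
public class InFlightFileFilter {

    // Returns true when the name looks like a finished file: it does not
    // carry the in-use ".tmp" suffix and is not hidden ("." or "_" prefix).
    public static boolean isComplete(String fileName) {
        return !fileName.endsWith(".tmp")
                && !fileName.startsWith(".")
                && !fileName.startsWith("_");
    }

    // Keeps only the names that a job should read.
    public static List<String> completedFiles(List<String> names) {
        return names.stream()
                .filter(InFlightFileFilter::isComplete)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up directory listing for illustration.
        List<String> listing = List.of(
                "FlumeData.1503977400000.gz",
                "FlumeData.1503977400001.gz.tmp",
                ".FlumeData.1503977400002.gz.tmp",
                "_SUCCESS");
        System.out.println(completedFiles(listing));
    }
}
```

A filter like this only helps jobs written against the Java MR API, which is the limitation the description points out: Pig and Hive would each need their own workaround unless the in-flight file is hidden by name.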


