Posted to common-user@hadoop.apache.org by Romeo Kienzler <ro...@ormium.de> on 2011/11/22 08:01:08 UTC

Incremental Mappers?

Hi,

I'm planning to use Flume to stream data from a local client 
machine into HDFS running in a cloud environment.

Is there a way to start a mapper on a file that is still being written? 
As far as I know, a file in HDFS has to be closed before a mapper can 
start reading it.

Is this true?

Any ideas for a possible solution to this problem?

Or do I have to split my big input file into smaller chunks, create 
multiple files in HDFS, and start a separate map task on each file once 
it has been closed?

Best Regards,

Romeo

Romeo Kienzler
r o m e o @ o r m i u m . d e

Re: Incremental Mappers?

Posted by Joey Echeverria <jo...@cloudera.com>.
You're correct: currently HDFS only supports reading from closed files. You can configure Flume to write your data in small enough chunks that you can do incremental processing. 
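
For example, with an HDFS sink you can roll (close) files on a small 
size or time interval. A rough sketch, assuming a properties-style agent 
configuration (the agent/source/sink names, host, paths, and the 
tail-based source are only examples, and the exact property names depend 
on your Flume version):

agent.sources = src
agent.channels = ch
agent.sinks = hdfsSink

agent.sources.src.type = exec
agent.sources.src.command = tail -F /var/log/app/events.log
agent.sources.src.channels = ch

agent.channels.ch.type = memory
agent.channels.ch.capacity = 10000

agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = ch
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode:8020/flume/incoming
agent.sinks.hdfsSink.hdfs.fileType = DataStream
# roll (close) the current file every 5 minutes or every 64 MB,
# whichever comes first; rollCount = 0 disables count-based rolling
agent.sinks.hdfsSink.hdfs.rollInterval = 300
agent.sinks.hdfsSink.hdfs.rollSize = 67108864
agent.sinks.hdfsSink.hdfs.rollCount = 0

Each rolled file is closed in HDFS, so a periodic MapReduce job pointed 
at that directory can pick up the new chunks as they appear.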


-Joey



On Nov 22, 2011, at 2:01, Romeo Kienzler <ro...@ormium.de> wrote:

> Hi,
> 
> I'm planning to use Flume to stream data from a local client machine into HDFS running in a cloud environment.
> 
> Is there a way to start a mapper on a file that is still being written? As far as I know, a file in HDFS has to be closed before a mapper can start reading it.
> 
> Is this true?
> 
> Any ideas for a possible solution to this problem?
> 
> Or do I have to split my big input file into smaller chunks, create multiple files in HDFS, and start a separate map task on each file once it has been closed?
> 
> Best Regards,
> 
> Romeo
> 
> Romeo Kienzler
> r o m e o @ o r m i u m . d e