Posted to mapreduce-user@hadoop.apache.org by Elton Pinto <ep...@gmail.com> on 2010/09/02 19:35:05 UTC

Read After Write Consistency in HDFS

Hello,

I apologize if this topic has already been brought up, but I was unable to
find it by searching around.

We recently discovered an issue in one of our jobs where the output of one
job does not always seem to make it into the next. The first job is a loader
job that's just a map step: it asynchronously downloads external data in
multiple threads and then writes to HDFS directly (i.e. not using the
OutputCollector), via FileSystem and FSDataOutputStream. I believe we did
this because we had trouble writing this data through the OutputCollector.
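
For context, the write path in the loader looks roughly like the sketch
below (simplified, with made-up names - not our actual code):

    // Simplified sketch of the loader's write path (names are made up).
    // The detail that matters here: readers are not guaranteed to see the
    // data until the output stream has been closed.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LoaderWrite {
        public static void writeFile(Configuration conf, Path dir,
                String fileName, byte[] payload) throws Exception {
            FileSystem fs = FileSystem.get(conf);
            // fail rather than overwrite if the file already exists
            FSDataOutputStream out = fs.create(new Path(dir, fileName), false);
            try {
                out.write(payload);
            } finally {
                out.close(); // the last block may not be visible until close()
            }
        }
    }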

The job that consumes this data runs immediately afterward, taking the
output directory of the loader job as its input directory. Very rarely,
though, it looks like not all the files are consumed, which we assume means
that they had not yet propagated to HDFS. The volume of data being loaded is
on the order of 10 GB.

The fix we're working on is to append the total number of files (i.e. the
number of mappers) to each file name and then check it to ensure that the
actual number of files matches the expected count (see the sketch after the
questions below), but I had a few questions about this issue:

1) Has anyone else seen anything like this? Is read after write consistency
just not guaranteed on HDFS?
2) Could it be an issue because we're not using an OutputCollector?
3) Does anyone know an easy way to change the file name that the
OutputCollector uses? MultipleTextOutputFormat seems to only take in a
key/value pair to create file names whereas what we really want is the
JobConf so we can get the task number and the total number of tasks. If the
OutputCollector is also affected by this issue, then we have other jobs that
we need to set up this kind of check for.
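
For reference, the naming scheme we have in mind for the fix is roughly the
sketch below (old org.apache.hadoop.mapred API; I'm assuming the framework
sets mapred.task.partition and mapred.map.tasks the way we expect, so treat
those property names as an assumption):

    // Rough sketch of the file-naming idea for the loader's mappers.
    // Assumption: mapred.task.partition is this map task's index and
    // mapred.map.tasks is the number of map tasks the job runs with.
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;

    public abstract class LoaderMapperBase extends MapReduceBase {
        protected String outputName;

        @Override
        public void configure(JobConf job) {
            int taskNumber = job.getInt("mapred.task.partition", -1);
            int totalMaps = job.getInt("mapred.map.tasks", -1);
            // e.g. data-00003-of-00050, so the consumer can check that it
            // sees all 50 files before it starts
            outputName = String.format("data-%05d-of-%05d",
                    taskNumber, totalMaps);
        }
    }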

Thanks,

Elton

eptiger@gmail.com
epinto@alumni.cs.utexas.edu
http://www.eltonpinto.net/

Re: Read After Write Consistency in HDFS

Posted by Elton Pinto <ep...@gmail.com>.
Well, we run the jobs serially. I'm 100% positive that the job consuming
the loader output started after the loader completed, where completion is as
reported by Hadoop. The delay between the end of that job and the start of
the next one is probably no more than a few seconds, though.

I'm not sure that there is any problem with the OutputCollector - we
haven't tried verifying the amount of data in vs. out on the jobs that use
it. We're 90% sure, though, that when the loader uses FileSystem directly to
write to HDFS, not all of the loader output files are available on HDFS as
soon as the job has completed. I was wondering whether anyone knows if the
OutputCollector is susceptible to this problem as well.

I say 90% because I'm deploying the fix right now, and we'll know from
grepping the logs whether it ever found a mismatch between expected and
actual files and had to re-run the job. Also, the only metric we have for
the number of maps appears to be TOTAL_LAUNCHED_MAPS from Hadoop, which
includes maps launched by speculative execution (though this job seems to
use speculative execution very rarely). So that's our only "proof" that not
all the files are being read in - comparing the number of input files to the
total launched maps from the loader.
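
For reference, the check we're deploying is essentially the following
(simplified sketch; the "-of-" suffix parsing and the names are just for
illustration):

    // Simplified sketch of the pre-flight check run before the consumer
    // job starts. The expected count is parsed from the "-of-NNNNN" suffix
    // the loader bakes into each file name.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LoaderOutputCheck {
        public static boolean allFilesPresent(Configuration conf,
                Path loaderOutput) throws Exception {
            FileSystem fs = FileSystem.get(conf);
            int actual = 0;
            int expected = -1;
            for (FileStatus status : fs.listStatus(loaderOutput)) {
                String name = status.getPath().getName();
                if (name.startsWith("_") || name.startsWith(".")) {
                    continue; // skip _logs, _SUCCESS, and hidden files
                }
                actual++;
                int idx = name.lastIndexOf("-of-");
                if (idx >= 0) {
                    expected = Integer.parseInt(name.substring(idx + 4));
                }
            }
            return expected > 0 && actual == expected;
        }
    }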

Elton

eptiger@gmail.com
epinto@alumni.cs.utexas.edu
http://www.eltonpinto.net/


On Thu, Sep 2, 2010 at 11:45 AM, Ted Yu <yu...@gmail.com> wrote:

> One possibility, due to the asynchronous nature of your loader, is that
> the consumer job started before all the files from the loader had been
> completely written (propagated).
>
> Can you describe what problem you encountered with the OutputCollector?

Re: Read After Write Consistency in HDFS

Posted by Ted Yu <yu...@gmail.com>.
One possibility, due to the asynchronous nature of your loader, is that the
consumer job started before all the files from the loader had been
completely written (propagated).

Can you describe what problem you encountered with the OutputCollector?
