Posted to common-user@hadoop.apache.org by Ondřej Klimpera <kl...@fit.cvut.cz> on 2012/04/07 22:44:51 UTC

Creating and working with temporary file in a map() function

Hello,

I would like to ask whether it is possible to create and work with a 
temporary file while inside a map() function.

I assume each map task runs on a single node in the Hadoop cluster. 
What is a safe way to create a temporary file and read from it in one 
map() run? If it is possible, is there a size limit on the file?

The file cannot be created before the Hadoop job starts; I need to 
create and process it inside map().

Thanks for your answer.

Ondrej Klimpera.

Re: Creating and working with temporary file in a map() function

Posted by Ondřej Klimpera <kl...@fit.cvut.cz>.
I will, but deploying the application on a cluster is still some way off; 
I am just finishing the raw implementation. Cluster tuning is planned 
for the end of this month.

Thanks.

On 04/08/2012 09:06 PM, Harsh J wrote:
> It will work. Pseudo-distributed mode shouldn't be all that different
> from a fully distributed mode. Do let us know if it does not work as
> intended.
>


Re: Creating and working with temporary file in a map() function

Posted by Harsh J <ha...@cloudera.com>.
It will work. Pseudo-distributed mode shouldn't be all that different
from a fully distributed mode. Do let us know if it does not work as
intended.

On Sun, Apr 8, 2012 at 11:40 PM, Ondřej Klimpera <kl...@fit.cvut.cz> wrote:
> Thanks for your advice. File.createTempFile() works great, at least in
> pseudo-distributed mode; I hope the cluster will behave the same way. You
> saved me hours of trying...
>
>
>



-- 
Harsh J

Re: Creating and working with temporary file in a map() function

Posted by Ondřej Klimpera <kl...@fit.cvut.cz>.
Thanks for your advice. File.createTempFile() works great, at least in 
pseudo-distributed mode; I hope the cluster will behave the same way. You 
saved me hours of trying...


On 04/07/2012 11:29 PM, Harsh J wrote:
> MapReduce automatically sets "mapred.child.tmp" for all tasks to the
> task attempt's WorkingDir/tmp. This also sets the -Djava.io.tmpdir
> property for each task JVM at startup.
>
> Hence you may use the regular Java API to create a temporary file:
> http://docs.oracle.com/javase/6/docs/api/java/io/File.html#createTempFile(java.lang.String,%20java.lang.String)
>
> These files are also deleted automatically once the task attempt is
> done.
>


Re: Job, JobConf, and Configuration.

Posted by Harsh J <ha...@cloudera.com>.
The Job class encapsulates the Configuration object and manages it for
you. You can also get a reference to it via Job.getConfiguration():
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/JobContext.html#getConfiguration()

Hence, when you do "Job job = new Job();", the internal
Configuration object is created for you automatically. This is how the
underlying constructor looks:

public Job() throws IOException {
    this(new Configuration());
}
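As an illustration of that delegation, a sketch of typical driver-side usage (this assumes the Hadoop jars of that era on the classpath; the property and value shown are just examples, not a recommendation):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobConfDemo {
    public static void main(String[] args) throws IOException {
        // new Job() creates its own Configuration internally...
        Job job = new Job();

        // ...which you can reach and modify before submission:
        Configuration conf = job.getConfiguration();
        conf.set("mapred.child.tmp", "./tmp"); // example property

        System.out.println(conf.get("mapred.child.tmp"));
    }
}
```

Equivalently, you can build the Configuration yourself and hand it over with the `new Job(conf)` constructor; the Job wrapper manages it either way.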

On Mon, Apr 9, 2012 at 12:24 AM, JAX <ja...@gmail.com> wrote:
> Hi guys. Just a theoretical question here: I notice in chapter 1 of the Hadoop O'Reilly book that the "new API" example has *no* Configuration object.
>
> Why is that?
>
> I thought the new API still uses / needs a Configuration class when running jobs.
>
>
>
> Jay Vyas
> MMSB
> UCHC
>



-- 
Harsh J

Job, JobConf, and Configuration.

Posted by JAX <ja...@gmail.com>.
Hi guys. Just a theoretical question here: I notice in chapter 1 of the Hadoop O'Reilly book that the "new API" example has *no* Configuration object.

Why is that?

I thought the new API still uses/needs a Configuration class when running jobs.



Jay Vyas 
MMSB
UCHC


Re: Creating and working with temporary file in a map() function

Posted by Harsh J <ha...@cloudera.com>.
MapReduce automatically sets "mapred.child.tmp" for all tasks to the
task attempt's WorkingDir/tmp. This also sets the -Djava.io.tmpdir
property for each task JVM at startup.

Hence you may use the regular Java API to create a temporary file:
http://docs.oracle.com/javase/6/docs/api/java/io/File.html#createTempFile(java.lang.String,%20java.lang.String)

These files are also deleted automatically once the task attempt is
done.
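A minimal sketch of the mapper-side pattern, using plain JDK calls only (the file prefix/suffix and the data written here are arbitrary examples):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class TempFileInMapDemo {
    public static void main(String[] args) throws IOException {
        // Inside a task JVM, java.io.tmpdir already points at the task
        // attempt's working directory, so this lands in the right place.
        File tmp = File.createTempFile("map-scratch-", ".tmp");

        // Write intermediate data, then read it back within the same run.
        Files.write(tmp.toPath(), "intermediate data".getBytes("UTF-8"));
        String readBack = new String(Files.readAllBytes(tmp.toPath()), "UTF-8");
        System.out.println(readBack); // prints "intermediate data"

        // Deleting is optional in a task: the framework removes the
        // attempt directory when the attempt finishes.
        tmp.delete();
    }
}
```

As for a size limit: there is none imposed by the API itself; the practical bound is the free disk space in the node's local task directories.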

On Sun, Apr 8, 2012 at 2:14 AM, Ondřej Klimpera <kl...@fit.cvut.cz> wrote:
> Hello,
>
> I would like to ask whether it is possible to create and work with a
> temporary file while inside a map() function.
>
> I assume each map task runs on a single node in the Hadoop cluster.
> What is a safe way to create a temporary file and read from it in one
> map() run? If it is possible, is there a size limit on the file?
>
> The file cannot be created before the Hadoop job starts; I need to
> create and process it inside map().
>
> Thanks for your answer.
>
> Ondrej Klimpera.



-- 
Harsh J