Posted to common-user@hadoop.apache.org by lessonz <le...@q.com> on 2011/09/27 23:47:38 UTC

Temporary Files to be sent to DistributedCache

I need to write information retrieved from a database to a series of
files that must be made available to my mappers. Because each mapper
needs access to all of these files, I want to put them in the
DistributedCache. Is there a preferred method for writing new information to
the DistributedCache? I can use Java's File.createTempFile(String prefix,
String suffix), but that uses the system's default temporary folder. While
that should usually work, I'd rather have a method that doesn't depend on
writing to the local file system before copying files to the
DistributedCache. As I'm extremely new to Hadoop, I hope I'm not missing
something obvious.
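
For concreteness, here is a rough sketch of what I am doing now (the class name,
the rows parameter, and the HDFS path are placeholders I made up for
illustration): write a local temp file, copy it into HDFS, and register it with
the DistributedCache.

import java.io.File;
import java.io.FileWriter;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TempFileCachePrep {

    // Rough sketch only: write database rows to a local temp file, copy it to
    // HDFS, and register it with the DistributedCache for the job.
    public static void stageForCache(Configuration conf, List<String> rows)
            throws Exception {
        File tmp = File.createTempFile("dbexport-", ".txt"); // lands in java.io.tmpdir
        FileWriter writer = new FileWriter(tmp);
        for (String row : rows) {
            writer.write(row);
            writer.write('\n');
        }
        writer.close();

        FileSystem fs = FileSystem.get(conf);
        Path hdfsPath = new Path("/tmp/dbexport/" + tmp.getName()); // placeholder path
        fs.copyFromLocalFile(new Path(tmp.getAbsolutePath()), hdfsPath);
        DistributedCache.addCacheFile(hdfsPath.toUri(), conf);
    }
}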

Thank you for your time.

Re: Temporary Files to be sent to DistributedCache

Posted by Linden Hillenbrand <li...@cloudera.com>.
That is most likely the easiest and fastest way, as you will be leveraging the
distributed ingestion of Sqoop rather than doing a single-threaded import some
other way.

On Wed, Sep 28, 2011 at 12:27 AM, lessonz <le...@q.com> wrote:

> So, I thought about that, and I'd considered writing to the HDFS and then
> copying the file into the DistributedCache so each mapper/reducer doesn't
> have to reach into the HDFS for these files. Is that the "best" way to
> handle this?
>
> On Tue, Sep 27, 2011 at 4:01 PM, GOEKE, MATTHEW (AG/1000) <
> matthew.goeke@monsanto.com> wrote:
>
> > The simplest route I can think of is to ingest the data directly into
> HDFS
> > using Sqoop if there is a driver currently made for your database. At
> that
> > point it would be relatively simple just to read directly from HDFS in
> your
> > MR code.
> >
> > Matt
> >
> > -----Original Message-----
> > From: lessonz [mailto:lessonz@q.com]
> > Sent: Tuesday, September 27, 2011 4:48 PM
> > To: common-user@hadoop.apache.org
> > Subject: Temporary Files to be sent to DistributedCache
> >
> > I have a need to write information retrieved from a database to a series
> of
> > files that need to be made available to my mappers. Because each mapper
> > needs access to all of these files, I want to put them in the
> > DistributedCache. Is there a preferred method to writing new information
> to
> > the DistributedCache? I can use Java's File.createTempFile(String prefix,
> > String suffix), but that uses the system default temporary folder. While
> > that should usually work, I'd rather have a method that doesn't depend on
> > writing to the local file system before copying files to the
> > DistributedCache. As I'm extremely new to Hadoop, I hope I'm not missing
> > something obvious.
> >
> > Thank you for your time.
>



-- 
Linden Hillenbrand
Customer Operations Engineer

Phone:  650.644.3900 x4946
Email:   linden@cloudera.com
Twitter: @lhillenbrand
Data:    http://www.cloudera.com

Re: Temporary Files to be sent to DistributedCache

Posted by lessonz <le...@q.com>.
So, I thought about that, and I'd considered writing to HDFS and then
copying the file into the DistributedCache so each mapper/reducer doesn't
have to reach into HDFS for these files. Is that the "best" way to
handle this?
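
To put that in code, I was picturing something roughly like this (the class name
and the HDFS path are only illustrative), skipping the local temp file and
writing straight into HDFS before registering the file with the cache:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCachePrep {

    // Sketch: stream database rows straight into an HDFS file, then register
    // that file with the DistributedCache so every task gets a local copy.
    public static void writeAndCache(Configuration conf, Iterable<String> rows)
            throws Exception {
        FileSystem fs = FileSystem.get(conf);
        Path sideFile = new Path("/tmp/job-side-data/lookup.txt"); // illustrative path
        FSDataOutputStream out = fs.create(sideFile, true);        // overwrite if present
        for (String row : rows) {
            out.writeBytes(row + "\n");
        }
        out.close();
        DistributedCache.addCacheFile(sideFile.toUri(), conf);     // localized to each task node
    }
}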

On Tue, Sep 27, 2011 at 4:01 PM, GOEKE, MATTHEW (AG/1000) <
matthew.goeke@monsanto.com> wrote:

> The simplest route I can think of is to ingest the data directly into HDFS
> using Sqoop if there is a driver currently made for your database. At that
> point it would be relatively simple just to read directly from HDFS in your
> MR code.
>
> Matt
>
> -----Original Message-----
> From: lessonz [mailto:lessonz@q.com]
> Sent: Tuesday, September 27, 2011 4:48 PM
> To: common-user@hadoop.apache.org
> Subject: Temporary Files to be sent to DistributedCache
>
> I have a need to write information retrieved from a database to a series of
> files that need to be made available to my mappers. Because each mapper
> needs access to all of these files, I want to put them in the
> DistributedCache. Is there a preferred method to writing new information to
> the DistributedCache? I can use Java's File.createTempFile(String prefix,
> String suffix), but that uses the system default temporary folder. While
> that should usually work, I'd rather have a method that doesn't depend on
> writing to the local file system before copying files to the
> DistributedCache. As I'm extremely new to Hadoop, I hope I'm not missing
> something obvious.
>
> Thank you for your time.

RE: Temporary Files to be sent to DistributedCache

Posted by "GOEKE, MATTHEW (AG/1000)" <ma...@monsanto.com>.
The simplest route I can think of is to ingest the data directly into HDFS using Sqoop, if a driver currently exists for your database. At that point it would be relatively simple to read directly from HDFS in your MR code.
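
To sketch what I mean by reading directly from HDFS in the MR code (the class
name and path below are only placeholders; point the path at wherever the
imported data lands, e.g. the part files Sqoop writes):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SideDataMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final List<String> sideData = new ArrayList<String>();

    @Override
    protected void setup(Context context) throws IOException {
        // Load the imported side data straight from HDFS, once per task.
        FileSystem fs = FileSystem.get(context.getConfiguration());
        Path imported = new Path("/user/hadoop/dbexport/part-m-00000"); // placeholder
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(fs.open(imported)));
        String line;
        while ((line = reader.readLine()) != null) {
            sideData.add(line);
        }
        reader.close();
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // ... consult sideData while processing each input record ...
        context.write(value, new Text(String.valueOf(sideData.size())));
    }
}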

Matt

-----Original Message-----
From: lessonz [mailto:lessonz@q.com] 
Sent: Tuesday, September 27, 2011 4:48 PM
To: common-user@hadoop.apache.org
Subject: Temporary Files to be sent to DistributedCache

I need to write information retrieved from a database to a series of
files that must be made available to my mappers. Because each mapper
needs access to all of these files, I want to put them in the
DistributedCache. Is there a preferred method for writing new information to
the DistributedCache? I can use Java's File.createTempFile(String prefix,
String suffix), but that uses the system's default temporary folder. While
that should usually work, I'd rather have a method that doesn't depend on
writing to the local file system before copying files to the
DistributedCache. As I'm extremely new to Hadoop, I hope I'm not missing
something obvious.

Thank you for your time.