Posted to common-user@hadoop.apache.org by Rasit OZDAS <ra...@gmail.com> on 2009/04/02 15:02:24 UTC

Re: HELP: I wanna store the output value into a list not write to the disk

Hi, Hadoop is normally designed to write to disk. There is a special file
format that writes output to RAM instead of disk, but I don't know whether
it's what you're looking for.
For what you describe, there would have to be a mechanism that sends output
as objects rather than file content across computers. As far as I know,
there is no such feature yet.
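
If the result does end up on disk, reading it back into a hash table is not
much code. Below is a minimal sketch, assuming the job wrote plain-text
output with one "key<TAB>count" pair per line (the default text output
layout, as far as I know). The class name ReduceOutputLoader and the method
load are made up for illustration. In a real job the Reader would presumably
wrap a stream opened on each part file through Hadoop's FileSystem API,
which is not shown here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: loads word counts from text
// output into memory. Assumes one "word<TAB>count" pair per line.
final class ReduceOutputLoader {

    static Map<String, Long> load(Reader source) throws IOException {
        Map<String, Long> counts = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // skip blank or malformed lines
                }
                String word = line.substring(0, tab);
                long count = Long.parseLong(line.substring(tab + 1).trim());
                // Sum counts in case the same key appears in several part files.
                counts.merge(word, count, Long::sum);
            }
        }
        return counts;
    }
}
```

Calling load once per part file and merging the returned maps would give a
single table for the whole job. This only makes sense when the full result
fits in the memory of one machine.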

Good luck.

2009/4/2 andy2005cst <an...@gmail.com>

>
> I need to use the output of the reduce step, but I don't know how.
> Using the wordcount program as an example: if I want to collect the word
> counts into a hashtable for further use, how can I do that?
> The example only shows how to write the result to disk.
> My email is: andy2005cst@gmail.com
> Looking forward to your help. Thanks a lot.
> --
> View this message in context:
> http://www.nabble.com/HELP%3A-I-wanna-store-the-output-value-into-a-list-not-write-to-the-disk-tp22844277p22844277.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>


-- 
M. Raşit ÖZDAŞ

Re: HELP: I wanna store the output value into a list not write to the disk

Posted by He Chen <ai...@gmail.com>.
It seems the InMemoryFileSystem class has been deprecated in Hadoop
0.19.1. Why?

I want to reuse the result of reduce as the input of the next map. Cascading
does not work, because each step's data depends on the previous one. I run
each timestep's MapReduce job synchronously. If InMemoryFileSystem is
deprecated, how can I reduce the I/O for each timestep's MapReduce job?

-- 
Chen He
RCF CSE Dept.
University of Nebraska-Lincoln
US

Re: HELP: I wanna store the output value into a list not write to the disk

Posted by Farhan Husain <ru...@gmail.com>.
Is there a way to implement some OutputCollector that can do what Andy wants
to do?
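
To make the idea concrete, here is a rough sketch of a collector that keeps
results in a map instead of writing them out. The Collector interface below
is a local stand-in that only mirrors the shape of Hadoop's OutputCollector
(a single collect(key, value) method). It is not the real interface, and
MapBackedCollector and snapshot are hypothetical names. I am not sure the
framework lets a job swap in its own collector directly, so treat this as an
illustration of the idea rather than a drop-in solution.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Local stand-in (assumption): mirrors the shape of Hadoop's OutputCollector
// so that the sketch compiles without Hadoop on the classpath.
interface Collector<K, V> {
    void collect(K key, V value) throws IOException;
}

// Hypothetical collector that accumulates counts in memory.
final class MapBackedCollector implements Collector<String, Integer> {
    private final Map<String, Integer> counts = new HashMap<>();

    @Override
    public void collect(String key, Integer value) {
        // Add to any count already stored for this key.
        counts.merge(key, value, Integer::sum);
    }

    // Returns a copy so that callers cannot modify the internal state.
    Map<String, Integer> snapshot() {
        return new HashMap<>(counts);
    }
}
```

One caveat: on a cluster the reduce tasks run in separate JVMs, usually on
other machines, so a map filled inside a task is not visible to the program
that submitted the job. Something like this could only hand results back
directly when everything runs in a single process.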

-- 
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas

Re: HELP: I wanna store the output value into a list not write to the disk

Posted by Rasit OZDAS <ra...@gmail.com>.
Andy, I haven't tried this feature, but I know that Yahoo set a
performance record with this file format.
I came across a file system included in the Hadoop code (probably that
one) when searching the source code.
Luckily I found it: org.apache.hadoop.fs.InMemoryFileSystem
But if you have a lot of big files, I don't think this approach will be
suitable.

Maybe someone can give further info.

-- 
M. Raşit ÖZDAŞ

Re: HELP: I wanna store the output value into a list not write to the disk

Posted by andy2005cst <an...@gmail.com>.
Thanks for your reply. Let me explain more clearly. Since MapReduce is just
one step of my program, I need to use the output of reduce for further
computation, so I do not want to write the output to disk. Instead, I want
to get the output as a collection or list in RAM. If it is written directly
to disk, I have to read it back into RAM again.
You mentioned a special file format. Could you please tell me what it is,
and give an example if possible?

Thank you so much.


-- 
View this message in context: http://www.nabble.com/HELP%3A-I-wanna-store-the-output-value-into-a-list-not-write-to-the-disk-tp22844277p22848070.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.