Posted to general@hadoop.apache.org by lei wang <ha...@gmail.com> on 2009/10/28 09:14:18 UTC

Need Help: The problem with text key of MapFile

Hi, friends,
I need to store web pages (a huge collection) in a Hadoop MapFile, so I
used the URL as the key, with type Text. When writing the
records into the MapFile, I get an "out of order" error. Which type
should I choose to represent the URL key? Can anyone give me a detailed
answer? Thanks for your help.

Re: Need Help: The problem with text key of MapFile

Posted by lei wang <ha...@gmail.com>.
I can't understand why you gave me two websites.

On Thu, Oct 29, 2009 at 10:27 AM, Lori Ann Martin <lm...@altair.com> wrote:

> Check out www.HiQube.com or www.pbsgridworks.com

RE: Need Help: The problem with text key of MapFile

Posted by Lori Ann Martin <lm...@altair.com>.
Check out www.HiQube.com or www.pbsgridworks.com

-----Original Message-----
From: lei wang [mailto:hadoopmaillist@gmail.com] 
Sent: Wednesday, October 28, 2009 7:22 PM
To: general@hadoop.apache.org
Subject: Re: Need Help: The problem with text key of MapFile

Oh, I tried HBase earlier,
but I think HDFS itself may give me an option.
Thanks.


Re: Need Help: The problem with text key of MapFile

Posted by lei wang <ha...@gmail.com>.
Oh, I tried HBase earlier,
but I think HDFS itself may give me an option.
Thanks.

On Thu, Oct 29, 2009 at 10:16 AM, Jeff Zhang <zj...@gmail.com> wrote:

> I think HBase may be a good fit for you. HBase is a distributed database
> built on top of Hadoop.
> You can use the URL as the row key and put the other fields into columns.
>
> Then you can retrieve a web page through the HBase client API and insert
> new web pages into it. The performance of HBase 0.20 should be good enough
> for you.
>
> Best Regards,
> Jeff Zhang

Re: Need Help: The problem with text key of MapFile

Posted by Jeff Zhang <zj...@gmail.com>.
I think HBase may be a good fit for you. HBase is a distributed database
built on top of Hadoop.
You can use the URL as the row key and put the other fields into columns.

Then you can retrieve a web page through the HBase client API and insert
new web pages into it. The performance of HBase 0.20 should be good enough
for you.

Best Regards,
Jeff Zhang


On Thu, Oct 29, 2009 at 8:53 AM, lei wang <ha...@gmail.com> wrote:

> Hi Jeff, thanks for your comments.
>   I read this book earlier. I use MapFile to store my web pages for
> random access.
> At first I considered the SequenceFile conversion as a solution; however,
> the problem is that I need to append new pages to the MapFile every minute
> or second, so I don't think the SequenceFile conversion can handle this.
> Could you give me some suggestions? Thank you very much!
>
> Best wishes.

Re: Need Help: The problem with text key of MapFile

Posted by lei wang <ha...@gmail.com>.
Hi Jeff, thanks for your comments.
   I read this book earlier. I use MapFile to store my web pages for
random access.
At first I considered the SequenceFile conversion as a solution; however,
the problem is that I need to append new pages to the MapFile every minute
or second, so I don't think the SequenceFile conversion can handle this.
Could you give me some suggestions? Thank you very much!

Best wishes.

On 10/28/09, Jeff Zhang <zj...@gmail.com> wrote:
> I do not know why you need to use MapFile. Could you use SequenceFile
> instead?
>
> MapFile's advantage is its read performance, because it builds an index on
> its keys. That is why its keys must be in order.
>
> If you really want to use MapFile, you can first write your data to a
> SequenceFile and then convert it to a MapFile.
>
> How to convert a SequenceFile to a MapFile:
> 1. Sort the SequenceFile using the sort example that ships with Hadoop.
> 2. Create an index for the output of the above step. You then have both
>    the data file and the index file.
>
>
> You can refer to Tom White's book "Hadoop: The Definitive Guide" for
> details on how to convert a SequenceFile into a MapFile.
>
> Jeff Zhang

Re: Need Help: The problem with text key of MapFile

Posted by Jeff Zhang <zj...@gmail.com>.
I do not know why you need to use MapFile. Could you use SequenceFile
instead?

MapFile's advantage is its read performance, because it builds an index on
its keys. That is why its keys must be in order.

If you really want to use MapFile, you can first write your data to a
SequenceFile and then convert it to a MapFile.

How to convert a SequenceFile to a MapFile:
1. Sort the SequenceFile using the sort example that ships with Hadoop.
2. Create an index for the output of the above step. You then have both
   the data file and the index file.
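
The two steps above can be sketched in plain Java, without the Hadoop
classes. This is only an illustration of the idea (sort the records, then
build a sparse index over the sorted keys), not the actual MapFile
implementation. The class name SortThenIndexSketch, the INDEX_INTERVAL value
and the method names are all made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortThenIndexSketch {
    // Hypothetical: index every 2nd record so the index stays small.
    static final int INDEX_INTERVAL = 2;

    // Step 1: sort the records by key (here, by URL string).
    static List<Map.Entry<String, String>> sortRecords(Map<String, String> unsorted) {
        TreeMap<String, String> sorted = new TreeMap<>(unsorted);
        return new ArrayList<>(sorted.entrySet());
    }

    // Step 2: build a sparse index mapping every INDEX_INTERVAL-th key
    // to its position in the sorted data.
    static TreeMap<String, Integer> buildIndex(List<Map.Entry<String, String>> data) {
        TreeMap<String, Integer> index = new TreeMap<>();
        for (int i = 0; i < data.size(); i += INDEX_INTERVAL) {
            index.put(data.get(i).getKey(), i);
        }
        return index;
    }

    // Lookup: jump to the nearest indexed key at or before the target,
    // then scan forward. This only works because the data is sorted.
    static String get(List<Map.Entry<String, String>> data,
                      TreeMap<String, Integer> index, String key) {
        Map.Entry<String, Integer> start = index.floorEntry(key);
        if (start == null) {
            return null; // key sorts before every stored key
        }
        for (int i = start.getValue(); i < data.size(); i++) {
            int cmp = data.get(i).getKey().compareTo(key);
            if (cmp == 0) {
                return data.get(i).getValue();
            }
            if (cmp > 0) {
                break; // passed the place where the key would be
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> pages = new java.util.HashMap<>();
        pages.put("http://c.example/", "page C");
        pages.put("http://a.example/", "page A");
        pages.put("http://b.example/", "page B");
        List<Map.Entry<String, String>> data = sortRecords(pages);
        TreeMap<String, Integer> index = buildIndex(data);
        System.out.println(get(data, index, "http://b.example/"));
    }
}
```

The forward scan in get() is the reason the data must be sorted: if the
keys were in arbitrary order, a sparse index could not say where to start
or when to stop.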


You can refer to Tom White's book "Hadoop: The Definitive Guide" for
details on how to convert a SequenceFile into a MapFile.

Jeff Zhang



On Wed, Oct 28, 2009 at 4:47 PM, lei wang <ha...@gmail.com> wrote:

> But my URLs are not in order. Must the key be an IntWritable? Should it
> be comparable?
> How do I make sure they are in order? Sort them first?
> I just want to insert the pages for random access by URL.

Re: Need Help: The problem with text key of MapFile

Posted by lei wang <ha...@gmail.com>.
But my URLs are not in order. Must the key be an IntWritable? Should it
be comparable?
How do I make sure they are in order? Sort them first?
I just want to insert the pages for random access by URL.

On Wed, Oct 28, 2009 at 4:26 PM, Jeff Zhang <zj...@gmail.com> wrote:

> Hi Wang,
>
> The keys of a MapFile must be in order, so when you add records to a
> MapFile, you should make sure you insert them in sorted order.
>
> Best Regards,
>
> Jeff Zhang

Re: Need Help: The problem with text key of MapFile

Posted by Jeff Zhang <zj...@gmail.com>.
Hi Wang,

The keys of a MapFile must be in order, so when you add records to a
MapFile, you should make sure you insert them in sorted order.
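
As a rough picture of the ordering requirement, here is a minimal sketch in
plain Java rather than the Hadoop API. OrderedWriterSketch is a hypothetical
stand-in that rejects a key smaller than the previous one, much as
MapFile.Writer does (the real writer reports this as an IOException), and
writeSorted buffers the pages in a TreeMap so they come out sorted by URL
before being appended. The class names, method names and example URLs are
made up for illustration, and String ordering is used in place of Text
ordering, which should agree for plain ASCII URLs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for an ordered writer such as MapFile.Writer.
// It accepts a key only if it is not smaller than the previous key.
class OrderedWriterSketch {
    private String lastKey = null;
    private final List<String> keys = new ArrayList<>();

    void append(String key, String value) {
        if (lastKey != null && lastKey.compareTo(key) > 0) {
            // An unchecked exception keeps the sketch short.
            throw new IllegalArgumentException(
                "key out of order: " + key + " after " + lastKey);
        }
        lastKey = key;
        keys.add(key); // a real writer would also store the value on disk
    }

    List<String> keys() {
        return keys;
    }
}

class SortedAppendSketch {
    // Sort the pages by URL first, then append them one by one.
    static List<String> writeSorted(Map<String, String> pages) {
        TreeMap<String, String> sorted = new TreeMap<>(pages); // sorts by key
        OrderedWriterSketch writer = new OrderedWriterSketch();
        for (Map.Entry<String, String> page : sorted.entrySet()) {
            writer.append(page.getKey(), page.getValue());
        }
        return writer.keys();
    }

    public static void main(String[] args) {
        Map<String, String> pages = new java.util.HashMap<>();
        pages.put("http://b.example/", "page B");
        pages.put("http://a.example/", "page A");
        // Appending in arrival order could fail; sorting first does not.
        System.out.println(writeSorted(pages));
    }
}
```

The key type does not need to change: a text key is already comparable, and
the error comes from the order in which the keys are appended.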

Best Regards,

Jeff Zhang


On Wed, Oct 28, 2009 at 4:14 PM, lei wang <ha...@gmail.com> wrote:

> Hi, friends,
> I need to store web pages (a huge collection) in a Hadoop MapFile, so I
> used the URL as the key, with type Text. When writing the
> records into the MapFile, I get an "out of order" error. Which type
> should I choose to represent the URL key? Can anyone give me a detailed
> answer? Thanks for your help.
>