Posted to common-user@hadoop.apache.org by Open Study <op...@gmail.com> on 2007/06/26 17:12:15 UTC

is that possible to make MapFile "mutable" ?

Hi all,

MapFile doesn't support an append mode, so an existing MapFile is overwritten
whenever a new one with the same name is created.

Is there any way I can append to a MapFile or something similar without
erasing the old content? Or does that not make sense at all?

In my scenario I need to split a large number of messages (tens of millions)
according to certain rules and put them into different MapFiles, which are
supposed to be updated when new messages come in. Since I couldn't find a way
to append to a MapFile, I have to create new MapFiles, so in the worst case
one MapFile may contain as little as one message, and I will later have to
merge each one with its proper siblings.

Regards
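
For illustration, the later merge step mentioned above can be sketched as a
k-way merge of key-sorted runs. This is only a sketch: plain Python lists of
(key, value) pairs stand in for the small MapFiles, and the function name
merge_sorted_runs is hypothetical, not part of Hadoop.

```python
import heapq


def merge_sorted_runs(runs):
    """Merge several key-sorted runs of (key, value) pairs into one sorted list.

    Each run stands in for one small, already-sorted file (an assumption
    made for this sketch).
    """
    # heapq.merge reads the runs lazily and always emits the smallest key next.
    return list(heapq.merge(*runs, key=lambda kv: kv[0]))
```

Because every input run is already sorted, the merge only needs to hold one
pending entry per run, so it stays cheap even with many small siblings.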

RE: is that possible to make MapFile "mutable" ?

Posted by Devaraj Das <dd...@yahoo-inc.com>.
No, you cannot append to a file on the DFS, so your app should be able to
treat multiple files as one logical file (as you point out). In your case,
though, it seems you could design your app to do some buffering: for
example, keep a buffer for each of the n different files, and flush the
buffers to the DFS as new files only when the amount of buffered data
reaches a certain limit.
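
A minimal sketch of that buffering idea, under stated assumptions: a local
directory stands in for the DFS, JSON-lines files stand in for MapFiles, the
limit is a message count rather than a byte size, and the class name
BufferedSplitter and its parameters are hypothetical.

```python
import json
import os
import tempfile
from collections import defaultdict


class BufferedSplitter:
    """Buffers messages per category and flushes each batch as new files."""

    def __init__(self, out_dir, max_buffered=1000):
        self.out_dir = out_dir            # local stand-in for a DFS directory
        self.max_buffered = max_buffered  # flush threshold (message count)
        self.buffers = defaultdict(list)  # category -> [(key, value), ...]
        self.buffered = 0
        self.batch = 0                    # keeps flushed file names unique

    def add(self, category, key, value):
        self.buffers[category].append((key, value))
        self.buffered += 1
        if self.buffered >= self.max_buffered:
            self.flush()

    def flush(self):
        """Write every non-empty buffer as one new, key-sorted file."""
        written = []
        for category, items in self.buffers.items():
            if not items:
                continue
            name = f"{category}-{self.batch:06d}.jsonl"
            path = os.path.join(self.out_dir, name)
            with open(path, "w", encoding="utf-8") as f:
                # Sort by key so each file is ordered, as a MapFile would be.
                for key, value in sorted(items, key=lambda kv: kv[0]):
                    f.write(json.dumps([key, value]) + "\n")
            written.append(path)
        self.buffers.clear()
        self.buffered = 0
        if written:
            self.batch += 1
        return written
```

Existing files are never reopened; each flush only creates new ones, so a
larger threshold means fewer and bigger files to merge later.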
I am not sure whether fault handling is a concern for you, but there is a
danger of losing the buffered messages if your app goes down. One way to
handle this, assuming you are able to reprocess messages, is to checkpoint
the state of the message processor in the DFS. The state could include the
last message ID you flushed; the next time your app starts up, it reads the
checkpoint file from the DFS, gets the ID, and processes messages starting
from (ID + 1).
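
The checkpoint scheme can be sketched roughly as below. The assumptions are
labeled here and in the comments: message IDs are consecutive integers that
index a replayable list, a local file stands in for the checkpoint file on
the DFS, a Python list stands in for the flushed output, and all function
names are hypothetical.

```python
import os
import tempfile


def save_checkpoint(path, last_flushed_id):
    """Record the ID of the last message that was safely flushed."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(str(last_flushed_id))
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # rename so readers never see a half-written file


def load_checkpoint(path):
    """Return the last flushed ID, or -1 if no checkpoint exists yet."""
    try:
        with open(path, encoding="utf-8") as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return -1


def process(messages, checkpoint_path, flush_every, sink):
    """Process `messages` starting right after the checkpointed ID.

    `messages` is a list indexed by message ID; `sink` is a list standing
    in for files written to the DFS (both are assumptions of this sketch).
    """
    start = load_checkpoint(checkpoint_path) + 1
    buffer = []
    for msg_id in range(start, len(messages)):
        buffer.append(messages[msg_id])
        if len(buffer) >= flush_every:
            sink.extend(buffer)                       # 1. flush the data
            buffer.clear()
            save_checkpoint(checkpoint_path, msg_id)  # 2. then checkpoint
    return buffer  # still unflushed; lost if the app dies at this point
```

Writing the data before the checkpoint means a crash between the two steps
replays some messages rather than dropping them, so the output may need to
tolerate duplicates.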



Re: is that possible to make MapFile "mutable" ?

Posted by Open Study <op...@gmail.com>.
Hi Devaraj, thanks for the reply and the suggestion. I had a similar but less
sophisticated checkpoint mechanism.

Another question: why was MapFile (and in fact SequenceFile) made immutable
in the first place? I believe there must be a good reason for it, but so far
I don't know what it is.


