Posted to mapreduce-user@hadoop.apache.org by Manickam P <ma...@outlook.com> on 2013/07/04 13:10:57 UTC

How to update a file which is in HDFS

Hi,
I have moved my input file into the HDFS location in the cluster setup. Now I have received a new version of the file which contains some new records along with the old ones. I want to move only the delta into HDFS, because copying the whole file from my local machine to the HDFS location takes a long time. Is that possible, or do I need to move the entire file into HDFS again?


Thanks,
Manickam P

Re: How to update a file which is in HDFS

Posted by Mohammad Tariq <do...@gmail.com>.
I totally agree, Harsh. It was just to avoid any misinterpretation :). I have
seen quite a few discussions as well that talk about these issues.

I would strongly recommend switching away from 1.x if append is desired.

Warm Regards,
Tariq
cloudfront.blogspot.com


On Sat, Jul 6, 2013 at 7:29 AM, Harsh J <ha...@cloudera.com> wrote:

> The append in 1.x is very broken. You'll run into very weird states
> and we officially do not support it (we even call out in the config as
> broken). I wouldn't recommend using it even if a simple test appears
> to work.
>
> On Sat, Jul 6, 2013 at 6:27 AM, Mohammad Tariq <do...@gmail.com> wrote:
> > @Robin East :  Thank you for keeping me updated. I was on 1.0.3 when I
> had
> > tried append last time and it was not working despite of the fact that
> API
> > had it. I tried it with 1.1.2 and it seems to work fine.
> >
> > @Manickam : Apologies for the incorrect info. Latest stable
> release(1.1.2)
> > supports append. But, you should consider whatever Harsh has said.
> >
> > Warm Regards,
> > Tariq
> > cloudfront.blogspot.com
> >
> >
> > On Fri, Jul 5, 2013 at 4:24 PM, Harsh J <ha...@cloudera.com> wrote:
> >>
> >> If it is 1k new records at the "end of the file" then you may extract
> >> them out and append the existing file in HDFS. I'd recommend using
> >> HDFS from Apache Hadoop 2.x for this purpose.
> >>
> >> On Fri, Jul 5, 2013 at 4:22 PM, Manickam P <ma...@outlook.com>
> wrote:
> >> > Hi,
> >> >
> >> > Let me explain the question clearly. I have a file which has one
> million
> >> > records and i moved into my hadoop cluster.
> >> > After one month i got a new file which has same one million plus 1000
> >> > new
> >> > records added in end of the file.
> >> > Here i just want to move the 1000 records alone into HDFS instead of
> >> > overwriting the entire file.
> >> >
> >> > Can i use HBase for this scenario? i don't have clear idea about
> HBase.
> >> > Just
> >> > asking.
> >> >
> >> >
> >> >
> >> >
> >> > Thanks,
> >> > Manickam P
> >> >
> >> >
> >> >> From: harsh@cloudera.com
> >> >> Date: Fri, 5 Jul 2013 16:13:16 +0530
> >> >
> >> >> Subject: Re: How to update a file which is in HDFS
> >> >> To: user@hadoop.apache.org
> >> >
> >> >>
> >> >> The answer to the "delta" part is more that HDFS does not presently
> >> >> support random writes. You cannot alter a closed file for anything
> >> >> other than appending at the end, which I doubt will help you if you
> >> >> are also receiving updates (it isn't clear from your question what
> >> >> this added data really is).
> >> >>
> >> >> HBase sounds like something that may solve your requirement though,
> >> >> depending on how much of your read/write load is random. You could
> >> >> consider it.
> >> >>
> >> >> P.s. HBase too doesn't use the append() APIs today (and doesn't need
> >> >> it either). AFAIK, only Flume's making use of it, if you allow it to.
> >> >>
> >> >> On Thu, Jul 4, 2013 at 5:17 PM, Mohammad Tariq <do...@gmail.com>
> >> >> wrote:
> >> >> > Hello Manickam,
> >> >> >
> >> >> > Append is currently not possible.
> >> >> >
> >> >> > Warm Regards,
> >> >> > Tariq
> >> >> > cloudfront.blogspot.com
> >> >> >
> >> >> >
> >> >> > On Thu, Jul 4, 2013 at 4:40 PM, Manickam P <manickam.p@outlook.com
> >
> >> >> > wrote:
> >> >> >>
> >> >> >> Hi,
> >> >> >>
> >> >> >> I have moved my input file into the HDFS location in the cluster
> >> >> >> setup.
> >> >> >> Now i got a new set of file which has some new records along with
> >> >> >> the
> >> >> >> old
> >> >> >> one.
> >> >> >> I want to move the delta part alone into HDFS because it will take
> >> >> >> more
> >> >> >> time to move the file from my local to HDFS location.
> >> >> >> Is it possible or do i need to move the entire file into HDFS
> again?
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> Thanks,
> >> >> >> Manickam P
> >> >> >
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Harsh J
> >>
> >>
> >>
> >> --
> >> Harsh J
> >
> >
>
>
>
> --
> Harsh J
>

Re: How to update a file which is in HDFS

Posted by Harsh J <ha...@cloudera.com>.
The append in 1.x is very broken. You'll run into very weird states,
and we officially do not support it (we even call it out in the config as
broken). I wouldn't recommend using it even if a simple test appears
to work.

On Sat, Jul 6, 2013 at 6:27 AM, Mohammad Tariq <do...@gmail.com> wrote:
> @Robin East :  Thank you for keeping me updated. I was on 1.0.3 when I had
> tried append last time and it was not working despite of the fact that API
> had it. I tried it with 1.1.2 and it seems to work fine.
>
> @Manickam : Apologies for the incorrect info. Latest stable release(1.1.2)
> supports append. But, you should consider whatever Harsh has said.
>
> Warm Regards,
> Tariq
> cloudfront.blogspot.com
>
>
> On Fri, Jul 5, 2013 at 4:24 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>> If it is 1k new records at the "end of the file" then you may extract
>> them out and append the existing file in HDFS. I'd recommend using
>> HDFS from Apache Hadoop 2.x for this purpose.
>>
>> On Fri, Jul 5, 2013 at 4:22 PM, Manickam P <ma...@outlook.com> wrote:
>> > Hi,
>> >
>> > Let me explain the question clearly. I have a file which has one million
>> > records and i moved into my hadoop cluster.
>> > After one month i got a new file which has same one million plus 1000
>> > new
>> > records added in end of the file.
>> > Here i just want to move the 1000 records alone into HDFS instead of
>> > overwriting the entire file.
>> >
>> > Can i use HBase for this scenario? i don't have clear idea about HBase.
>> > Just
>> > asking.
>> >
>> >
>> >
>> >
>> > Thanks,
>> > Manickam P
>> >
>> >
>> >> From: harsh@cloudera.com
>> >> Date: Fri, 5 Jul 2013 16:13:16 +0530
>> >
>> >> Subject: Re: How to update a file which is in HDFS
>> >> To: user@hadoop.apache.org
>> >
>> >>
>> >> The answer to the "delta" part is more that HDFS does not presently
>> >> support random writes. You cannot alter a closed file for anything
>> >> other than appending at the end, which I doubt will help you if you
>> >> are also receiving updates (it isn't clear from your question what
>> >> this added data really is).
>> >>
>> >> HBase sounds like something that may solve your requirement though,
>> >> depending on how much of your read/write load is random. You could
>> >> consider it.
>> >>
>> >> P.s. HBase too doesn't use the append() APIs today (and doesn't need
>> >> it either). AFAIK, only Flume's making use of it, if you allow it to.
>> >>
>> >> On Thu, Jul 4, 2013 at 5:17 PM, Mohammad Tariq <do...@gmail.com>
>> >> wrote:
>> >> > Hello Manickam,
>> >> >
>> >> > Append is currently not possible.
>> >> >
>> >> > Warm Regards,
>> >> > Tariq
>> >> > cloudfront.blogspot.com
>> >> >
>> >> >
>> >> > On Thu, Jul 4, 2013 at 4:40 PM, Manickam P <ma...@outlook.com>
>> >> > wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> I have moved my input file into the HDFS location in the cluster
>> >> >> setup.
>> >> >> Now i got a new set of file which has some new records along with
>> >> >> the
>> >> >> old
>> >> >> one.
>> >> >> I want to move the delta part alone into HDFS because it will take
>> >> >> more
>> >> >> time to move the file from my local to HDFS location.
>> >> >> Is it possible or do i need to move the entire file into HDFS again?
>> >> >>
>> >> >>
>> >> >>
>> >> >> Thanks,
>> >> >> Manickam P
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Harsh J
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Harsh J

Re: How to update a file which is in HDFS

Posted by Mohammad Tariq <do...@gmail.com>.
@Robin East: Thank you for keeping me updated. I was on 1.0.3 when I last
tried append, and it was not working despite the fact that the API had it. I
tried it with 1.1.2 and it seems to work fine.
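
A minimal sketch of the kind of quick check that refers to, assuming a
throwaway path on a test cluster (on 1.x the cluster side must also enable
dfs.support.append, and the warning from Harsh elsewhere in this thread about
relying on 1.x append still applies):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendProbe {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical scratch file, for a test cluster only.
        Path p = new Path("/tmp/append-probe.txt");
        if (!fs.exists(p)) {
            FSDataOutputStream create = fs.create(p);
            create.writeBytes("first line\n");
            create.close();
        }

        // append() fails if the running HDFS does not support/allow append.
        FSDataOutputStream out = fs.append(p);
        out.writeBytes("one more line\n");
        out.close();

        System.out.println("append succeeded, new length = "
                + fs.getFileStatus(p).getLen());
    }
}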

@Manickam: Apologies for the incorrect info. The latest stable release (1.1.2)
supports append. But you should consider what Harsh has said.

Warm Regards,
Tariq
cloudfront.blogspot.com


On Fri, Jul 5, 2013 at 4:24 PM, Harsh J <ha...@cloudera.com> wrote:

> If it is 1k new records at the "end of the file" then you may extract
> them out and append the existing file in HDFS. I'd recommend using
> HDFS from Apache Hadoop 2.x for this purpose.
>
> On Fri, Jul 5, 2013 at 4:22 PM, Manickam P <ma...@outlook.com> wrote:
> > Hi,
> >
> > Let me explain the question clearly. I have a file which has one million
> > records and i moved into my hadoop cluster.
> > After one month i got a new file which has same one million plus 1000 new
> > records added in end of the file.
> > Here i just want to move the 1000 records alone into HDFS instead of
> > overwriting the entire file.
> >
> > Can i use HBase for this scenario? i don't have clear idea about HBase.
> Just
> > asking.
> >
> >
> >
> >
> > Thanks,
> > Manickam P
> >
> >
> >> From: harsh@cloudera.com
> >> Date: Fri, 5 Jul 2013 16:13:16 +0530
> >
> >> Subject: Re: How to update a file which is in HDFS
> >> To: user@hadoop.apache.org
> >
> >>
> >> The answer to the "delta" part is more that HDFS does not presently
> >> support random writes. You cannot alter a closed file for anything
> >> other than appending at the end, which I doubt will help you if you
> >> are also receiving updates (it isn't clear from your question what
> >> this added data really is).
> >>
> >> HBase sounds like something that may solve your requirement though,
> >> depending on how much of your read/write load is random. You could
> >> consider it.
> >>
> >> P.s. HBase too doesn't use the append() APIs today (and doesn't need
> >> it either). AFAIK, only Flume's making use of it, if you allow it to.
> >>
> >> On Thu, Jul 4, 2013 at 5:17 PM, Mohammad Tariq <do...@gmail.com>
> wrote:
> >> > Hello Manickam,
> >> >
> >> > Append is currently not possible.
> >> >
> >> > Warm Regards,
> >> > Tariq
> >> > cloudfront.blogspot.com
> >> >
> >> >
> >> > On Thu, Jul 4, 2013 at 4:40 PM, Manickam P <ma...@outlook.com>
> >> > wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> I have moved my input file into the HDFS location in the cluster
> setup.
> >> >> Now i got a new set of file which has some new records along with the
> >> >> old
> >> >> one.
> >> >> I want to move the delta part alone into HDFS because it will take
> more
> >> >> time to move the file from my local to HDFS location.
> >> >> Is it possible or do i need to move the entire file into HDFS again?
> >> >>
> >> >>
> >> >>
> >> >> Thanks,
> >> >> Manickam P
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Harsh J
>
>
>
> --
> Harsh J
>

Re: How to update a file which is in HDFS

Posted by Harsh J <ha...@cloudera.com>.
If it is 1k new records at the "end of the file", then you may extract
them out and append them to the existing file in HDFS. I'd recommend using
HDFS from Apache Hadoop 2.x for this purpose.
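
As a minimal sketch of what that could look like with the Hadoop 2.x
FileSystem API (the paths below are hypothetical, and it assumes the new
local file is exactly the old content with the new records added at the end),
the delta can be located using the length of the copy already in HDFS:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;
import java.io.RandomAccessFile;

public class AppendDelta {
    public static void main(String[] args) throws IOException {
        // Hypothetical locations; adjust to your cluster and local layout.
        Path hdfsFile = new Path("/data/input/records.txt");
        String newLocalFile = "/local/records-with-delta.txt";

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Length of the copy already in HDFS; everything past this offset in
        // the new local file is the delta, assuming the old content is
        // unchanged and the new records were simply added at the end.
        long oldLength = fs.getFileStatus(hdfsFile).getLen();

        RandomAccessFile local = new RandomAccessFile(newLocalFile, "r");
        FSDataOutputStream out = fs.append(hdfsFile);
        try {
            local.seek(oldLength);
            byte[] buffer = new byte[64 * 1024];
            int read;
            while ((read = local.read(buffer)) != -1) {
                out.write(buffer, 0, read); // append only the delta bytes
            }
        } finally {
            out.close();
            local.close();
        }
    }
}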

On Fri, Jul 5, 2013 at 4:22 PM, Manickam P <ma...@outlook.com> wrote:
> Hi,
>
> Let me explain the question clearly. I have a file which has one million
> records and i moved into my hadoop cluster.
> After one month i got a new file which has same one million plus 1000 new
> records added in end of the file.
> Here i just want to move the 1000 records alone into HDFS instead of
> overwriting the entire file.
>
> Can i use HBase for this scenario? i don't have clear idea about HBase. Just
> asking.
>
>
>
>
> Thanks,
> Manickam P
>
>
>> From: harsh@cloudera.com
>> Date: Fri, 5 Jul 2013 16:13:16 +0530
>
>> Subject: Re: How to update a file which is in HDFS
>> To: user@hadoop.apache.org
>
>>
>> The answer to the "delta" part is more that HDFS does not presently
>> support random writes. You cannot alter a closed file for anything
>> other than appending at the end, which I doubt will help you if you
>> are also receiving updates (it isn't clear from your question what
>> this added data really is).
>>
>> HBase sounds like something that may solve your requirement though,
>> depending on how much of your read/write load is random. You could
>> consider it.
>>
>> P.s. HBase too doesn't use the append() APIs today (and doesn't need
>> it either). AFAIK, only Flume's making use of it, if you allow it to.
>>
>> On Thu, Jul 4, 2013 at 5:17 PM, Mohammad Tariq <do...@gmail.com> wrote:
>> > Hello Manickam,
>> >
>> > Append is currently not possible.
>> >
>> > Warm Regards,
>> > Tariq
>> > cloudfront.blogspot.com
>> >
>> >
>> > On Thu, Jul 4, 2013 at 4:40 PM, Manickam P <ma...@outlook.com>
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> I have moved my input file into the HDFS location in the cluster setup.
>> >> Now i got a new set of file which has some new records along with the
>> >> old
>> >> one.
>> >> I want to move the delta part alone into HDFS because it will take more
>> >> time to move the file from my local to HDFS location.
>> >> Is it possible or do i need to move the entire file into HDFS again?
>> >>
>> >>
>> >>
>> >> Thanks,
>> >> Manickam P
>> >
>> >
>>
>>
>>
>> --
>> Harsh J



-- 
Harsh J

RE: How to update a file which is in HDFS

Posted by Manickam P <ma...@outlook.com>.
Hi,
Let me explain the question clearly. I have a file with one million records, and I moved it into my Hadoop cluster. After one month I received a new file which has the same one million records plus 1000 new records added at the end of the file. Here I just want to move the 1000 new records alone into HDFS instead of overwriting the entire file.
Can I use HBase for this scenario? I don't have a clear idea about HBase. Just asking.



Thanks,
Manickam P

> From: harsh@cloudera.com
> Date: Fri, 5 Jul 2013 16:13:16 +0530
> Subject: Re: How to update a file which is in HDFS
> To: user@hadoop.apache.org
> 
> The answer to the "delta" part is more that HDFS does not presently
> support random writes. You cannot alter a closed file for anything
> other than appending at the end, which I doubt will help you if you
> are also receiving updates (it isn't clear from your question what
> this added data really is).
> 
> HBase sounds like something that may solve your requirement though,
> depending on how much of your read/write load is random. You could
> consider it.
> 
> P.s. HBase too doesn't use the append() APIs today (and doesn't need
> it either). AFAIK, only Flume's making use of it, if you allow it to.
> 
> On Thu, Jul 4, 2013 at 5:17 PM, Mohammad Tariq <do...@gmail.com> wrote:
> > Hello Manickam,
> >
> >         Append is currently not possible.
> >
> > Warm Regards,
> > Tariq
> > cloudfront.blogspot.com
> >
> >
> > On Thu, Jul 4, 2013 at 4:40 PM, Manickam P <ma...@outlook.com> wrote:
> >>
> >> Hi,
> >>
> >> I have moved my input file into the HDFS location in the cluster setup.
> >> Now i got a new set of file which has some new records along with the old
> >> one.
> >> I want to move the delta part alone into HDFS because it will take more
> >> time to move the file from my local to HDFS location.
> >> Is it possible or do i need to move the entire file into HDFS again?
> >>
> >>
> >>
> >> Thanks,
> >> Manickam P
> >
> >
> 
> 
> 
> --
> Harsh J
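
On the HBase question above: the suggestion in the quoted reply is that each
record becomes its own row, so a monthly delta is just a batch of new puts
rather than a rewrite of one large file. A minimal sketch with the 0.94-era
client API, where the table name "records", the column family "d", and the
row-key scheme are all hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;

public class LoadDelta {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();

        // Hypothetical table, created beforehand in the HBase shell with:
        //   create 'records', 'd'
        HTable table = new HTable(conf, "records");
        try {
            // Only the 1000 new records are written; each becomes a row
            // keyed by its record id, so old rows are never touched.
            for (long id = 1000001L; id <= 1001000L; id++) {
                Put put = new Put(Bytes.toBytes(String.format("%012d", id)));
                put.add(Bytes.toBytes("d"), Bytes.toBytes("line"),
                        Bytes.toBytes("record payload for id " + id));
                table.put(put);
            }
        } finally {
            table.close();
        }
    }
}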
 		 	   		  

RE: How to update a file which is in HDFS

Posted by Manickam P <ma...@outlook.com>.
Hi,
Let me explain the question clearly. I have a file which has one million records and i moved into my hadoop cluster. After one month i got a new file which has same one million plus 1000 new records added in end of the file. Here i just want to move the 1000 records alone into HDFS instead of overwriting the entire file. 
Can i use HBase for this scenario? i don't have clear idea about HBase. Just asking.



Thanks,
Manickam P

> From: harsh@cloudera.com
> Date: Fri, 5 Jul 2013 16:13:16 +0530
> Subject: Re: How to update a file which is in HDFS
> To: user@hadoop.apache.org
> 
> The answer to the "delta" part is more that HDFS does not presently
> support random writes. You cannot alter a closed file for anything
> other than appending at the end, which I doubt will help you if you
> are also receiving updates (it isn't clear from your question what
> this added data really is).
> 
> HBase sounds like something that may solve your requirement though,
> depending on how much of your read/write load is random. You could
> consider it.
> 
> P.s. HBase too doesn't use the append() APIs today (and doesn't need
> it either). AFAIK, only Flume's making use of it, if you allow it to.
> 
> On Thu, Jul 4, 2013 at 5:17 PM, Mohammad Tariq <do...@gmail.com> wrote:
> > Hello Manickam,
> >
> >         Append is currently not possible.
> >
> > Warm Regards,
> > Tariq
> > cloudfront.blogspot.com
> >
> >
> > On Thu, Jul 4, 2013 at 4:40 PM, Manickam P <ma...@outlook.com> wrote:
> >>
> >> Hi,
> >>
> >> I have moved my input file into the HDFS location in the cluster setup.
> >> Now i got a new set of file which has some new records along with the old
> >> one.
> >> I want to move the delta part alone into HDFS because it will take more
> >> time to move the file from my local to HDFS location.
> >> Is it possible or do i need to move the entire file into HDFS again?
> >>
> >>
> >>
> >> Thanks,
> >> Manickam P
> >
> >
> 
> 
> 
> --
> Harsh J
 		 	   		  

Re: How to update a file which is in HDFS

Posted by Harsh J <ha...@cloudera.com>.
The answer to the "delta" part is more that HDFS does not presently
support random writes. You cannot alter a closed file for anything
other than appending at the end, which I doubt will help you if you
are also receiving updates (it isn't clear from your question what
this added data really is).

HBase sounds like something that may solve your requirement though,
depending on how much of your read/write load is random. You could
consider it.
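
For a rough idea of what that buys you, here is a minimal sketch against the
old 0.94-style HBase client API; the table name, column family and row key
are made up for illustration. Each record becomes its own row, so a "delta"
load is just more puts, and a changed record simply overwrites in place:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class RecordUpsertSketch {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath; cluster details are assumed, not shown.
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "records");          // hypothetical table name

        // One row per record, keyed by the record id, so new and updated
        // records are handled the same way: just put them.
        Put put = new Put(Bytes.toBytes("record-1000001"));
        put.add(Bytes.toBytes("d"), Bytes.toBytes("payload"),
                Bytes.toBytes("the record body goes here"));
        table.put(put);

        table.close();
    }
}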

P.s. HBase too doesn't use the append() APIs today (and doesn't need
it either). AFAIK, only Flume's making use of it, if you allow it to.

On Thu, Jul 4, 2013 at 5:17 PM, Mohammad Tariq <do...@gmail.com> wrote:
> Hello Manickam,
>
>         Append is currently not possible.
>
> Warm Regards,
> Tariq
> cloudfront.blogspot.com
>
>
> On Thu, Jul 4, 2013 at 4:40 PM, Manickam P <ma...@outlook.com> wrote:
>>
>> Hi,
>>
>> I have moved my input file into the HDFS location in the cluster setup.
>> Now i got a new set of file which has some new records along with the old
>> one.
>> I want to move the delta part alone into HDFS because it will take more
>> time to move the file from my local to HDFS location.
>> Is it possible or do i need to move the entire file into HDFS again?
>>
>>
>>
>> Thanks,
>> Manickam P
>
>



--
Harsh J

Re: How to update a file which is in HDFS

Posted by Mohammad Mustaqeem <3m...@gmail.com>.
You can append using WebHDFS. The following link may help you:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Append_to_a_File
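
As a rough sketch (the host, port, path and user below are made up), the
WebHDFS append is a two-step exchange: ask the NameNode for a redirect, then
POST the new bytes to the DataNode it points at:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsAppendSketch {
    public static void main(String[] args) throws Exception {
        // Step 1: the NameNode answers op=APPEND with a 307 redirect to a DataNode.
        URL nameNode = new URL("http://namenode.example.com:50070/webhdfs/v1/data/input.txt"
                + "?op=APPEND&user.name=hadoopuser");
        HttpURLConnection step1 = (HttpURLConnection) nameNode.openConnection();
        step1.setRequestMethod("POST");
        step1.setInstanceFollowRedirects(false);  // read the Location header ourselves
        String dataNodeUrl = step1.getHeaderField("Location");
        step1.disconnect();

        // Step 2: send the delta bytes to the DataNode named in the Location header.
        HttpURLConnection step2 = (HttpURLConnection) new URL(dataNodeUrl).openConnection();
        step2.setRequestMethod("POST");
        step2.setDoOutput(true);
        OutputStream out = step2.getOutputStream();
        out.write("one more record\n".getBytes("UTF-8"));
        out.close();
        System.out.println("append response: " + step2.getResponseCode()); // expect 200
    }
}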


On Thu, Jul 4, 2013 at 5:17 PM, Mohammad Tariq <do...@gmail.com> wrote:

> Hello Manickam,
>
>         Append is currently not possible.
>
> Warm Regards,
> Tariq
> cloudfront.blogspot.com
>
>
> On Thu, Jul 4, 2013 at 4:40 PM, Manickam P <ma...@outlook.com> wrote:
>
>> Hi,
>>
>> I have moved my input file into the HDFS location in the cluster setup.
>> Now i got a new set of file which has some new records along with the old
>> one.
>> I want to move the delta part alone into HDFS because it will take more
>> time to move the file from my local to HDFS location.
>> Is it possible or do i need to move the entire file into HDFS again?
>>
>>
>>
>> Thanks,
>> Manickam P
>>
>
>


-- 
*With regards ---*
*Mohammad Mustaqeem*,
M.Tech (CSE)
MNNIT Allahabad
9026604270

Re: How to update a file which is in HDFS

Posted by Robin East <ro...@xense.co.uk>.
Ok just read the JIRA in detail (pays to read these things before posting). It says:

Append is not supported in Hadoop 1.x. Please upgrade to 2.x if you need append. If you enabled dfs.support.append for HBase, you're OK, as durable sync (why HBase required dfs.support.append) is now enabled by default. If you really need the previous functionality, to turn on the append functionality set the flag "dfs.support.broken.append" to true.

That says to me you can have append working if you set dfs.support.broken.append to true. So append appears to be available in 1.x but it is hardly recommended.
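
If someone really wants that discouraged route, a minimal sketch would look
roughly like this; the flag name comes straight from the JIRA text above,
while the cluster address and file path are made up, and on a build without
working append the append() call may simply throw:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BrokenAppendSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://namenode.example.com:9000"); // illustrative
        // Normally set in hdfs-site.xml; this is the "hardly recommended" 1.x switch.
        conf.setBoolean("dfs.support.broken.append", true);

        FileSystem fs = FileSystem.get(conf);
        FSDataOutputStream out = fs.append(new Path("/data/input.txt")); // illustrative path
        out.write("one more record\n".getBytes("UTF-8"));
        out.close();
    }
}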

Robi


On 5 Jul 2013, at 08:45, Robin East <ro...@xense.co.uk> wrote:

> The API for 1.1.2 FileSystem seems to include append().
> Robin 
> On 5 Jul 2013, at 01:50, Mohammad Tariq <do...@gmail.com> wrote:
> 
>> The current stable release doesn't support append, not even through the API. If you really want this you have to switch to hadoop 2.x.
>> See this JIRA.
>> 
>> Warm Regards,
>> Tariq
>> cloudfront.blogspot.com
>> 
>> 
>> On Fri, Jul 5, 2013 at 3:05 AM, John Lilley <jo...@redpoint.net> wrote:
>> Manickam,
>> 
>>  
>> 
>> HDFS supports append; it is the command-line client that does not. 
>> 
>> You can write a Java application that opens an HDFS-based file for append, and use that instead of the hadoop command line.
>> 
>> However, this doesn’t completely answer your original question: “How do I move only the delta part”?  This can be more complex than simply doing an append.  Have records in the original file changed in addition to new records becoming available?  If that is the case, you will need to completely rewrite the file, as there is no overwriting of existing file sections, even directly using HDFS.  There are clever strategies for working around this, like splitting the file into multiple parts on HDFS so that the overwrite can proceed in parallel on the cluster; however, that may be more work that you are looking for.  Even if the delta is limited to new records, the problem may not be trivial.  How do you know which records are new?  Are all of the new records a the end of the file?  Or can they be anywhere in the file?  If the latter, you will need more complex logic.
>> 
>>  
>> 
>> John
>> 
>>  
>> 
>>  
>> 
>> From: Mohammad Tariq [mailto:dontariq@gmail.com] 
>> Sent: Thursday, July 04, 2013 5:47 AM
>> To: user@hadoop.apache.org
>> Subject: Re: How to update a file which is in HDFS
>> 
>>  
>> 
>> Hello Manickam,
>> 
>>  
>> 
>>         Append is currently not possible.
>> 
>> 
>> 
>> Warm Regards,
>> 
>> Tariq
>> 
>> cloudfront.blogspot.com
>> 
>>  
>> 
>> On Thu, Jul 4, 2013 at 4:40 PM, Manickam P <ma...@outlook.com> wrote:
>> 
>> Hi,
>> 
>>  
>> 
>> I have moved my input file into the HDFS location in the cluster setup. 
>> 
>> Now i got a new set of file which has some new records along with the old one. 
>> 
>> I want to move the delta part alone into HDFS because it will take more time to move the file from my local to HDFS location. 
>> 
>> Is it possible or do i need to move the entire file into HDFS again? 
>> 
>>  
>> 
>>  
>> 
>>  
>> 
>> Thanks,
>> Manickam P
>> 
>>  
>> 
>> 
> 


Re: How to update a file which is in HDFS

Posted by Robin East <ro...@xense.co.uk>.
The API for 1.1.2 FileSystem seems to include append().
Robin 
On 5 Jul 2013, at 01:50, Mohammad Tariq <do...@gmail.com> wrote:

> The current stable release doesn't support append, not even through the API. If you really want this you have to switch to hadoop 2.x.
> See this JIRA.
> 
> Warm Regards,
> Tariq
> cloudfront.blogspot.com
> 
> 
> On Fri, Jul 5, 2013 at 3:05 AM, John Lilley <jo...@redpoint.net> wrote:
> Manickam,
> 
>  
> 
> HDFS supports append; it is the command-line client that does not. 
> 
> You can write a Java application that opens an HDFS-based file for append, and use that instead of the hadoop command line.
> 
> However, this doesn’t completely answer your original question: “How do I move only the delta part”?  This can be more complex than simply doing an append.  Have records in the original file changed in addition to new records becoming available?  If that is the case, you will need to completely rewrite the file, as there is no overwriting of existing file sections, even directly using HDFS.  There are clever strategies for working around this, like splitting the file into multiple parts on HDFS so that the overwrite can proceed in parallel on the cluster; however, that may be more work that you are looking for.  Even if the delta is limited to new records, the problem may not be trivial.  How do you know which records are new?  Are all of the new records a the end of the file?  Or can they be anywhere in the file?  If the latter, you will need more complex logic.
> 
>  
> 
> John
> 
>  
> 
>  
> 
> From: Mohammad Tariq [mailto:dontariq@gmail.com] 
> Sent: Thursday, July 04, 2013 5:47 AM
> To: user@hadoop.apache.org
> Subject: Re: How to update a file which is in HDFS
> 
>  
> 
> Hello Manickam,
> 
>  
> 
>         Append is currently not possible.
> 
> 
> 
> Warm Regards,
> 
> Tariq
> 
> cloudfront.blogspot.com
> 
>  
> 
> On Thu, Jul 4, 2013 at 4:40 PM, Manickam P <ma...@outlook.com> wrote:
> 
> Hi,
> 
>  
> 
> I have moved my input file into the HDFS location in the cluster setup. 
> 
> Now i got a new set of file which has some new records along with the old one. 
> 
> I want to move the delta part alone into HDFS because it will take more time to move the file from my local to HDFS location. 
> 
> Is it possible or do i need to move the entire file into HDFS again? 
> 
>  
> 
>  
> 
>  
> 
> Thanks,
> Manickam P
> 
>  
> 
> 


Re: How to update a file which is in HDFS

Posted by Mohammad Tariq <do...@gmail.com>.
The current stable release doesn't support append, not even through the
API. If you really want this you have to switch to hadoop 2.x.
See this JIRA <https://issues.apache.org/jira/browse/HADOOP-8230>.

Warm Regards,
Tariq
cloudfront.blogspot.com


On Fri, Jul 5, 2013 at 3:05 AM, John Lilley <jo...@redpoint.net> wrote:

> Manickam,
>
> HDFS supports append; it is the command-line client that does not.
> You can write a Java application that opens an HDFS-based file for append,
> and use that instead of the hadoop command line.
> However, this doesn't completely answer your original question: "How do I
> move only the delta part"?  This can be more complex than simply doing an
> append.  Have records in the original file changed in addition to new
> records becoming available?  If that is the case, you will need to
> completely rewrite the file, as there is no overwriting of existing file
> sections, even directly using HDFS.  There are clever strategies for
> working around this, like splitting the file into multiple parts on HDFS so
> that the overwrite can proceed in parallel on the cluster; however, that
> may be more work that you are looking for.  Even if the delta is limited to
> new records, the problem may not be trivial.  How do you know which records
> are new?  Are all of the new records a the end of the file?  Or can they be
> anywhere in the file?  If the latter, you will need more complex logic.
>
> John
>
> From: Mohammad Tariq [mailto:dontariq@gmail.com]
> Sent: Thursday, July 04, 2013 5:47 AM
> To: user@hadoop.apache.org
> Subject: Re: How to update a file which is in HDFS
>
> Hello Manickam,
>
>         Append is currently not possible.
>
> Warm Regards,
> Tariq
> cloudfront.blogspot.com
>
> On Thu, Jul 4, 2013 at 4:40 PM, Manickam P <ma...@outlook.com> wrote:
>
> Hi,
>
> I have moved my input file into the HDFS location in the cluster setup.
> Now i got a new set of file which has some new records along with the old
> one.
> I want to move the delta part alone into HDFS because it will take more
> time to move the file from my local to HDFS location.
> Is it possible or do i need to move the entire file into HDFS again?
>
> Thanks,
> Manickam P

RE: How to update a file which is in HDFS

Posted by John Lilley <jo...@redpoint.net>.
Manickam,

HDFS supports append; it is the command-line client that does not.
You can write a Java application that opens an HDFS-based file for append, and use that instead of the hadoop command line.
However, this doesn't completely answer your original question: "How do I move only the delta part"?  This can be more complex than simply doing an append.  Have records in the original file changed in addition to new records becoming available?  If that is the case, you will need to completely rewrite the file, as there is no overwriting of existing file sections, even directly using HDFS.  There are clever strategies for working around this, like splitting the file into multiple parts on HDFS so that the overwrite can proceed in parallel on the cluster; however, that may be more work than you are looking for.  Even if the delta is limited to new records, the problem may not be trivial.  How do you know which records are new?  Are all of the new records at the end of the file?  Or can they be anywhere in the file?  If the latter, you will need more complex logic.
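
If the new records really are all at the end of the local file, one hedged sketch of the "delta only" upload (paths and buffer size are illustrative) is to treat the length of the copy already in HDFS as the resume point:

import java.io.FileInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class AppendDeltaSketch {
    public static void main(String[] args) throws Exception {
        // Assumes core-site.xml/hdfs-site.xml on the classpath point at the cluster.
        FileSystem fs = FileSystem.get(new Configuration());
        Path hdfsFile = new Path("/data/input.txt");                  // illustrative
        String localFile = "/tmp/input-with-new-records.txt";         // illustrative

        // Everything up to this offset is assumed byte-for-byte identical to the HDFS copy.
        long alreadyStored = fs.getFileStatus(hdfsFile).getLen();

        FileInputStream in = new FileInputStream(localFile);
        if (in.skip(alreadyStored) != alreadyStored) {
            throw new IllegalStateException("local file is shorter than the HDFS copy");
        }

        // Ship only the tail; copyBytes closes both streams when it finishes.
        FSDataOutputStream out = fs.append(hdfsFile);
        IOUtils.copyBytes(in, out, 4096, true);
    }
}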

John


From: Mohammad Tariq [mailto:dontariq@gmail.com]
Sent: Thursday, July 04, 2013 5:47 AM
To: user@hadoop.apache.org
Subject: Re: How to update a file which is in HDFS

Hello Manickam,

        Append is currently not possible.

Warm Regards,
Tariq
cloudfront.blogspot.com<http://cloudfront.blogspot.com>

On Thu, Jul 4, 2013 at 4:40 PM, Manickam P <ma...@outlook.com>> wrote:
Hi,

I have moved my input file into the HDFS location in the cluster setup.
Now i got a new set of file which has some new records along with the old one.
I want to move the delta part alone into HDFS because it will take more time to move the file from my local to HDFS location.
Is it possible or do i need to move the entire file into HDFS again?



Thanks,
Manickam P


Re: How to update a file which is in HDFS

Posted by Mohammad Tariq <do...@gmail.com>.
Hello Manickam,

        Append is currently not possible.

Warm Regards,
Tariq
cloudfront.blogspot.com


On Thu, Jul 4, 2013 at 4:40 PM, Manickam P <ma...@outlook.com> wrote:

> Hi,
>
> I have moved my input file into the HDFS location in the cluster setup.
> Now i got a new set of file which has some new records along with the old
> one.
> I want to move the delta part alone into HDFS because it will take more
> time to move the file from my local to HDFS location.
> Is it possible or do i need to move the entire file into HDFS again?
>
>
>
> Thanks,
> Manickam P
>
