Posted to mapreduce-user@hadoop.apache.org by Adamantios Corais <ad...@gmail.com> on 2013/05/31 17:23:30 UTC

File Reloading

I am new to Hadoop, so I apologize beforehand for my very fundamental question.

Let's assume that I have a file stored in Hadoop that gets updated once a
day. Also assume that there is a task running in the background on Hadoop
that never stops. How could I reload this file so that Hadoop starts
considering the updated values instead of the old ones?

Re: File Reloading

Posted by Raj K Singh <ra...@gmail.com>.
Hadoop assumes that you have put the updated file into the input folder.
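
Raj's suggestion can be sketched as follows. This is a minimal, hypothetical example that uses local paths in place of HDFS ones; on a real cluster the overwrite would instead be done with something like `hdfs dfs -put -f updated.txt /input/data.txt` (the path is made up for illustration):

```python
import shutil
from pathlib import Path

# Hypothetical local directory standing in for the HDFS input folder.
input_dir = Path("input")
input_dir.mkdir(exist_ok=True)

# Yesterday's copy sits in the input folder; a fresh file arrives daily.
(input_dir / "data.txt").write_text("old values\n")
Path("updated.txt").write_text("new values\n")

# Overwrite the old copy with the updated one, as `-put -f` would in HDFS.
shutil.copyfile("updated.txt", input_dir / "data.txt")
print((input_dir / "data.txt").read_text().strip())  # new values
```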

::::::::::::::::::::::::::::::::::::::::
Raj K Singh
http://www.rajkrrsingh.blogspot.com
Mobile  Tel: +91 (0)9899821370


On Fri, May 31, 2013 at 8:53 PM, Adamantios Corais <
adamantios.corais@gmail.com> wrote:

> I am new to Hadoop, so I apologize beforehand for my very fundamental
> question.
>
> Let's assume that I have a file stored in Hadoop that gets updated once a
> day. Also assume that there is a task running in the background on Hadoop
> that never stops. How could I reload this file so that Hadoop starts
> considering the updated values instead of the old ones?
>

Re: File Reloading

Posted by Shahab Yunus <sh...@gmail.com>.
I do not see Raj's response, but first: yes, you can overwrite data (a file)
as many times as you want at the same location in HDFS/Hadoop. Secondly, you
say that the file is small and you indeed want to read it as a whole. So, as
I said, the issue then becomes making sure that the reader task gets the
latest version, which is a generic problem rather than one specific to
Hadoop or HDFS. Basically, you would adopt the same approach you would use
to solve this on any file system. As far as I understand, there is nothing
special that you need to do for Hadoop/HDFS.
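
One generic way to make sure a reader never sees a half-written update, on any file system, is to write the new version to a temporary file in the same directory and then rename it over the old one. This is a sketch under the assumption of a POSIX-like file system, with hypothetical paths; on POSIX the rename is atomic, so a reader always sees either the old file or the new one, never a mix:

```python
import os
import tempfile
from pathlib import Path

target = Path("shared") / "lookup.txt"   # hypothetical shared location
target.parent.mkdir(exist_ok=True)
target.write_text("old values\n")

def publish(new_contents: str, dest: Path) -> None:
    # Write to a temporary file in the destination's directory, then
    # atomically rename it over the destination.
    fd, tmp = tempfile.mkstemp(dir=dest.parent)
    with os.fdopen(fd, "w") as f:
        f.write(new_contents)
    os.replace(tmp, dest)

publish("new values\n", target)
print(target.read_text().strip())  # new values
```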

Regards,
Shahab


On Fri, May 31, 2013 at 11:51 AM, Adamantios Corais <
adamantios.corais@gmail.com> wrote:

> @Raj: so, updating the data and storing it in the same destination
> would work?
>
> @Shahab: the file is very small, and therefore I am expecting to read it
> at once. What would you suggest?
>
>
> On Fri, May 31, 2013 at 5:30 PM, Shahab Yunus <sh...@gmail.com> wrote:
>
>> I might not have understood your use case properly, so I apologize for
>> that.
>>
>> But I think what you need here is something outside of Hadoop/HDFS. I am
>> presuming that you need to read the whole updated file when you process
>> it with your never-ending job, right? You don't expect to read it
>> piecemeal or in chunks. If that is indeed the case, then your
>> never-ending job can use generic techniques to check whether the file's
>> signature or any other property has changed since the last time, and only
>> process it if it has changed. Your file writing/updating process can
>> update the file independently of the reading/processing one.
>>
>> Regards,
>> Shahab
>>
>>
>> On Fri, May 31, 2013 at 11:23 AM, Adamantios Corais <
>> adamantios.corais@gmail.com> wrote:
>>
>>> I am new to Hadoop, so I apologize beforehand for my very fundamental
>>> question.
>>>
>>> Let's assume that I have a file stored in Hadoop that gets updated once
>>> a day. Also assume that there is a task running in the background on
>>> Hadoop that never stops. How could I reload this file so that Hadoop
>>> starts considering the updated values instead of the old ones?
>>>
>>
>>
>

Re: File Reloading

Posted by Adamantios Corais <ad...@gmail.com>.
@Raj: so, updating the data and storing it in the same destination
would work?

@Shahab: the file is very small, and therefore I am expecting to read it
at once. What would you suggest?


On Fri, May 31, 2013 at 5:30 PM, Shahab Yunus <sh...@gmail.com> wrote:

> I might not have understood your use case properly, so I apologize for
> that.
>
> But I think what you need here is something outside of Hadoop/HDFS. I am
> presuming that you need to read the whole updated file when you process
> it with your never-ending job, right? You don't expect to read it
> piecemeal or in chunks. If that is indeed the case, then your
> never-ending job can use generic techniques to check whether the file's
> signature or any other property has changed since the last time, and only
> process it if it has changed. Your file writing/updating process can
> update the file independently of the reading/processing one.
>
> Regards,
> Shahab
>
>
> On Fri, May 31, 2013 at 11:23 AM, Adamantios Corais <
> adamantios.corais@gmail.com> wrote:
>
>> I am new to Hadoop, so I apologize beforehand for my very fundamental
>> question.
>>
>> Let's assume that I have a file stored in Hadoop that gets updated once
>> a day. Also assume that there is a task running in the background on
>> Hadoop that never stops. How could I reload this file so that Hadoop
>> starts considering the updated values instead of the old ones?
>>
>
>

Re: File Reloading

Posted by Shahab Yunus <sh...@gmail.com>.
I might not have understood your use case properly, so I apologize for that.

But I think what you need here is something outside of Hadoop/HDFS. I am
presuming that you need to read the whole updated file when you process it
with your never-ending job, right? You don't expect to read it piecemeal or
in chunks. If that is indeed the case, then your never-ending job can use
generic techniques to check whether the file's signature or any other
property has changed since the last time, and only process it if it has
changed. Your file writing/updating process can update the file
independently of the reading/processing one.
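
The signature check described above can be sketched generically, since it is the same on any file system. This is a minimal example with a hypothetical file name, combining the modification time and a content digest as the "signature"; the never-ending job would call something like `poll_once` on each iteration:

```python
import hashlib
from pathlib import Path

data = Path("small_input.txt")        # hypothetical daily-updated file
data.write_text("old values\n")

def signature(path: Path) -> tuple:
    # Combine mtime and a content digest; either alone would usually do.
    st = path.stat()
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return (st.st_mtime_ns, digest)

last_seen = None
processed = []

def poll_once():
    global last_seen
    sig = signature(data)
    if sig != last_seen:              # only reprocess when the file changed
        processed.append(data.read_text())
        last_seen = sig

poll_once()                           # first pass: processes the file
poll_once()                           # unchanged: skipped
data.write_text("new values\n")       # the daily update arrives
poll_once()                           # changed: processed again
print(len(processed))  # 2
```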

Regards,
Shahab


On Fri, May 31, 2013 at 11:23 AM, Adamantios Corais <
adamantios.corais@gmail.com> wrote:

> I am new to Hadoop, so I apologize beforehand for my very fundamental
> question.
>
> Let's assume that I have a file stored in Hadoop that gets updated once
> a day. Also assume that there is a task running in the background on
> Hadoop that never stops. How could I reload this file so that Hadoop
> starts considering the updated values instead of the old ones?
>
