Posted to user@pig.apache.org by "Michael G." <vi...@gmail.com> on 2012/06/01 20:54:42 UTC

file manipulation

Hi all
I'm new to Pig and Hadoop.
Can you tell me how I can:
1. append to an existing file on HDFS with Pig
2. update a file with Pig, if that is possible.

10x.

-- 
-- Michael G. --

Re: file manipulation

Posted by "Michael G." <vi...@gmail.com>.
Thanks for all your answers.
Michael G.


-- 
-- Michael G. --

Re: file manipulation

Posted by Jagat <ja...@gmail.com>.
Hi

Alan has already given you the background on append in Hadoop.

Another suggestion: to merge two files, you can also look at Pig's UNION
operator.

http://pig.apache.org/docs/r0.10.0/basic.html#union

The UNION operator merges the contents of two or more relations.
A simple workflow would be:

load A
load B
store the union of A and B

Have a look at how Pig UNION works.
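
The load/load/store workflow above could be sketched in Pig Latin roughly as follows. The paths, the comma delimiter, and the two-field schema are assumptions made up for illustration; they are not from the original question.

```pig
-- Hypothetical inputs: the existing data and the new records to "append".
A = LOAD 'data/existing' USING PigStorage(',') AS (id:int, name:chararray);
B = LOAD 'data/new'      USING PigStorage(',') AS (id:int, name:chararray);

-- UNION concatenates the two relations. It does not remove duplicates
-- and does not preserve the order of tuples.
C = UNION A, B;

-- STORE writes to a new output directory; the job fails if it already exists.
STORE C INTO 'data/merged' USING PigStorage(',');
```

Giving both relations the same schema keeps the merged relation's schema well defined. If duplicates matter, a DISTINCT on C before the STORE is one option.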

Re: file manipulation

Posted by Alan Gates <ga...@hortonworks.com>.
MapReduce (and hence Pig) does not support file append. This is because, in MapReduce, tasks may be run multiple times, either after a failure or due to speculative execution. This would result in duplicate appends. Also, if the job fails, it would not be able to remove the appended data.

As far as updating your data, what kind of updates do you want to do?  Stores like HBase (which can be accessed from Pig) support updates.  But whether this is a good fit depends on your use case.

Alan.
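
For reference, writing to HBase from Pig might look roughly like the sketch below, using the HBaseStorage function that ships with Pig. The table name 'users', the column family 'info', the input path, and the schema are all hypothetical, and the table and column family are assumed to exist in HBase already.

```pig
-- Hypothetical tab-separated input; the first field becomes the HBase row key.
raw = LOAD 'data/users' USING PigStorage('\t')
      AS (id:chararray, name:chararray, status:chararray);

-- The remaining fields map, in order, to the listed family:qualifier columns.
-- Storing a row key that already exists writes new values for those cells,
-- which is what makes this behave like an update.
STORE raw INTO 'hbase://users'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:name info:status');
```

Whether this fits depends on the access pattern, as noted above: it suits keyed updates to individual rows rather than bulk rewrites of flat files.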



Re: file manipulation

Posted by Jonathan Coveney <jc...@gmail.com>.
Appending is an HDFS issue. I haven't followed it closely, but I know it
was only added relatively recently (if at all) and I personally haven't
used it. Generally, you can't append to files.

"Updating a file" in our workflows generally involves writing a new file,
then deleting the old file and moving the new one into its place.
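
A minimal sketch of that write-then-swap pattern in Pig, assuming made-up paths, a made-up schema, and a made-up update rule (marking 'new' records as 'active'):

```pig
-- Hypothetical current data set.
current = LOAD 'data/users' USING PigStorage('\t')
          AS (id:int, name:chararray, status:chararray);

-- The "update": rewrite every record, changing only the ones that match.
updated = FOREACH current GENERATE
          id, name, (status == 'new' ? 'active' : status) AS status;

-- Write the result to a fresh location; the original is untouched so far.
STORE updated INTO 'data/users_tmp' USING PigStorage('\t');

-- Swap step. Run it only after confirming the STORE succeeded, for example
-- from the Grunt shell or a separate script:
--   rmf data/users
--   mv data/users_tmp data/users
```

The swap is two separate filesystem operations, not one atomic step, so readers of data/users can briefly see the path missing. Keeping the old directory under a backup name until the move completes is a safer variant.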
