Posted to user@sqoop.apache.org by Jarek Jarcec Cecho <ja...@apache.org> on 2014/02/18 20:15:00 UTC

Re: Edit/change saved job without losing incremental history?

Hi Devin,
Indeed, you are correct. The original Sqoop 1 metastore does not support updating stored jobs, and the only way to change one is to delete the old variant and re-create it. I would like to mention that we've significantly improved this concept in Sqoop 2, where the metastore (now called the repository) is a first-class citizen that offers update capability, among other things.

You do not have to be concerned about re-creating incremental jobs in Sqoop 1 though. You can use the --last-value argument to tell Sqoop not to start from scratch when you re-create the saved job. Sqoop is smart enough to realize that this is just the initial state, which will change on every incremental import.
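
A rough sketch of that delete-and-re-create workflow, written in Python only to make the steps explicit. The job name, JDBC URL, table, and check column are made-up placeholders, and the assumption that "sqoop job --show" prints the stored state as an "incremental.last.value = ..." line should be checked against your own Sqoop 1 version. The script only builds the command lines; it does not run Sqoop.

```python
import re
import shlex


def parse_last_value(show_output):
    """Pull the saved last value out of `sqoop job --show <job>` text.

    Assumes the output contains a line such as
    `incremental.last.value = 4217`; returns None if no such line exists.
    """
    match = re.search(
        r"^[ \t]*incremental\.last\.value[ \t]*=[ \t]*(.*?)[ \t]*$",
        show_output,
        re.MULTILINE,
    )
    return match.group(1) if match else None


def recreate_commands(job, connect, table, check_column, last_value):
    """Build the delete and create command lines for a saved append job.

    --last-value seeds the new job with the old job's state, so the first
    run continues from there instead of importing everything again.
    """
    delete_cmd = ["sqoop", "job", "--delete", job]
    create_cmd = [
        "sqoop", "job", "--create", job, "--", "import",
        "--connect", connect,
        "--table", table,
        "--incremental", "append",
        "--check-column", check_column,
        "--last-value", str(last_value),
    ]
    return delete_cmd, create_cmd


if __name__ == "__main__":
    # Hypothetical output captured from `sqoop job --show orders_incr`
    # BEFORE the old job is deleted.
    saved = "incremental.col = id\nincremental.last.value = 4217\n"
    last = parse_last_value(saved)
    # Placeholder connection string standing in for the changed one.
    for cmd in recreate_commands(
        "orders_incr", "jdbc:mysql://new-db-host/sales", "orders", "id", last
    ):
        print(shlex.join(cmd))
```

The important ordering is to read the last value from the old job first, then delete it, then create the replacement with the new connection string and the saved value.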

Jarcec

On Thu, Jan 30, 2014 at 09:52:06PM -0500, Devin Suiter RDX wrote:
> Hello,
> 
> I am exploring Sqoop, and wondering about a hypothetical problem. What
> happens if someone needs to change a saved incremental append job in Sqoop
> 1? Say for example the connection string needed to be changed because of a
> network change or something. Shouldn't do that to your cluster, I know,
> but, things happen.
> 
> From what I have seen, there isn't any way to edit an in-place job...only
> to delete it and create a new one. If you have to do that, I'm guessing the
> metastore will drop the history of the incremental append last value. If
> you build a new incremental append job, that will bring everything over
> from the beginning, which may not be good to do. If you run an incremental
> import specifying the "from last value" parameter, that won't be good for
> a saved job, will it? Wouldn't it just begin from that value every time?
> 
> Is there a way to edit saved job parameters in the metastore directly so
> you can update the connection string or inject the "last value" value in
> the new job history after you create it but before running it the first
> time? Or is something saved by job name?
> 
> Just wondering if there is a solution here that isn't really apparent in
> the documentation...
> 
> Thanks,
> *Devin Suiter*
> Jr. Data Solutions Software Engineer
> 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
> Google Voice: 412-256-8556 | www.rdx.com

Re: Edit/change saved job without losing incremental history?

Posted by Jarek Jarcec Cecho <ja...@apache.org>.
> Oh, great! So, just get the last value from the old job before it's wiped,
> set it up as incremental append and also the last-value arg, and it will
> sort it out? That's good to know. 

Yup, exactly.

Jarcec

On Tue, Feb 18, 2014 at 02:24:22PM -0500, Devin Suiter RDX wrote:
> Oh, great! So, just get the last value from the old job before it's wiped,
> set it up as incremental append and also the last-value arg, and it will
> sort it out? That's good to know. I thought that, if you had to start over,
> running sqoop merge to flatten the two sets would be one way to end up with
> a single import directory holding everything from both imports.
> This way is cleaner and easier. I know Sqoop 2 is better for this sort of
> thing, not sure why we ended up using Sqoop 1.
> 
> Thanks for the reply!
> 
> *Devin Suiter*
> Jr. Data Solutions Software Engineer
> 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
> Google Voice: 412-256-8556 | www.rdx.com

Re: Edit/change saved job without losing incremental history?

Posted by Devin Suiter RDX <ds...@rdx.com>.
Oh, great! So, just get the last value from the old job before it's wiped,
set it up as incremental append and also the last-value arg, and it will
sort it out? That's good to know. I thought that, if you had to start over,
running sqoop merge to flatten the two sets would be one way to end up with
a single import directory holding everything from both imports.
This way is cleaner and easier. I know Sqoop 2 is better for this sort of
thing, not sure why we ended up using Sqoop 1.

Thanks for the reply!

*Devin Suiter*
Jr. Data Solutions Software Engineer
100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com

