
Posted to user@sqoop.apache.org by Abraham Fine <ab...@abrahamfine.com> on 2016/03/01 19:42:33 UTC

Re: Sqoop2 scheduler pool support

At this time, Sqoop2 does not provide a mechanism to configure a job's scheduler pool, nor a way to pass arbitrary configuration through to the MapReduce job.

I am not sure that configuring a scheduler pool is something we would want to prompt for specifically in the shell, but I can definitely see the use case for passing through job-specific MapReduce configuration.

Please feel free to open a JIRA for this feature request.

Thanks,
Abe


> On Feb 29, 2016, at 10:33 AM, Scott Kuehn <sc...@opower.com> wrote:
> 
> Does sqoop2 provide a mechanism to configure jobs to run in ad-hoc scheduler pools? By ad-hoc, I mean a scheduler pool that is not necessarily the same as the pool configured in the sqoop2 server's mapred-site.xml.
> 
> The use case is to limit cluster-wide Sqoop access to a particular FROM resource. While the throttling extractor mechanics are useful for preventing a single job from saturating the resource, they cannot limit aggregate resource access across jobs. I'd like to allocate a YARN scheduler pool that caps the vcores and RAM available to jobs accessing a particularly sensitive database. A subset of Sqoop2 jobs would be configured to run in this pool, whereas other Sqoop2 jobs would fall back to the default pool configured for the Sqoop2 server.
> 
> A glance at the code and some recent configuration work <https://cwiki.apache.org/confluence/display/SQOOP/Sqoop+Config+as+Top+Level+Entity> suggests this functionality isn't available today. I'm interested to hear if this is the case, and whether or not any reasonable workarounds exist. I'm using apache sqoop 1.99.6-RC2.
>
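
For readers unfamiliar with the setup being asked about, the capped pool described in the question could look roughly like the following YARN Fair Scheduler allocation file. This is only a sketch under assumptions: it assumes the cluster uses the Fair Scheduler, and the queue name and the memory and vcore limits are hypothetical values chosen for illustration, not taken from the thread.

```xml
<?xml version="1.0"?>
<!-- Hypothetical Fair Scheduler allocation file; queue name and limits are illustrative only -->
<allocations>
  <queue name="sqoop_restricted">
    <!-- Upper bound on the aggregate memory and vcores used by all jobs running in this queue -->
    <maxResources>16384 mb,8 vcores</maxResources>
  </queue>
</allocations>
```

A MapReduce job normally selects its queue through the mapreduce.job.queuename property. As the reply above explains, Sqoop2 1.99.6 has no per-job way to set that property, so a queue like this could only be applied through the server-wide configuration.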

Re: Sqoop2 scheduler pool support

Posted by Scott Kuehn <sc...@opower.com>.

Thanks Abe. SQOOP-2861 <https://issues.apache.org/jira/browse/SQOOP-2861> has
been created for this feature request.
