Posted to dev@reef.apache.org by Reed Umbrasas <ry...@microsoft.com.INVALID> on 2018/01/25 19:29:11 UTC

Azure Batch Application Packages

Hi,

As we are developing the Azure Batch runtime for REEF, I was looking into the best mechanism for submitting the shaded JAR to Azure Batch. There are two ways:


  1.  Use Azure Batch application packages (https://docs.microsoft.com/en-us/azure/batch/batch-application-packages), which are the recommended way to submit application files to Azure Batch nodes.
  2.  Store the JAR in Blob storage and give the SAS URI to each task as its Resource File (https://docs.microsoft.com/en-us/azure/batch/batch-api-basics#task).

I am listing some pros and cons below. Essentially, it's a trade-off between configuration simplicity and performance. Please let us know your thoughts.

Application Packages:
Pros:

  1.  Their intent exactly matches our use case.
  2.  Each node will download a given application only once during application runtime; if each node runs hundreds or thousands of evaluators, that translates to time and bandwidth savings.
Cons:

  1.  Somewhat more complex configuration. To create an application package in Azure Batch, we'll need to call the Batch management APIs, which require service principal authentication. (The data plane APIs require only the Batch key.)
  2.  Batch imposes a limit of 20 applications per account and 40 versions per application, so we would need to do cleanup work before the Driver completes.
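
The cleanup bookkeeping in item 2 could look roughly like the sketch below. It assumes the 40-version limit quoted above and does not call any Azure Batch API; the class and method names are hypothetical. It only decides which existing versions would have to be deleted to make room for one new upload.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: picks package versions to delete before uploading a new one. */
class PackageVersionCleanup {
    // Limit as quoted in this message; an assumption here, not read from the service.
    static final int MAX_VERSIONS_PER_APPLICATION = 40;

    /**
     * @param versionsOldestFirst existing version ids, ordered oldest first
     * @return the oldest versions to delete so that one more version fits under the limit
     */
    static List<String> versionsToDelete(List<String> versionsOldestFirst) {
        int excess = versionsOldestFirst.size() + 1 - MAX_VERSIONS_PER_APPLICATION;
        if (excess <= 0) {
            return new ArrayList<>(); // still room for the new version
        }
        return new ArrayList<>(versionsOldestFirst.subList(0, excess));
    }

    public static void main(String[] args) {
        List<String> versions = new ArrayList<>();
        for (int i = 1; i <= 40; i++) {
            versions.add("v" + i);
        }
        // The application is already at the limit, so the oldest version must go.
        System.out.println(versionsToDelete(versions));
    }
}
```

Deleting oldest-first is just one possible policy; the Driver could equally delete only the versions it created itself.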

Storage SAS URI:
Pros:

  1.  Simpler configuration - all we need is a storage account name and key.
  2.  No cleanup work necessary.
Cons:

  1.  Batch will download the JAR file every time a task runs, which will negatively impact performance.
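
To put rough numbers on the performance side of the trade-off, here is a back-of-the-envelope sketch. The JAR size, node count, and evaluators per node are made-up illustrative values, not measurements, and the class name is hypothetical. It compares one download per node (the application package behaviour described above) with one download per task (the resource file behaviour described above).

```java
/** Hypothetical estimate of JAR download volume under the two approaches. */
class JarTransferEstimate {
    /** Bytes downloaded when each node fetches the JAR once. */
    static long oncePerNodeBytes(long jarBytes, int nodes) {
        return jarBytes * nodes;
    }

    /** Bytes downloaded when every task fetches the JAR again. */
    static long perTaskBytes(long jarBytes, int nodes, int tasksPerNode) {
        return jarBytes * nodes * tasksPerNode;
    }

    public static void main(String[] args) {
        long jarBytes = 50L * 1024 * 1024; // assumed 50 MiB shaded JAR
        int nodes = 10;                    // assumed pool size
        int tasksPerNode = 100;            // assumed evaluators per node
        System.out.println("once per node: " + oncePerNodeBytes(jarBytes, nodes) + " bytes");
        System.out.println("per task:      " + perTaskBytes(jarBytes, nodes, tasksPerNode) + " bytes");
    }
}
```

Under these assumptions the per-task approach transfers as many times more data as there are tasks per node, so the gap only matters once nodes run many evaluators.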

Thanks,
Reed

Re: Azure Batch Application Packages

Posted by Markus Weimer <ma...@weimo.de>.
We currently put per-evaluator information into the JAR files. Hence, I
suggest you go with the Storage approach for now. We can revisit if / when
that turns out to be a bottleneck.

Markus


Re: Azure Batch Application Packages

Posted by Byung-Gon Chun <bg...@gmail.com>.
Hi Reed,

That's awesome to hear.
Looking forward to hearing more.

Cheers,
Gon





-- 
Byung-Gon Chun