Posted to dev@reef.apache.org by "Dwaipayan Mukhopadhyay (JIRA)" <ji...@apache.org> on 2018/05/23 22:37:00 UTC

[jira] [Assigned] (REEF-2017) Org.Apache.REEF.IO.FileSystem.AzureBlob produces Error 503 (server unavailable) when reading data from Azure Blob into >=80 evaluators

     [ https://issues.apache.org/jira/browse/REEF-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dwaipayan Mukhopadhyay reassigned REEF-2017:
--------------------------------------------

    Assignee: Dwaipayan Mukhopadhyay

> Org.Apache.REEF.IO.FileSystem.AzureBlob produces Error 503 (server unavailable) when reading data from Azure Blob into >=80 evaluators
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: REEF-2017
>                 URL: https://issues.apache.org/jira/browse/REEF-2017
>             Project: REEF
>          Issue Type: Bug
>          Components: REEF.NET IO
>    Affects Versions: 0.17
>            Reporter: Najeeb Kazmi
>            Assignee: Dwaipayan Mukhopadhyay
>            Priority: Blocker
>             Fix For: 0.17
>
>
> Running into an issue where Azure Storage throws a Microsoft.WindowsAzure.Storage.StorageException (Error 503: Server Unavailable) when I run a job that downloads data partitions from Azure Storage to 80 or more evaluators. This does not happen with 64 evaluators. Full stack trace below.
>
> Org.Apache.REEF.IMRU.OnREEF.Driver.IMRUDriver`4[[Microsoft.MachineLearning.Distributed.Core.Trainers.KMeans.InputOutput.KMeansInputOutput, Microsoft.MachineLearning.Distributed.Core, Version=0.3.0.0, Culture=neutral, PublicKeyToken=null],[Microsoft.MachineLearning.Distributed.Core.Trainers.KMeans.InputOutput.KMeansInputOutput, Microsoft.MachineLearning.Distributed.Core, Version=0.3.0.0, Culture=neutral, PublicKeyToken=null],[Microsoft.MachineLearning.Runtime.IPredictor, Microsoft.MachineLearning.Core, Version=3.9.290.3615, Culture=neutral, PublicKeyToken=d353f9ba84f0e281],[Microsoft.MachineLearning.Distributed.Core.Common.IPipeline, Microsoft.MachineLearning.Distributed.Core, Version=0.3.0.0, Culture=neutral, PublicKeyToken=null]] Warning: 0 : 2018-05-11T00:59:28.4674513+00:00 0031 : WARNING: Received IFailedEvaluator bf0bcb92-5773-448d-bffa-6c478b619beb from endpoint unknown_endpoint with systemState WaitingForEvaluator in retry# 0 with Exception: Org.Apache.REEF.Driver.Evaluator.EvaluatorException: One or more errors occurred. ---> System.AggregateException: One or more errors occurred. ---> Microsoft.WindowsAzure.Storage.StorageException: The remote server returned an error: (503) Server Unavailable. ---> System.Net.WebException: The remote server returned an error: (503) Server Unavailable.
>  at Microsoft.WindowsAzure.Storage.Shared.Protocol.HttpResponseParsers.ProcessExpectedStatusCodeNoException[T](HttpStatusCode expectedStatusCode, HttpStatusCode actualStatusCode, T retVal, StorageCommandBase`1 cmd, Exception ex)
>  at Microsoft.WindowsAzure.Storage.Blob.CloudBlob.<>c__DisplayClass1e.<GetBlobImpl>b__1b(RESTCommand`1 cmd, HttpWebResponse resp, Exception ex, OperationContext ctx)
>  at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.EndGetResponse[T](IAsyncResult getResponseResult)
>  --- End of inner exception stack trace ---
>  at Microsoft.WindowsAzure.Storage.Core.Util.StorageAsyncResult`1.End()
>  at Microsoft.WindowsAzure.Storage.Core.Util.AsyncExtensions.<>c__DisplayClass4.<CreateCallbackVoid>b__3(IAsyncResult ar)
>  --- End of inner exception stack trace ---
>  at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
>  at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
>  at Org.Apache.REEF.IO.FileSystem.AzureBlob.AzureCloudBlockBlob.DownloadToFile(String path, FileMode mode)
>  at Org.Apache.REEF.IO.PartitionedData.FileSystem.FileSystemInputPartition`1.Download()
>  at Org.Apache.REEF.IO.PartitionedData.FileSystem.FileSystemInputPartition`1.Cache()
>  at Org.Apache.REEF.IMRU.OnREEF.Driver.DataLoadingContext`1.OnNext(IContextStart value)
>  at Org.Apache.REEF.Common.Runtime.Evaluator.Context.ContextLifeCycle.Start()
>  at Org.Apache.REEF.Common.Runtime.Evaluator.Context.ContextRuntime..ctor(IInjector serviceInjector, IConfiguration contextConfiguration, Optional`1 parentContext)
>  --- End of inner exception stack trace ---.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)