Posted to user@spark.apache.org by Wei Chen <we...@apache.org> on 2019/07/10 05:47:05 UTC

Set TimeOut and continue with other tasks

Hello All,

I am using Spark to process a set of files in parallel.
While most files are processed within 3 seconds, we occasionally get stuck
on 1 or 2 files that never finish (or take more than 48 hours).
Since the conversion is done by a 3rd-party tool, we are not able to debug
why the converter hangs on those files.

Is it possible to set a timeout for the processing of each file, throw an
exception for the tasks that exceed it, and still keep the results of the
tasks that succeed?

Best Regards
Wei

Re: Set TimeOut and continue with other tasks

Posted by Wei Chen <we...@apache.org>.
I am currently trying to use Future/Await to set a timeout inside the map
operation.
However, the tasks now fail instead of hanging, even though I wrap the call
in a Try and match on the result.
Does anyone have an idea why?

The code looks roughly like this:

```Scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

files.map { file =>
  Try {
    // The actual 3rd-party conversion on HDFS happens here.
    def tmpFunc(): Boolean = { /* FILE CONVERSION ON HDFS */ true }
    // Run the conversion asynchronously and wait at most 10 minutes for it.
    val tmpFuture = Future[Boolean] { tmpFunc() }
    Await.result(tmpFuture, 600.seconds)
  } match {
    case Failure(e) => "F" // timed out or threw an exception
    case Success(r) => "S" // converted within the timeout
  }
}
```

The converter is created lazily inside a broadcast object,
which shouldn't be a problem.
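For reference, here is a minimal sketch of that setup; ThirdPartyConverter,
ConverterWrapper, and their methods are hypothetical placeholders for the
actual tool, and a SparkContext is assumed:

```Scala
import org.apache.spark.SparkContext

// Hypothetical stand-in for the 3rd-party conversion tool (its real API is not shown here).
class ThirdPartyConverter {
  def run(path: String): Boolean = true // placeholder body
}

// Serializable wrapper: the @transient lazy val means the converter is only
// instantiated on first use on each executor, not serialized from the driver.
class ConverterWrapper extends Serializable {
  @transient lazy val converter = new ThirdPartyConverter()
  def convert(path: String): Boolean = converter.run(path)
}

def convertAll(sc: SparkContext, files: Seq[String]): Array[String] = {
  val converterBc = sc.broadcast(new ConverterWrapper())
  sc.parallelize(files).map { file =>
    if (converterBc.value.convert(file)) "S" else "F"
  }.collect()
}
```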

Best Regards
Wei


Re: Set TimeOut and continue with other tasks

Posted by Gourav Sengupta <go...@gmail.com>.
Is there a way you can identify a pattern in those files, or in their names,
and then just tackle them in separate jobs? I use the function
input_file_name() to find the name of the input file for each record and then
filter out certain files.
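For example, a minimal sketch (assuming the records are read as text; the
paths and the badFiles list are made up for illustration):

```Scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, input_file_name, not}

val spark = SparkSession.builder().appName("filter-bad-files").getOrCreate()

// Hypothetical list of files known to hang the converter.
val badFiles = Seq("hdfs:///data/in/file_017.txt", "hdfs:///data/in/file_042.txt")

// Tag every record with the name of the file it came from.
val df = spark.read.text("hdfs:///data/in/")
  .withColumn("source_file", input_file_name())

// Process only records from well-behaved files in this job;
// the problematic files can be handled in a separate job.
val good = df.filter(not(col("source_file").isin(badFiles: _*)))
```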

Regards,
Gourav
