Posted to user@flink.apache.org by Richard Moorhead <ri...@gmail.com> on 2020/02/18 00:15:48 UTC

job history server

I see the following exception often:

2020-02-17 18:13:26,796 ERROR org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher  - Failure while fetching/processing job archive for job eaf0639027aca1624adaa100bdf1332e.
java.nio.file.FileSystemException: /dev/shm/flink-history-server/jobs/eaf0639027aca1624adaa100bdf1332e/vertices/6abf3ed37d1a5e48f2786b832033f074/subtasks/86/attempts: No space left on device
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
        at java.nio.file.Files.createDirectory(Files.java:674)
        at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
        at java.nio.file.Files.createDirectories(Files.java:767)
        at org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:186)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)


Unfortunately, the partition listed does not appear to be full, or anywhere
near full.

Is there a workaround for this?

Re: job history server

Posted by Richard Moorhead <ri...@gmail.com>.
2020-02-18 09:44:45,227 ERROR org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher  - Failure while fetching/processing job archive for job eaf0639027aca1624adaa100bdf1332e.
java.nio.file.FileSystemException: /dev/shm/flink-history-server/jobs/eaf0639027aca1624adaa100bdf1332e/vertices/062e4d80ed1d4bdafd24e462245c5926/subtasks/86/attempts/0.json: No space left on device

And there it is:

42103b5b-5410-d2d8-6a0b-21757e4a0fbc ~
0 % df -iH
Filesystem           Inodes IUsed IFree IUse% Mounted on
/dev/mapper/vg00-rootlv00
                       132k   13k  119k   10% /
tmpfs                  508k  465k   43k   92% /dev/shm

Thanks for the tip.
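The `df -iH` output bears out the inode explanation: /dev/shm is at 92% inode use, while the `df -H` output posted in this thread showed only 37% of its bytes in use. The logged paths hint at why: the archive fetcher expands each job archive into a deep tree of directories and small JSON files, branching per vertex, subtask, and attempt, and every entry takes an inode however small it is. Below is a rough Java sketch of a per-job lower bound, assuming only the layout visible in the logged path (the real archive holds more files than this); `InodeEstimate`, `lowerBound`, and the example inputs are hypothetical.

```java
// Rough, hypothetical lower bound on the inodes one expanded job archive needs,
// assuming only the layout visible in the logged path:
//   jobs/<jobId>/vertices/<vertexId>/subtasks/<index>/attempts/<attempt>.json
// The real archive contains further files, so actual usage is higher.
class InodeEstimate {

    /**
     * @param vertices    number of job vertices (example input)
     * @param parallelism subtasks per vertex
     * @param attempts    archived execution attempts per subtask
     */
    static long lowerBound(long vertices, long parallelism, long attempts) {
        // Per subtask: the <index> directory, its attempts/ directory,
        // and one <attempt>.json file per attempt.
        long perSubtask = 2 + attempts;
        // Per vertex: the <vertexId> directory and its subtasks/ directory.
        long perVertex = 2 + parallelism * perSubtask;
        // Per job: jobs/<jobId> and jobs/<jobId>/vertices.
        return 2 + vertices * perVertex;
    }

    public static void main(String[] args) {
        // Example inputs only: 5 vertices at parallelism 100, one attempt each.
        long perJob = lowerBound(5, 100, 1);
        long freeInodes = 43_000; // "IFree" for /dev/shm in the df -iH output
        System.out.println("inodes per job (lower bound): " + perJob);
        System.out.println("jobs that still fit: " + freeInodes / perJob);
    }
}
```

With those example inputs, one job needs at least 1,512 inodes, so the 43k free inodes shown above would cover only about 28 more such jobs. The real parallelism here is at least 87, since subtask indices start at 0 and index 86 appears in the path.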

Re: job history server

Posted by Richard Moorhead <ri...@gmail.com>.
I did not know that.

I have since wiped the directory. I will post when I see this error again.

Re: job history server

Posted by Benchao Li <li...@gmail.com>.
`df -H` only shows sizes, not inode information. Could you also show us
the output of `df -iH`?
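Since `df -i` is the check that exposes this condition, it can be worth automating. Below is a minimal Java sketch that flags filesystems above an inode-usage threshold, assuming GNU df's column layout and mount points without spaces; `DfInodeCheck` and `highInodeUse` are hypothetical names, and the sample input is the `df -iH` output from this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal sketch: flag filesystems whose inode usage (IUse%) is at or above a
// threshold, given the text printed by `df -i` / `df -iH`.
// Assumes GNU df's column order (Filesystem, Inodes, IUsed, IFree, IUse%, Mounted on)
// and mount points that contain no spaces.
class DfInodeCheck {

    /** Returns "mountpoint (NN%)" for every filesystem at or above thresholdPercent. */
    static List<String> highInodeUse(String dfOutput, int thresholdPercent) {
        String[] lines = dfOutput.split("\\R");
        // Collect the fields of every line after the header. Gathering them across
        // lines copes with df wrapping a long device name onto a line of its own.
        List<String> fields = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i].trim();
            if (!line.isEmpty()) {
                fields.addAll(Arrays.asList(line.split("\\s+")));
            }
        }
        List<String> flagged = new ArrayList<>();
        // Each filesystem contributes exactly six fields.
        for (int i = 0; i + 5 < fields.size(); i += 6) {
            String usePercent = fields.get(i + 4); // e.g. "92%"
            String mountedOn = fields.get(i + 5);
            if (!usePercent.endsWith("%")) {
                continue; // "-" when a filesystem reports no inode totals
            }
            int pct = Integer.parseInt(usePercent.substring(0, usePercent.length() - 1));
            if (pct >= thresholdPercent) {
                flagged.add(mountedOn + " (" + pct + "%)");
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // The df -iH output posted in this thread.
        String sample = String.join("\n",
                "Filesystem           Inodes IUsed IFree IUse% Mounted on",
                "/dev/mapper/vg00-rootlv00",
                "                       132k   13k  119k   10% /",
                "tmpfs                  508k  465k   43k   92% /dev/shm");
        System.out.println(highInodeUse(sample, 90)); // prints [/dev/shm (92%)]
    }
}
```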

-- 

Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenchao@gmail.com; libenchao@pku.edu.cn

Re: job history server

Posted by Richard Moorhead <ri...@gmail.com>.
Yes, I did. I mentioned it at the end of my message, but I should have been clearer:

22526:~/ $ df -H                                  [18:15:20]
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg00-rootlv00
                      2.1G  777M  1.2G  41% /
tmpfs                 2.1G  753M  1.4G  37% /dev/shm
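Byte usage was under half on both filesystems, which is why the partitions looked healthy. Java's `java.nio.file.FileStore` exposes roughly the same byte-level figures and, as far as the standard API goes, nothing about inodes, so a JVM-side disk check built on it would be just as blind to this failure. A minimal sketch (`ByteUsage` and `bytesUsedPercent` are hypothetical names; the percentage approximates, but does not exactly reproduce, df's `Use%`):

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Minimal sketch: approximately the byte-level view `df -H` gives, via FileStore.
// FileStore reports only byte counts (total, usable, unallocated) and no inode
// figures, so this check can pass while the filesystem is out of inodes.
class ByteUsage {

    /** Percentage of bytes in use on the filesystem containing {@code path}. */
    static double bytesUsedPercent(Path path) throws IOException {
        FileStore store = Files.getFileStore(path);
        long total = store.getTotalSpace();
        long free = store.getUnallocatedSpace();
        // Some pseudo-filesystems report a total of 0; avoid dividing by it.
        return total == 0 ? 0.0 : 100.0 * (total - free) / total;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the JVM temp dir; pass e.g. /dev/shm as args[0] to check that mount.
        Path dir = Paths.get(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        System.out.printf("%s: %.0f%% of bytes used%n", dir, bytesUsedPercent(dir));
    }
}
```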

Re: job history server

Posted by Benchao Li <li...@gmail.com>.
Hi Richard,

Have you checked whether the partition has run out of inodes?
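Inode usage can be traced to a directory by counting entries rather than bytes, since every file and directory takes one inode however small it is. Below is a minimal Java sketch, assuming the history server's local directory is the one in the log (`/dev/shm/flink-history-server`); `EntryCount` and `countEntries` are hypothetical names, and the per-job breakdown assumes the `jobs/<jobId>` layout seen in the logged path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Minimal sketch: count the files and directories under a root, which approximates
// the inodes that tree consumes (hard-linked files would be counted more than once).
class EntryCount {

    static long countEntries(Path root) throws IOException {
        // Files.walk includes the root itself and does not follow symlinks by default;
        // the stream holds open directory handles, so close it with try-with-resources.
        try (Stream<Path> entries = Files.walk(root)) {
            return entries.count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default: the directory from the logged path; override with args[0].
        Path root = Paths.get(args.length > 0 ? args[0] : "/dev/shm/flink-history-server");
        if (!Files.isDirectory(root)) {
            System.out.println("not a directory: " + root);
            return;
        }
        System.out.println(root + ": " + countEntries(root) + " entries");

        // Per-job breakdown, assuming the jobs/<jobId> layout seen in the log.
        Path jobs = root.resolve("jobs");
        if (Files.isDirectory(jobs)) {
            try (Stream<Path> children = Files.list(jobs)) {
                Iterable<Path> jobDirs = children.filter(Files::isDirectory)::iterator;
                for (Path job : jobDirs) {
                    System.out.println("  " + job.getFileName() + ": " + countEntries(job));
                }
            }
        }
    }
}
```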
