Posted to user@flink.apache.org by Richard Moorhead <ri...@gmail.com> on 2020/02/18 00:15:48 UTC
job history server
I see the following exception often:
2020-02-17 18:13:26,796 ERROR
org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher -
Failure while fetching/processing job archive for job
eaf0639027aca1624adaa100bdf1332e.
java.nio.file.FileSystemException:
/dev/shm/flink-history-server/jobs/eaf0639027aca1624adaa100bdf1332e/vertices/6abf3ed37d1a5e48f2786b832033f074/subtasks/86/attempts:
No space left on device
at
sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at
sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
at java.nio.file.Files.createDirectory(Files.java:674)
at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
at java.nio.file.Files.createDirectories(Files.java:767)
at
org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:186)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Unfortunately, the partition listed does not appear to be full or anywhere
near full.
Is there a workaround for this?
Re: job history server
Posted by Richard Moorhead <ri...@gmail.com>.
2020-02-18 09:44:45,227 ERROR
org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher -
Failure while fetching/processing job archive for job
eaf0639027aca1624adaa100bdf1332e.
java.nio.file.FileSystemException:
/dev/shm/flink-history-server/jobs/eaf0639027aca1624adaa100bdf1332e/vertices/062e4d80ed1d4bdafd24e462245c5926/subtasks/86/attempts/0.json:
No space left on device
and there it is:
42103b5b-5410-d2d8-6a0b-21757e4a0fbc ~
0 % df -iH
Filesystem                 Inodes IUsed IFree IUse% Mounted on
/dev/mapper/vg00-rootlv00
                             132k   13k  119k   10% /
tmpfs                        508k  465k   43k   92% /dev/shm
Thanks for the tip.
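
To see which job archives account for the inodes on /dev/shm, the entries
under the history server's jobs directory can be counted per job. A minimal
Python sketch, assuming the directory from the log above
(/dev/shm/flink-history-server/jobs) and that each file or directory costs
roughly one inode. The function names are hypothetical and not part of Flink.

```python
import os


def count_entries(root):
    """Count files and directories below root (not including root itself)."""
    total = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


def entries_per_job(jobs_dir):
    """Return (job_id, entry_count) pairs, largest first.

    The count includes the job directory itself, so it approximates the
    number of inodes that job's archive occupies.
    """
    counts = {}
    for entry in os.scandir(jobs_dir):
        if entry.is_dir(follow_symlinks=False):
            counts[entry.name] = count_entries(entry.path) + 1
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Path taken from the exception in this thread; adjust for your setup.
    jobs_dir = "/dev/shm/flink-history-server/jobs"
    if os.path.isdir(jobs_dir):
        for job_id, count in entries_per_job(jobs_dir)[:10]:
            print(f"{count:>8}  {job_id}")
```

A job with many vertices and subtasks produces one small JSON file per
subtask attempt, so a handful of large jobs may be enough to exhaust the
inodes of a small tmpfs long before its byte capacity is reached.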
On Mon, Feb 17, 2020 at 8:08 PM Richard Moorhead <ri...@gmail.com>
wrote:
> I did not know that.
>
> I have since wiped the directory. I will post when I see this error again.
>
> On Mon, Feb 17, 2020 at 8:03 PM Benchao Li <li...@gmail.com> wrote:
>
>> `df -H` only gives the sizes, not inodes information. Could you also show
>> us the result of `df -iH`?
>>
>> On Tue, Feb 18, 2020 at 9:40 AM Richard Moorhead <ri...@gmail.com> wrote:
>>
>>> Yes, I did. I mentioned it last but I should have been clearer:
>>>
>>> 22526:~/ $ df -H
>>>
>>>
>>> [18:15:20]
>>> Filesystem Size Used Avail Use% Mounted on
>>> /dev/mapper/vg00-rootlv00
>>> 2.1G 777M 1.2G 41% /
>>> tmpfs 2.1G 753M 1.4G 37% /dev/shm
>>>
>>> On Mon, Feb 17, 2020 at 7:13 PM Benchao Li <li...@gmail.com> wrote:
>>>
>>>> Hi Richard,
>>>>
>>>> Have you checked that inodes of the disk partition were full or not?
>>>>
>>>> On Tue, Feb 18, 2020 at 8:16 AM Richard Moorhead <richard.moorhead@gmail.com> wrote:
>>>>
>>>>> I see the following exception often:
>>>>>
>>>>> 2020-02-17 18:13:26,796 ERROR
>>>>> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher -
>>>>> Failure while fetching/processing job archive for job
>>>>> eaf0639027aca1624adaa100bdf1332e.
>>>>> java.nio.file.FileSystemException:
>>>>> /dev/shm/flink-history-server/jobs/eaf0639027aca1624adaa100bdf1332e/vertices/6abf3ed37d1a5e48f2786b832033f074/subtasks/86/attempts:
>>>>> No space left on device
>>>>> at
>>>>> sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
>>>>> at
>>>>> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>>>>> at
>>>>> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>>>>> at
>>>>> sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
>>>>> at java.nio.file.Files.createDirectory(Files.java:674)
>>>>> at
>>>>> java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
>>>>> at java.nio.file.Files.createDirectories(Files.java:767)
>>>>> at
>>>>> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:186)
>>>>> at
>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>> at
>>>>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>>>>> at
>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>>>>> at
>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>> at java.lang.Thread.run(Thread.java:748)
>>>>>
>>>>>
>>>>> Unfortunately the partition listed does not appear to be full or
>>>>> anywhere near full?
>>>>>
>>>>> Is there a workaround to this?
>>>>>
>>>>>
>>>>
>>>> --
>>>>
>>>> Benchao Li
>>>> School of Electronics Engineering and Computer Science, Peking University
>>>> Tel:+86-15650713730
>>>> Email: libenchao@gmail.com; libenchao@pku.edu.cn
>>>>
>>>>
>>
>> --
>>
>> Benchao Li
>> School of Electronics Engineering and Computer Science, Peking University
>> Tel:+86-15650713730
>> Email: libenchao@gmail.com; libenchao@pku.edu.cn
>>
>>
Re: job history server
Posted by Richard Moorhead <ri...@gmail.com>.
I did not know that.
I have since wiped the directory. I will post when I see this error again.
On Mon, Feb 17, 2020 at 8:03 PM Benchao Li <li...@gmail.com> wrote:
> `df -H` only gives the sizes, not inodes information. Could you also show
> us the result of `df -iH`?
>
> On Tue, Feb 18, 2020 at 9:40 AM Richard Moorhead <ri...@gmail.com> wrote:
>
>> Yes, I did. I mentioned it last but I should have been clearer:
>>
>> 22526:~/ $ df -H
>>
>>
>> [18:15:20]
>> Filesystem Size Used Avail Use% Mounted on
>> /dev/mapper/vg00-rootlv00
>> 2.1G 777M 1.2G 41% /
>> tmpfs 2.1G 753M 1.4G 37% /dev/shm
>>
>> On Mon, Feb 17, 2020 at 7:13 PM Benchao Li <li...@gmail.com> wrote:
>>
>>> Hi Richard,
>>>
>>> Have you checked that inodes of the disk partition were full or not?
>>>
>>> On Tue, Feb 18, 2020 at 8:16 AM Richard Moorhead <ri...@gmail.com> wrote:
>>>
>>>> I see the following exception often:
>>>>
>>>> 2020-02-17 18:13:26,796 ERROR
>>>> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher -
>>>> Failure while fetching/processing job archive for job
>>>> eaf0639027aca1624adaa100bdf1332e.
>>>> java.nio.file.FileSystemException:
>>>> /dev/shm/flink-history-server/jobs/eaf0639027aca1624adaa100bdf1332e/vertices/6abf3ed37d1a5e48f2786b832033f074/subtasks/86/attempts:
>>>> No space left on device
>>>> at
>>>> sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
>>>> at
>>>> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>>>> at
>>>> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>>>> at
>>>> sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
>>>> at java.nio.file.Files.createDirectory(Files.java:674)
>>>> at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
>>>> at java.nio.file.Files.createDirectories(Files.java:767)
>>>> at
>>>> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:186)
>>>> at
>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>> at
>>>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>>>> at
>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>>>> at
>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>> at java.lang.Thread.run(Thread.java:748)
>>>>
>>>>
>>>> Unfortunately the partition listed does not appear to be full or
>>>> anywhere near full?
>>>>
>>>> Is there a workaround to this?
>>>>
>>>>
>>>
>>> --
>>>
>>> Benchao Li
>>> School of Electronics Engineering and Computer Science, Peking University
>>> Tel:+86-15650713730
>>> Email: libenchao@gmail.com; libenchao@pku.edu.cn
>>>
>>>
>
> --
>
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking University
> Tel:+86-15650713730
> Email: libenchao@gmail.com; libenchao@pku.edu.cn
>
>
Re: job history server
Posted by Benchao Li <li...@gmail.com>.
`df -H` only reports sizes, not inode usage. Could you also show
us the output of `df -iH`?
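
The same distinction can be checked programmatically. A minimal Python
sketch, assuming a Unix system where os.statvfs is available; the helper
names are hypothetical. It reports byte usage and inode usage side by side,
since a filesystem can fail with "No space left on device" when either one
is exhausted. The percentages are computed from the raw statvfs counters and
may differ slightly from the rounded values that df prints.

```python
import os


def usage_percent(total, free):
    """Return the percentage used, or None if the filesystem reports no total."""
    if total <= 0:
        return None
    return 100.0 * (total - free) / total


def space_and_inode_usage(path):
    """Return byte and inode usage for the filesystem that contains path."""
    st = os.statvfs(path)
    return {
        # f_blocks / f_bfree: total and free data blocks
        "bytes_used_pct": usage_percent(st.f_blocks, st.f_bfree),
        # f_files / f_ffree: total and free inodes
        "inodes_used_pct": usage_percent(st.f_files, st.f_ffree),
    }


if __name__ == "__main__":
    # /dev/shm is the mount point from this thread; fall back to the
    # current directory where it does not exist.
    path = "/dev/shm" if os.path.isdir("/dev/shm") else "."
    print(path, space_and_inode_usage(path))
```
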
On Tue, Feb 18, 2020 at 9:40 AM Richard Moorhead <ri...@gmail.com> wrote:
> Yes, I did. I mentioned it last but I should have been clearer:
>
> 22526:~/ $ df -H
>
>
> [18:15:20]
> Filesystem Size Used Avail Use% Mounted on
> /dev/mapper/vg00-rootlv00
> 2.1G 777M 1.2G 41% /
> tmpfs 2.1G 753M 1.4G 37% /dev/shm
>
> On Mon, Feb 17, 2020 at 7:13 PM Benchao Li <li...@gmail.com> wrote:
>
>> Hi Richard,
>>
>> Have you checked that inodes of the disk partition were full or not?
>>
>> On Tue, Feb 18, 2020 at 8:16 AM Richard Moorhead <ri...@gmail.com> wrote:
>>
>>> I see the following exception often:
>>>
>>> 2020-02-17 18:13:26,796 ERROR
>>> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher -
>>> Failure while fetching/processing job archive for job
>>> eaf0639027aca1624adaa100bdf1332e.
>>> java.nio.file.FileSystemException:
>>> /dev/shm/flink-history-server/jobs/eaf0639027aca1624adaa100bdf1332e/vertices/6abf3ed37d1a5e48f2786b832033f074/subtasks/86/attempts:
>>> No space left on device
>>> at
>>> sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
>>> at
>>> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>>> at
>>> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>>> at
>>> sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
>>> at java.nio.file.Files.createDirectory(Files.java:674)
>>> at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
>>> at java.nio.file.Files.createDirectories(Files.java:767)
>>> at
>>> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:186)
>>> at
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>> at
>>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>>> at
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>>> at
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>> at java.lang.Thread.run(Thread.java:748)
>>>
>>>
>>> Unfortunately the partition listed does not appear to be full or
>>> anywhere near full?
>>>
>>> Is there a workaround to this?
>>>
>>>
>>
>> --
>>
>> Benchao Li
>> School of Electronics Engineering and Computer Science, Peking University
>> Tel:+86-15650713730
>> Email: libenchao@gmail.com; libenchao@pku.edu.cn
>>
>>
--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenchao@gmail.com; libenchao@pku.edu.cn
Re: job history server
Posted by Richard Moorhead <ri...@gmail.com>.
Yes, I did. I mentioned it last but I should have been clearer:
22526:~/ $ df -H
[18:15:20]
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/vg00-rootlv00
                           2.1G  777M  1.2G  41% /
tmpfs                      2.1G  753M  1.4G  37% /dev/shm
On Mon, Feb 17, 2020 at 7:13 PM Benchao Li <li...@gmail.com> wrote:
> Hi Richard,
>
> Have you checked that inodes of the disk partition were full or not?
>
> On Tue, Feb 18, 2020 at 8:16 AM Richard Moorhead <ri...@gmail.com> wrote:
>
>> I see the following exception often:
>>
>> 2020-02-17 18:13:26,796 ERROR
>> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher -
>> Failure while fetching/processing job archive for job
>> eaf0639027aca1624adaa100bdf1332e.
>> java.nio.file.FileSystemException:
>> /dev/shm/flink-history-server/jobs/eaf0639027aca1624adaa100bdf1332e/vertices/6abf3ed37d1a5e48f2786b832033f074/subtasks/86/attempts:
>> No space left on device
>> at
>> sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
>> at
>> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>> at
>> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>> at
>> sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
>> at java.nio.file.Files.createDirectory(Files.java:674)
>> at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
>> at java.nio.file.Files.createDirectories(Files.java:767)
>> at
>> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:186)
>> at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> at
>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> at java.lang.Thread.run(Thread.java:748)
>>
>>
>> Unfortunately the partition listed does not appear to be full or anywhere
>> near full?
>>
>> Is there a workaround to this?
>>
>>
>
> --
>
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking University
> Tel:+86-15650713730
> Email: libenchao@gmail.com; libenchao@pku.edu.cn
>
>
Re: job history server
Posted by Benchao Li <li...@gmail.com>.
Hi Richard,
Have you checked whether the disk partition has run out of inodes?
On Tue, Feb 18, 2020 at 8:16 AM Richard Moorhead <ri...@gmail.com> wrote:
> I see the following exception often:
>
> 2020-02-17 18:13:26,796 ERROR
> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher -
> Failure while fetching/processing job archive for job
> eaf0639027aca1624adaa100bdf1332e.
> java.nio.file.FileSystemException:
> /dev/shm/flink-history-server/jobs/eaf0639027aca1624adaa100bdf1332e/vertices/6abf3ed37d1a5e48f2786b832033f074/subtasks/86/attempts:
> No space left on device
> at
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
> at
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> at
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> at
> sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
> at java.nio.file.Files.createDirectory(Files.java:674)
> at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
> at java.nio.file.Files.createDirectories(Files.java:767)
> at
> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:186)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>
>
> Unfortunately the partition listed does not appear to be full or anywhere
> near full?
>
> Is there a workaround to this?
>
>
--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenchao@gmail.com; libenchao@pku.edu.cn