Posted to issues@spark.apache.org by "Biswa Singh (Jira)" <ji...@apache.org> on 2021/10/26 21:22:00 UTC
[jira] [Updated] (SPARK-37122) java.lang.IllegalArgumentException Related to Prometheus
[ https://issues.apache.org/jira/browse/SPARK-37122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Biswa Singh updated SPARK-37122:
--------------------------------
Description:
This issue is similar to https://issues.apache.org/jira/browse/SPARK-35237?focusedCommentId=17340723&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17340723. We receive the following warning:
21:00:26.277 [rpc-server-4-2] WARN o.a.s.n.s.TransportChannelHandler - Exception in connection from /10.198.3.179:51184
java.lang.IllegalArgumentException: Too large frame: 5135603447297303916
	at org.sparkproject.guava.base.Preconditions.checkArgument(Preconditions.java:119)
	at org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:148)
	at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:98)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Unknown Source)
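Editorial aside (not part of the original report): the suspicious frame length is consistent with a plain-text HTTP request hitting the Spark RPC port. Spark's TransportFrameDecoder reads the first 8 bytes off the wire as a big-endian 64-bit frame length, so the start of an HTTP request line decodes to exactly this value, which a quick sketch can verify:

```python
# Decode the "Too large frame" value back into the 8 bytes the
# TransportFrameDecoder read off the wire. If a plain HTTP client
# (such as a Prometheus scrape) connects to a Spark RPC port, the
# decoder interprets the first 8 bytes of the request line as a
# big-endian 64-bit frame length.
frame_length = 5135603447297303916

raw = frame_length.to_bytes(8, byteorder="big")
print(raw)  # → b'GET /mel' (the start of an HTTP GET request line)
```

The decoded bytes begin with "GET /", i.e. an HTTP scrape attempt, not a Spark RPC frame; the rest of the request path is cut off after 8 bytes.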
Other Prometheus-related details are below, followed by a description of the issue itself:
{noformat}
Prometheus Scrape Configuration
===============================
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
      action: replace
      target_label: __scheme__
      regex: (https?)
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_pod_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
tcptrack command output in spark3 pod
======================================
10.198.22.240:51258 10.198.40.143:7079 CLOSED 10s 0 B/s
10.198.22.240:51258 10.198.40.143:7079 CLOSED 10s 0 B/s
10.198.22.240:50354 10.198.40.143:7079 CLOSED 40s 0 B/s
10.198.22.240:33152 10.198.40.143:4040 ESTABLISHED 2s 0 B/s
10.198.22.240:47726 10.198.40.143:8090 ESTABLISHED 9s 0 B/s
10.198.22.240 = Prometheus pod IP
10.198.40.143 = test pod IP
Issue
======
Although the scrape config is expected to scrape only port 8090, Prometheus tries to initiate scrapes on ports like 7079, 7078, 4040, etc. on the spark3 pod, hence the exception in the spark3 pod. Is this really a Prometheus issue, or something on the Spark side? We don't see any such exception in any of the other pods. All our pods, including spark3, are annotated with:
annotations:
  prometheus.io/port: "8090"
  prometheus.io/scrape: "true"
We get the metrics and everything works fine; the only problem is the extra warning caused by this exception.{noformat}
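Editorial observation (an assumption, not from the original report): the address relabel rule in the config above reads the meta label __meta_kubernetes_pod_prometheus_io_port, but the meta label that Kubernetes pod service discovery exposes for a prometheus.io/port annotation is __meta_kubernetes_pod_annotation_prometheus_io_port. Unless that is a transcription slip, the port substitution never matches, and Prometheus keeps one target per declared container port, which would explain the scrape attempts on 7079, 7078, and 4040. A corrected rule would look like:

{noformat}
# Hypothetical fix: use the annotation meta label so $2 captures the
# port declared in the prometheus.io/port pod annotation.
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
  action: replace
  target_label: __address__
  regex: ([^:]+)(?::\d+)?;(\d+)
  replacement: $1:$2
{noformat}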
> java.lang.IllegalArgumentException Related to Prometheus
> --------------------------------------------------------
>
> Key: SPARK-37122
> URL: https://issues.apache.org/jira/browse/SPARK-37122
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 3.0.2, 3.1.1
> Reporter: Biswa Singh
> Priority: Critical
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org