Posted to issues@spark.apache.org by "Biswa Singh (Jira)" <ji...@apache.org> on 2021/10/26 21:22:00 UTC

[jira] [Updated] (SPARK-37122) java.lang.IllegalArgumentException Related to Prometheus

     [ https://issues.apache.org/jira/browse/SPARK-37122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Biswa Singh updated SPARK-37122:
--------------------------------
    Description: 
This issue is similar to https://issues.apache.org/jira/browse/SPARK-35237?focusedCommentId=17340723&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17340723. We receive the following warning:

 

{noformat}
21:00:26.277 [rpc-server-4-2] WARN  o.a.s.n.s.TransportChannelHandler - Exception in connection from /10.198.3.179:51184
java.lang.IllegalArgumentException: Too large frame: 5135603447297303916
    at org.sparkproject.guava.base.Preconditions.checkArgument(Preconditions.java:119)
    at org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:148)
    at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:98)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Unknown Source)
{noformat}
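For what it's worth, the reported frame size is consistent with a plaintext HTTP request being read by Spark's length-prefixed frame decoder: the first eight ASCII bytes of a request line such as "GET /metrics ..." get interpreted as the frame-length prefix. A minimal sketch of that interpretation (assuming the decoder subtracts its 8-byte length field, as the TransportFrameDecoder frame in the trace appears to do):

{code:java}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch only: read the first 8 bytes of an HTTP request line the way a
// length-prefixed frame decoder would (big-endian long, minus the 8-byte
// length field itself).
public class FrameSizeSketch {
    public static void main(String[] args) {
        byte[] firstEight = "GET /met".getBytes(StandardCharsets.US_ASCII);
        long rawLong = ByteBuffer.wrap(firstEight).getLong(); // big-endian read
        long frameSize = rawLong - 8;                         // subtract the length field
        System.out.println(frameSize);                        // 5135603447297303916, matching the warning above
    }
}
{code}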

 

Below are additional details related to Prometheus; the issue itself is described at the end of the block:

 
{noformat}
Prometheus Scrape Configuration
===============================
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
      action: replace
      target_label: __scheme__
      regex: (https?)
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_pod_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2

tcptrack command output in spark3 pod
======================================
10.198.22.240:51258  10.198.40.143:7079  CLOSED 10s 0 B/s
10.198.22.240:51258  10.198.40.143:7079  CLOSED 10s 0 B/s
10.198.22.240:50354  10.198.40.143:7079  CLOSED 40s 0 B/s
10.198.22.240:33152  10.198.40.143:4040  ESTABLISHED 2s 0 B/s
10.198.22.240:47726  10.198.40.143:8090  ESTABLISHED 9s 0 B/s

10.198.22.240 = prometheus pod 

10.198.40.143 = testpod ip

Issue
======
Though the scrape config is expected to scrape only port 8090, I see Prometheus trying to initiate scrapes on ports like 7079, 7078, 4040, etc. on
the spark3 pod, and hence the exception in the spark3 pod (a sketch of the __address__ relabeling follows this block). But is this really a Prometheus issue, or something on the Spark side? We don't see any such exception in any of the other pods. All our pods, including spark3, are annotated with:

annotations:
   prometheus.io/port: "8090"
   prometheus.io/scrape: "true"

We get the metrics and everything works fine; we just see this extra warning for the exception.{noformat}
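For reference, the last relabel rule in the scrape configuration above is the one that rewrites the scrape address: the regex ([^:]+)(?::\d+)?;(\d+) is applied to the joined source labels ("<address>;<port label>", joined with Prometheus's default ';' separator) and, on a full match, __address__ is replaced with $1:$2. A minimal sketch of that substitution, using hypothetical input values:

{code:java}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: how the __address__ relabel rule's regex/replacement pair
// rewrites "podIP:containerPort;portLabel" into "podIP:portLabel".
public class AddressRelabelSketch {
    public static void main(String[] args) {
        // Hypothetical joined source labels: discovered address + port label value.
        String joined = "10.198.40.143:7079;8090";
        Pattern rule = Pattern.compile("([^:]+)(?::\\d+)?;(\\d+)");
        Matcher m = rule.matcher(joined);
        if (m.matches()) {
            System.out.println(m.replaceAll("$1:$2")); // -> 10.198.40.143:8090
        } else {
            // If the port source label is empty, the regex does not match and
            // __address__ is left at whatever pod discovery produced.
            System.out.println("no match; __address__ unchanged");
        }
    }
}
{code}

If that rule does not fire for a target, Prometheus keeps the address that pod discovery generated for each declared container port, which is the kind of behavior that would produce scrapes against ports other than 8090.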
 


> java.lang.IllegalArgumentException Related to Prometheus
> --------------------------------------------------------
>
>                 Key: SPARK-37122
>                 URL: https://issues.apache.org/jira/browse/SPARK-37122
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.0.2, 3.1.1
>            Reporter: Biswa Singh
>            Priority: Critical
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org