Posted to issues@flink.apache.org by "Stephan Weinwurm (Jira)" <ji...@apache.org> on 2022/07/29 23:27:00 UTC

[jira] [Updated] (FLINK-28747) "target_id can not be missing" in HTTP statefun request

     [ https://issues.apache.org/jira/browse/FLINK-28747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Weinwurm updated FLINK-28747:
-------------------------------------
    Description: 
Hi all,

We've suddenly started to see the following exception in our HTTP StateFun function endpoints:

{code}Traceback (most recent call last):
  File "/src/.venv/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 403, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/src/.venv/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/src/worker/baseplate_asgi/asgi/baseplate_asgi_middleware.py", line 37, in __call__
    await span_processor.execute()
  File "/src/worker/baseplate_asgi/asgi/asgi_http_span_processor.py", line 61, in execute
    raise e
  File "/src/worker/baseplate_asgi/asgi/asgi_http_span_processor.py", line 57, in execute
    await self.app(self.scope, self.receive, self.send)
  File "/src/.venv/lib/python3.9/site-packages/starlette/applications.py", line 124, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/src/.venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/src/.venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/src/.venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 75, in __call__
    raise exc
  File "/src/.venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 64, in __call__
    await self.app(scope, receive, sender)
  File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line 680, in __call__
    await route.handle(scope, receive, send)
  File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line 275, in handle
    await self.app(scope, receive, send)
  File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line 65, in app
    response = await func(request)
  File "/src/worker/baseplate_statefun/server/asgi/make_statefun_handler.py", line 25, in statefun_handler
    result = await handler.handle_async(request_body)
  File "/src/.venv/lib/python3.9/site-packages/statefun/request_reply_v3.py", line 262, in handle_async
    msg = Message(target_typename=sdk_address.typename, target_id=sdk_address.id,
  File "/src/.venv/lib/python3.9/site-packages/statefun/messages.py", line 42, in __init__
    raise ValueError("target_id can not be missing"){code}

Interestingly, this has started to happen in three separate Flink deployments at the very same time. The only thing in common between the three deployments is that they consume the same Kafka topics.

No deployments took place around the time the issue started, which was on July 28th at 3:05 PM. We have been seeing the error continuously since then.

We were also able to extract the request that Flink sends to the HTTP statefun endpoint:



{code}{'invocation': {'target': {'namespace': 'com.x.dummy', 'type': 'dummy'}, 'invocations': [{'argument': {'typename': 'type.googleapis.com/v2_event.Event', 'has_value': True, 'value': '-redacted-'}}]}}
{code}

As you can see, the `invocation.target` object contains no `id` field, so the target `id` was either missing or set to an empty string.
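
To make the observation concrete, here is a minimal sketch of the check we ran against the captured request. The function name `target_id_missing` is our own (hypothetical, not part of the StateFun SDK), and it assumes the request has already been decoded into a plain dict like the one shown above.

```python
from typing import Any, Dict


def target_id_missing(request: Dict[str, Any]) -> bool:
    """Return True if the decoded request has no non-empty invocation.target.id."""
    target = request.get("invocation", {}).get("target", {})
    # A missing key and an empty string are treated the same way here, on the
    # assumption that either one triggers "target_id can not be missing".
    return not target.get("id")


# The request captured in this report (payload redacted).
captured = {
    "invocation": {
        "target": {"namespace": "com.x.dummy", "type": "dummy"},
        "invocations": [
            {
                "argument": {
                    "typename": "type.googleapis.com/v2_event.Event",
                    "has_value": True,
                    "value": "-redacted-",
                }
            }
        ],
    }
}

print(target_id_missing(captured))  # True
```

A check like this could be logged in the endpoint before the request is handed to the SDK handler, so that offending requests are recorded rather than only surfacing as a stack trace.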

 

This is our module.yaml from one of the Flink deployments:

 
{code}
version: "3.0"
module:
  meta:
    type: remote
  spec:
    endpoints:
      - endpoint:
          meta:
            kind: io.statefun.endpoints.v1/http
          spec:
            functions: com.x.dummy/dummy
            urlPathTemplate: http://x-worker-dummy.x-functions:9090/statefun
            timeouts:
              call: 2 min
              read: 2 min
              write: 2 min
            maxNumBatchRequests: 100
    ingresses:
      - ingress:
          meta:
            type: io.statefun.kafka/ingress
            id: com.x/ingress
          spec:
            address: x-kafka-0.x.ue1.x.net:9092
            consumerGroupId: x-worker-dummy
            topics:
              - topic: v2_post_events
                valueType: type.googleapis.com/v2_event.Event
                targets:
                  - com.x.dummy/dummy
            startupPosition:
              type: group-offsets
            autoOffsetResetPosition: earliest
{code}
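
One thing that may be worth checking, given that all three deployments read the same topics: as far as we understand the Kafka ingress configured above, the key of each record is used as the id of the target function. If that is right, records with a null or empty key could produce a request without a target id. This is an unconfirmed assumption on our side. The sketch below only shows how such records could be spotted in a sample pulled from the topic; the `(key, value)` tuple shape and the name `keyless_positions` are hypothetical and not tied to any particular Kafka client.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical record shape: (key, value) as raw bytes, key may be None.
Record = Tuple[Optional[bytes], bytes]


def keyless_positions(records: Iterable[Record]) -> List[int]:
    """Return the positions (within the sample) of records whose key is None or empty."""
    return [i for i, (key, _value) in enumerate(records) if not key]


sample = [(b"user-1", b"..."), (None, b"..."), (b"", b"...")]
print(keyless_positions(sample))  # [1, 2]
```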

 

Could you please help us investigate? This is critically impacting our production setup.



> "target_id can not be missing" in HTTP statefun request
> -------------------------------------------------------
>
>                 Key: FLINK-28747
>                 URL: https://issues.apache.org/jira/browse/FLINK-28747
>             Project: Flink
>          Issue Type: Bug
>          Components: Stateful Functions
>    Affects Versions: statefun-3.2.0
>            Reporter: Stephan Weinwurm
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)