Posted to issues@flink.apache.org by "Stephan Weinwurm (Jira)" <ji...@apache.org> on 2022/07/29 23:25:00 UTC

[jira] [Created] (FLINK-28747) "target_id can not be missing" in HTTP statefun request

Stephan Weinwurm created FLINK-28747:
----------------------------------------

             Summary: "target_id can not be missing" in HTTP statefun request
                 Key: FLINK-28747
                 URL: https://issues.apache.org/jira/browse/FLINK-28747
             Project: Flink
          Issue Type: Bug
          Components: Stateful Functions
    Affects Versions: statefun-3.2.0
            Reporter: Stephan Weinwurm


Hi all,

We have suddenly started to see the following exception in our HTTP StateFun function endpoints:

```

Traceback (most recent call last):
  File "/src/.venv/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 403, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/src/.venv/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/src/worker/baseplate_asgi/asgi/baseplate_asgi_middleware.py", line 37, in __call__
    await span_processor.execute()
  File "/src/worker/baseplate_asgi/asgi/asgi_http_span_processor.py", line 61, in execute
    raise e
  File "/src/worker/baseplate_asgi/asgi/asgi_http_span_processor.py", line 57, in execute
    await self.app(self.scope, self.receive, self.send)
  File "/src/.venv/lib/python3.9/site-packages/starlette/applications.py", line 124, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/src/.venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/src/.venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/src/.venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 75, in __call__
    raise exc
  File "/src/.venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 64, in __call__
    await self.app(scope, receive, sender)
  File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line 680, in __call__
    await route.handle(scope, receive, send)
  File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line 275, in handle
    await self.app(scope, receive, send)
  File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line 65, in app
    response = await func(request)
  File "/src/worker/baseplate_statefun/server/asgi/make_statefun_handler.py", line 25, in statefun_handler
    result = await handler.handle_async(request_body)
  File "/src/.venv/lib/python3.9/site-packages/statefun/request_reply_v3.py", line 262, in handle_async
    msg = Message(target_typename=sdk_address.typename, target_id=sdk_address.id,
  File "/src/.venv/lib/python3.9/site-packages/statefun/messages.py", line 42, in __init__
    raise ValueError("target_id can not be missing")

```

Interestingly, this started happening in three separate Flink deployments at exactly the same time. The only thing the three deployments have in common is that they consume the same Kafka topics.

No deployments took place when the issue started, on July 28th at 3:05 PM, and we have been seeing the error continuously ever since.

We were also able to extract the request that Flink sends to the HTTP statefun endpoint:

```
{'invocation': {'target': {'namespace': 'com.x.dummy', 'type': 'dummy'}, 'invocations': [{'argument': {'typename': 'type.googleapis.com/v2_event.Event', 'has_value': True, 'value': '-redacted-'}}]}}
```

As you can see, the `invocation.target` object contains no `id` field, so the target id was either missing or sent as an empty string.
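To check captured request bodies for this condition without running the SDK, a minimal sketch follows. It assumes the request has already been decoded into a plain dict shaped like the one above; `has_target_id` is a hypothetical helper, not part of the StateFun SDK. Because proto3 string fields have no presence, an absent id and an empty id look the same, so the helper treats both as missing.

```python
from typing import Any, Dict


def has_target_id(request: Dict[str, Any]) -> bool:
    """Return True if the captured invocation request names a non-empty target id.

    Hypothetical helper, not part of the StateFun SDK. In proto3 an unset
    string field and an empty one are indistinguishable, so both count as
    missing here, matching the "target_id can not be missing" error above.
    """
    target = request.get("invocation", {}).get("target", {})
    return bool(target.get("id"))


# The payload captured from Flink (value redacted), as shown above.
captured = {
    "invocation": {
        "target": {"namespace": "com.x.dummy", "type": "dummy"},
        "invocations": [
            {
                "argument": {
                    "typename": "type.googleapis.com/v2_event.Event",
                    "has_value": True,
                    "value": "-redacted-",
                }
            }
        ],
    }
}

print(has_target_id(captured))  # False
```

Running such a check over logged request bodies would show whether every failing request lacks an id or only some of them.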

This is our module.yaml from one of the Flink deployments:

```
version: "3.0"
module:
  meta:
    type: remote
  spec:
    endpoints:
      - endpoint:
          meta:
            kind: io.statefun.endpoints.v1/http
          spec:
            functions: com.x.dummy/dummy
            urlPathTemplate: http://x-worker-dummy.x-functions:9090/statefun
            timeouts:
              call: 2 min
              read: 2 min
              write: 2 min
            maxNumBatchRequests: 100
    ingresses:
      - ingress:
          meta:
            type: io.statefun.kafka/ingress
            id: com.x/ingress
          spec:
            address: x-kafka-0.x.ue1.x.net:9092
            consumerGroupId: x-worker-dummy
            topics:
              - topic: v2_post_events
                valueType: type.googleapis.com/v2_event.Event
                targets:
                  - com.x.dummy/dummy
            startupPosition:
              type: group-offsets
            autoOffsetResetPosition: earliest
```
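The ingress above routes every record on `v2_post_events` to `com.x.dummy/dummy`, but nothing in the configuration sets the function id. Assuming, as we understand the StateFun Kafka ingress to work, that the target function id is taken from the Kafka record key, the following minimal sketch models that routing to show which records would end up with an empty id. `TOPIC_TARGETS`, `KafkaRecord`, `target_addresses` and `records_without_id` are hypothetical names for illustration, and mapping a null key to an empty id is an assumption of this sketch, not established StateFun behaviour.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical in-code copy of the topics/targets section of the module.yaml above.
TOPIC_TARGETS: Dict[str, List[str]] = {
    "v2_post_events": ["com.x.dummy/dummy"],
}


@dataclass
class KafkaRecord:
    topic: str
    key: Optional[bytes]  # Kafka permits records with a null key
    value: bytes


def target_addresses(record: KafkaRecord) -> List[Dict[str, str]]:
    """Return the function addresses a record would be routed to.

    Assumption: the function id is the UTF-8 decoded Kafka record key.
    Mapping a null key to "" is a choice made for this sketch only.
    """
    function_id = record.key.decode("utf-8") if record.key is not None else ""
    addresses = []
    for target in TOPIC_TARGETS.get(record.topic, []):
        namespace, function_type = target.split("/", 1)
        addresses.append({"namespace": namespace, "type": function_type, "id": function_id})
    return addresses


def records_without_id(records: List[KafkaRecord]) -> List[KafkaRecord]:
    """Return the records that would produce an address with an empty id."""
    return [r for r in records if any(not a["id"] for a in target_addresses(r))]


sample = [
    KafkaRecord("v2_post_events", b"post-1", b"..."),
    KafkaRecord("v2_post_events", None, b"..."),
    KafkaRecord("v2_post_events", b"", b"..."),
]
print(len(records_without_id(sample)))  # 2
```

Under that assumption, any record with a null or empty key would yield a target address without an id, like the one in the captured request.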

Could you please help us investigate? This is critically impacting our production setup.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)