Posted to user@flink.apache.org by Rex Fenley <Re...@remind101.com> on 2020/09/09 01:32:49 UTC

Difficulties with Minio state storage

Hello!

I'm trying to test out Minio as a state storage backend using docker-compose
on my local machine, but I keep running into errors that seem strange to me.
Any help would be much appreciated :)

The problem:
With the following environment:

environment:
  - |
    FLINK_PROPERTIES=
    jobmanager.rpc.address: flink-jobmanager
    parallelism.default: 2
    s3.access-key: <key>
    s3.secret-key: <key>
    s3.path.style.access: true

And the following state backend (with flink-jdbc-test_graph-minio_1 being
the container serving Minio):

val bsEnv = StreamExecutionEnvironment.getExecutionEnvironment
bsEnv.setStateBackend(
  new RocksDBStateBackend(
    "s3://flink-jdbc-test_graph-minio_1/data/checkpoints:9000",
    true
  )
)
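
Note that the :9000 here ends up at the end of the checkpoint path rather than in a host:port position. For comparison, a sketch of the usual constructor shape, keeping the names from this message and leaving the host and port to the s3.endpoint setting; this is an illustration of the URI form, not a verified fix:

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

val bsEnv = StreamExecutionEnvironment.getExecutionEnvironment
bsEnv.setStateBackend(
  new RocksDBStateBackend(
    // Bucket and path only; the MinIO host/port belong in s3.endpoint.
    "s3://flink-jdbc-test_graph-minio_1/data/checkpoints",
    true // enable incremental checkpoints
  )
)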

And submitting the Flink job and triggering a savepoint from another docker
container like so:

flink run -m flink-jdbc-test_flink-jobmanager_1:8081 -c <Job Class Name> <built code>.jar

flink savepoint -m flink-jdbc-test_flink-jobmanager_1:8081 <Job ID> s3://flink-jdbc-test_graph-minio_1:9000/data/savepoints

I end up with the following error:

Caused by: com.facebook.presto.hive.s3.PrestoS3FileSystem$UnrecoverableS3OperationException: com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: A7E3BB7EEFB524FD; S3 Extended Request ID: cJOtc6E3Kb+U5hgbkA+09Dd/ouDHBGL2ftb1pGHpIwFgd6tE461nkaDtjOj40zbWEpFAcMOEmbY=), S3 Extended Request ID: cJOtc6E3Kb+U5hgbkA+09Dd/ouDHBGL2ftb1pGHpIwFgd6tE461nkaDtjOj40zbWEpFAcMOEmbY= (Path: s3://flink-jdbc-test_graph-minio_1:9000/data/savepoints/savepoint-5c4090-5f90e0cdc603/_metadata)
at com.facebook.presto.hive.s3.PrestoS3FileSystem.lambda$getS3ObjectMetadata$2(PrestoS3FileSystem.java:573)
at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:138)
at com.facebook.presto.hive.s3.PrestoS3FileSystem.getS3ObjectMetadata(PrestoS3FileSystem.java:560)
at com.facebook.presto.hive.s3.PrestoS3FileSystem.getFileStatus(PrestoS3FileSystem.java:311)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
at com.facebook.presto.hive.s3.PrestoS3FileSystem.create(PrestoS3FileSystem.java:356)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
at org.apache.flink.fs.s3presto.common.HadoopFileSystem.create(HadoopFileSystem.java:141)
at org.apache.flink.fs.s3presto.common.HadoopFileSystem.create(HadoopFileSystem.java:37)
at org.apache.flink.core.fs.PluginFileSystemFactory$ClassLoaderFixingFileSystem.create(PluginFileSystemFactory.java:169)
at org.apache.flink.runtime.state.filesystem.FsCheckpointMetadataOutputStream.<init>(FsCheckpointMetadataOutputStream.java:65)
at org.apache.flink.runtime.state.filesystem.FsCheckpointStorageLocation.createMetadataOutputStream(FsCheckpointStorageLocation.java:109)
at org.apache.flink.runtime.checkpoint.PendingCheckpoint.finalizeCheckpoint(PendingCheckpoint.java:306)
... 10 more
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: A7E3BB7EEFB524FD; S3 Extended Request ID: cJOtc6E3Kb+U5hgbkA+09Dd/ouDHBGL2ftb1pGHpIwFgd6tE461nkaDtjOj40zbWEpFAcMOEmbY=)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1799)

If I extend the environment to include:
...
s3.endpoint: s3://flink-jdbc-test_graph-minio_1:9000
...

Then I end up with the following error just trying to submit the job:
Caused by: java.lang.IllegalArgumentException: Endpoint does not contain a valid host name: s3://flink-jdbc-test_graph-minio_1:9000
at com.amazonaws.AmazonWebServiceClient.computeSignerByURI(AmazonWebServiceClient.java:426)
at com.amazonaws.AmazonWebServiceClient.setEndpoint(AmazonWebServiceClient.java:318)

Changing s3: to http:, like so:
s3.endpoint: http://flink-jdbc-test_graph-minio_1:9000

I receive the same kind of error when trying to submit the job:
Caused by: java.lang.IllegalArgumentException: Endpoint does not contain a valid host name: http://flink-jdbc-test_graph-minio_1:9000
at com.amazonaws.AmazonWebServiceClient.computeSignerByURI(AmazonWebServiceClient.java:426)
at com.amazonaws.AmazonWebServiceClient.setEndpoint(AmazonWebServiceClient.java:318)

However, I can access the Minio container via the Minio client from Docker just fine:

./mc alias set minio http://flink-jdbc-test_graph-minio_1:9000 <key> <key> --api S3v4

But there are no buckets, presumably because saving always fails:

./mc ls minio
<nothing returned>

Does anyone know how to resolve this issue?

Thanks!

--

Rex Fenley  |  Software Engineer - Mobile and Backend

Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/> | FOLLOW US <https://twitter.com/remindhq> | LIKE US <https://www.facebook.com/remindhq>

Re: Difficulties with Minio state storage

Posted by Rex Fenley <Re...@remind101.com>.
Good news!

Eliminating
bsEnv.setStateBackend(
  new RocksDBStateBackend(
    "s3://flink-jdbc-test_graph-minio_1/data/checkpoints:9000",
    true
  )
)
moving all configuration into FLINK_PROPERTIES, and switching the endpoint
to http seemed to do the trick!
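
For anyone hitting this later, a minimal sketch of what the resulting docker-compose environment might look like. The endpoint host, bucket name, and paths are assumptions based on the names used earlier in this thread; state.backend, state.backend.incremental, and state.checkpoints.dir are the standard Flink configuration keys that replace the programmatic setStateBackend call:

environment:
  - |
    FLINK_PROPERTIES=
    jobmanager.rpc.address: flink-jobmanager
    parallelism.default: 2
    s3.access-key: <key>
    s3.secret-key: <key>
    s3.path.style.access: true
    s3.endpoint: http://flink-jdbc-test_graph-minio_1:9000
    state.backend: rocksdb
    state.backend.incremental: true
    state.checkpoints.dir: s3://<bucket>/data/checkpoints
    state.savepoints.dir: s3://<bucket>/data/savepoints

The bucket itself must already exist in MinIO; see Yangze's note further down about creating it first.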

Thanks for all the help!




Re: Difficulties with Minio state storage

Posted by Rex Fenley <Re...@remind101.com>.
Thanks, y'all,

Yangze,
> I've tried to use MinIO as the state backend and everything seems to work well
For clarity, I'm using the RocksDB state backend with Minio as the state storage.
> s3.endpoint: http://localhost:9000
Also, I'm doing everything from docker-compose, so localhost isn't going to
work in my case.


Arvid,
> You definitely need to use an http endpoint.
I always receive errors like the following when I use http:
Caused by: java.util.concurrent.CompletionException: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'http'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded. For a full list of supported file systems, please see https://ci.apache.org/projects/flink/flink-docs-stable/ops/filesystems/.
Whereas s3:// gives me Bad Request errors instead.
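
The scheme error suggests the http URL was being used as a state path rather than as the endpoint. The two settings play different roles: only s3.endpoint takes an http URL, while checkpoint and savepoint paths keep the s3:// scheme and are resolved against that endpoint. A two-line sketch, with the container name assumed from this thread:

s3.endpoint: http://flink-jdbc-test_graph-minio_1:9000
state.checkpoints.dir: s3://<bucket>/data/checkpoints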

Thanks



Re: Difficulties with Minio state storage

Posted by Arvid Heise <ar...@ververica.com>.
Hi Rex,

you could also check the end-to-end tests that use Minio in Flink's repo.
You definitely need to use an http endpoint.

The setup [1] also uses another way to specify s3.path.style.access (with
dashes). I think we needed it especially for Presto. It seems like the
settings differ a bit across the implementations, so give it a try; a guess
at the dashed spelling is sketched after the links below. It might also be
something that we should translate.
For reference, the actual test using Presto can be found here [2].

[1]
https://github.com/apache/flink/blob/master/flink-end-to-end-tests/test-scripts/common_s3_minio.sh#L115
[2]
https://github.com/apache/flink/blob/master/flink-end-to-end-tests/test-scripts/test_batch_wordcount.sh#L64
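
A guess at the dashed spelling Arvid mentions; this is an assumption on the editor's part, so check the linked script [1] for the exact key before relying on it:

s3.path-style-access: true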



-- 

Arvid Heise | Senior Java Developer

<https://www.ververica.com/>

Follow us @VervericaData

--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng

Re: Difficulties with Minio state storage

Posted by Yangze Guo <ka...@gmail.com>.
Hi, Rex,

I've tried to use MinIO as the state backend and everything seems to work well.
Just sharing my configuration:
```
s3.access-key:
s3.secret-key:
s3.endpoint: http://localhost:9000
s3.path.style.access: true
state.checkpoints.dir: s3://flink/checkpoints
```

I think the problem might be caused by one of the following:
- MinIO is not well configured.
- Maybe you need to create a bucket for it first. In my case, I created a
bucket called "flink" first (see the sketch below).
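
A one-line sketch of creating such a bucket with the MinIO client (mc mb makes a bucket), assuming the "minio" alias set up earlier in the thread and the bucket name from the configuration above:

./mc mb minio/flink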

Best,
Yangze Guo
