Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/04/24 09:20:19 UTC

[GitHub] [flink-table-store] JingsongLi opened a new pull request, #101: [FLINK-27366] Record metadata on filesystem path

JingsongLi opened a new pull request, #101:
URL: https://github.com/apache/flink-table-store/pull/101

   We can store the table's metadata (schema, options, etc.) under the table store's filesystem path.
   
   The schema should be stored in a format that supports evolution, which means each field carries an id.
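
   For illustration: with the `SchemaManager` introduced by this patch, the on-disk layout could look roughly like the sketch below. The `schema/` directory name and the JSON field names are assumptions based on `schemaDirectory()`, `SCHEMA_PREFIX` and the members of `Schema`, not a confirmed format:

       <table-root>/
           schema/
               schema-0    JSON, e.g. {"id":0,"fields":[...],"highestFieldId":2,
               schema-1                "partitionKeys":[...],"primaryKeys":[...],"options":{...}}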


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-table-store] JingsongLi commented on pull request #101: [FLINK-27366] Record metadata on filesystem path

Posted by GitBox <gi...@apache.org>.
JingsongLi commented on PR #101:
URL: https://github.com/apache/flink-table-store/pull/101#issuecomment-1107800479

   This is only a rough draft; please do not review it yet.




[GitHub] [flink-table-store] tsreaper commented on a diff in pull request #101: [FLINK-27366] Record schema on filesystem path

Posted by GitBox <gi...@apache.org>.
tsreaper commented on code in PR #101:
URL: https://github.com/apache/flink-table-store/pull/101#discussion_r874440109


##########
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/utils/JsonSerializer.java:
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.store.file.utils;
+
+import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.JsonGenerator;
+import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.JsonNode;
+
+import java.io.IOException;
+
+/** Json serializer for jackson. */
+public interface JsonSerializer<T> {
+
+    void serializer(T t, JsonGenerator generator) throws IOException;

Review Comment:
   `serialize`



##########
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/schema/SchemaManager.java:
##########
@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.store.file.schema;
+
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.table.store.file.operation.Lock;
+import org.apache.flink.table.store.file.utils.FileUtils;
+import org.apache.flink.table.store.file.utils.JsonSerdeUtil;
+import org.apache.flink.table.types.logical.RowType;
+import org.apache.flink.util.Preconditions;
+
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.UUID;
+import java.util.concurrent.Callable;
+
+import static org.apache.flink.table.store.file.utils.FileUtils.listVersionedFiles;
+
+/** Schema Manager to manage schema versions. */
+public class SchemaManager {
+
+    private static final String SCHEMA_PREFIX = "schema-";
+
+    private final Path tableRoot;
+
+    /** Default no lock. */
+    private Lock lock = Callable::call;
+
+    public SchemaManager(Path tableRoot) {
+        this.tableRoot = tableRoot;
+    }
+
+    public SchemaManager withLock(Lock lock) {
+        this.lock = lock;
+        return this;
+    }
+
+    /** @return latest schema. */
+    public Optional<Schema> latest() {
+        try {
+            return listVersionedFiles(schemaDirectory(), SCHEMA_PREFIX)
+                    .reduce(Math::max)
+                    .map(this::schema);
+        } catch (IOException e) {
+            throw new UncheckedIOException(e);
+        }
+    }
+
+    /** Create a new schema from {@link UpdateSchema}. */
+    public Schema commitNewVersion(UpdateSchema updateSchema) throws Exception {
+        RowType rowType = updateSchema.rowType();
+        List<String> partitionKeys = updateSchema.partitionKeys();
+        List<String> primaryKeys = updateSchema.primaryKeys();
+        Map<String, String> options = updateSchema.options();
+
+        while (true) {
+            long id;
+            int highestFieldId;
+            List<DataField> fields;
+            Optional<Schema> latest = latest();
+            if (latest.isPresent()) {
+                Schema oldSchema = latest.get();
+                Preconditions.checkArgument(
+                        oldSchema.primaryKeys().equals(primaryKeys),
+                        "Primary key modification is not supported, "
+                                + "old primaryKeys is %s, new primaryKeys is %s",
+                        oldSchema.primaryKeys(),
+                        primaryKeys);
+
+                if (!updateSchema
+                                .rowType()
+                                .getFields()
+                                .equals(oldSchema.logicalRowType().getFields())
+                        || !updateSchema.partitionKeys().equals(oldSchema.partitionKeys())) {
+                    throw new UnsupportedOperationException(
+                            "TODO: support update field types and partition keys. ");
+                }
+
+                fields = oldSchema.fields();
+                id = oldSchema.id() + 1;
+                highestFieldId = oldSchema.highestFieldId();
+            } else {
+                fields = Schema.newFields(rowType);
+                highestFieldId = Schema.currentHighestFieldId(fields);
+                id = 0;
+            }
+
+            Schema schema =
+                    new Schema(id, fields, highestFieldId, partitionKeys, primaryKeys, options);
+
+            Path temp = toTmpSchemaPath(id);
+            Path finalFile = toSchemaPath(id);
+            FileUtils.writeFileUtf8(temp, schema.toString());
+
+            Boolean success = lock.runWithLock(() -> temp.getFileSystem().rename(temp, finalFile));
+            if (success) {
+                return schema;
+            } else {
+                // retry
+                FileUtils.deleteOrWarn(temp);

Review Comment:
   If this method fails with an exception, the `temp` file will not be cleaned up.



##########
flink-table-store-core/src/test/java/org/apache/flink/table/store/file/schema/SchemaManagerTest.java:
##########
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.store.file.schema;
+
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.table.store.file.utils.FailingAtomicRenameFileSystem;
+import org.apache.flink.table.types.logical.BigIntType;
+import org.apache.flink.table.types.logical.IntType;
+import org.apache.flink.table.types.logical.RowType;
+import org.apache.flink.table.types.logical.VarCharType;
+
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.io.TempDir;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+
+import static org.apache.flink.table.store.file.utils.FailingAtomicRenameFileSystem.retryArtificialException;
+import static org.assertj.core.api.Assertions.assertThat;
+import static org.junit.jupiter.api.Assertions.assertThrows;
+
+/** Test for {@link SchemaManager}. */
+public class SchemaManagerTest {

Review Comment:
   Lacks concurrent tests and cleanup tests for `commitNewVersion`.



##########
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/schema/SchemaManager.java:
##########
+    /** @return latest schema. */
+    public Optional<Schema> latest() {
+        try {
+            return listVersionedFiles(schemaDirectory(), SCHEMA_PREFIX)
+                    .reduce(Math::max)
+                    .map(this::schema);

Review Comment:
   Add hint files just like `SnapshotFinder`? Maybe extract common classes for both snapshot and schema.



##########
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/schema/Schema.java:
##########
@@ -0,0 +1,210 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.store.file.schema;
+
+import org.apache.flink.table.store.file.utils.JsonSerdeUtil;
+import org.apache.flink.table.types.logical.ArrayType;
+import org.apache.flink.table.types.logical.LogicalType;
+import org.apache.flink.table.types.logical.MapType;
+import org.apache.flink.table.types.logical.MultisetType;
+import org.apache.flink.table.types.logical.RowType;
+import org.apache.flink.util.Preconditions;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.stream.Collectors;
+
+/** Schema of table store. */
+public class Schema {
+
+    private final long id;
+
+    private final List<DataField> fields;
+
+    /** Not available from fields, as some fields may have been deleted. */
+    private final int highestFieldId;
+
+    private final List<String> partitionKeys;
+
+    private final List<String> primaryKeys;
+
+    private final Map<String, String> options;
+
+    public Schema(
+            long id,
+            List<DataField> fields,
+            int highestFieldId,
+            List<String> partitionKeys,
+            List<String> primaryKeys,
+            Map<String, String> options) {
+        this.id = id;
+        this.fields = fields;
+        this.highestFieldId = highestFieldId;
+        this.partitionKeys = partitionKeys;
+        this.primaryKeys = primaryKeys;
+        this.options = Collections.unmodifiableMap(options);
+
+        // try to trim to validate primary keys
+        trimmedPrimaryKeys();
+    }
+
+    public long id() {
+        return id;
+    }
+
+    public List<DataField> fields() {
+        return fields;
+    }
+
+    public int highestFieldId() {
+        return highestFieldId;
+    }
+
+    public List<String> partitionKeys() {
+        return partitionKeys;
+    }
+
+    public List<String> primaryKeys() {
+        return primaryKeys;
+    }
+
+    public List<String> trimmedPrimaryKeys() {
+        if (primaryKeys.size() > 0) {
+            Preconditions.checkState(
+                    primaryKeys.containsAll(partitionKeys),
+                    String.format(
+                            "Primary key constraint %s should include all partition fields %s",
+                            primaryKeys, partitionKeys));
+            List<String> adjusted =
+                    primaryKeys.stream()
+                            .filter(pk -> !partitionKeys.contains(pk))
+                            .collect(Collectors.toList());
+
+            Preconditions.checkState(
+                    adjusted.size() > 0,
+                    String.format(
+                            "Primary key constraint %s should not be same with partition fields %s, this will result in only one record in a partition",
+                            primaryKeys, partitionKeys));
+
+            return adjusted;
+        }
+
+        return primaryKeys;
+    }
+
+    public Map<String, String> options() {
+        return options;
+    }
+
+    public RowType logicalRowType() {
+        return (RowType) new RowDataType(fields).logicalType;
+    }
+
+    @Override
+    public String toString() {
+        return JsonSerdeUtil.toJson(this);
+    }
+
+    @Override
+    public boolean equals(Object o) {
+        if (this == o) {
+            return true;
+        }
+        if (o == null || getClass() != o.getClass()) {
+            return false;
+        }
+        Schema schema = (Schema) o;
+        return Objects.equals(fields, schema.fields)
+                && Objects.equals(partitionKeys, schema.partitionKeys)
+                && Objects.equals(primaryKeys, schema.primaryKeys)
+                && Objects.equals(options, schema.options);
+    }
+
+    @Override
+    public int hashCode() {
+        return Objects.hash(fields, partitionKeys, primaryKeys, options);
+    }
+
+    public static List<DataField> newFields(RowType rowType) {
+        return ((RowDataType) toDataType(rowType, new AtomicInteger(-1))).fields();
+    }
+
+    private static DataType toDataType(LogicalType type, AtomicInteger currentHighestFieldId) {
+        if (type instanceof ArrayType) {
+            DataType element =
+                    toDataType(((ArrayType) type).getElementType(), currentHighestFieldId);
+            return new ArrayDataType(element);
+        } else if (type instanceof MultisetType) {
+            DataType element =
+                    toDataType(((MultisetType) type).getElementType(), currentHighestFieldId);
+            return new MultisetDataType(element);
+        } else if (type instanceof MapType) {
+            DataType key = toDataType(((MapType) type).getKeyType(), currentHighestFieldId);
+            DataType value = toDataType(((MapType) type).getValueType(), currentHighestFieldId);
+            return new MapDataType(key, value);
+        } else if (type instanceof RowType) {
+            List<DataField> fields = new ArrayList<>();
+            for (RowType.RowField field : ((RowType) type).getFields()) {
+                int id = currentHighestFieldId.incrementAndGet();
+                DataType fieldType = toDataType(field.getType(), currentHighestFieldId);
+                fields.add(
+                        new DataField(
+                                id,
+                                field.getName(),
+                                fieldType,
+                                field.getDescription().orElse(null)));
+            }
+            return new RowDataType(fields);
+        } else {
+            return new AtomicDataType(type);
+        }
+    }
+
+    public static int currentHighestFieldId(List<DataField> fields) {
+        Set<Integer> fieldIds = new HashSet<>();
+        collectFieldIds(fieldIds, new RowDataType(fields));
+        return fieldIds.stream().max(Integer::compareTo).orElse(-1);
+    }
+
+    private static void collectFieldIds(Set<Integer> fieldIds, DataType type) {

Review Comment:
   `DataType type, Set<Integer> fieldIds`: input arguments should come before output arguments.
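
   The signature would then read:

       private static void collectFieldIds(DataType type, Set<Integer> fieldIds)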



##########
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/utils/JsonSerializer.java:
##########
+/** Json serializer for jackson. */
+public interface JsonSerializer<T> {
+
+    void serializer(T t, JsonGenerator generator) throws IOException;
+
+    T deserializer(JsonNode node);

Review Comment:
   `deserialize`
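
   With this rename and the `serialize` rename suggested above both applied, the interface would read:

       /** Json serializer for jackson. */
       public interface JsonSerializer<T> {

           void serialize(T t, JsonGenerator generator) throws IOException;

           T deserialize(JsonNode node);
       }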





[GitHub] [flink-table-store] JingsongLi commented on a diff in pull request #101: [FLINK-27366] Record schema on filesystem path

Posted by GitBox <gi...@apache.org>.
JingsongLi commented on code in PR #101:
URL: https://github.com/apache/flink-table-store/pull/101#discussion_r875435232


##########
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/schema/SchemaManager.java:
##########
+    /** @return latest schema. */
+    public Optional<Schema> latest() {
+        try {
+            return listVersionedFiles(schemaDirectory(), SCHEMA_PREFIX)
+                    .reduce(Math::max)
+                    .map(this::schema);

Review Comment:
   I don't think that's necessary for schemas, because there won't be many schema versions.





[GitHub] [flink-table-store] JingsongLi commented on a diff in pull request #101: [FLINK-27366] Record schema on filesystem path

Posted by GitBox <gi...@apache.org>.
JingsongLi commented on code in PR #101:
URL: https://github.com/apache/flink-table-store/pull/101#discussion_r874567219


##########
flink-table-store-core/src/test/java/org/apache/flink/table/store/file/schema/SchemaManagerTest.java:
##########
+/** Test for {@link SchemaManager}. */
+public class SchemaManagerTest {

Review Comment:
   I'll add an after-each check for the cleanup tests, as sketched below.
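
   A minimal sketch of what such an after-each check might look like; the `schema` directory name, the `.tmp` marker and the `path` fixture are assumptions for illustration (it would also need `org.apache.flink.core.fs.FileStatus` and JUnit's `@AfterEach` imported):

       @AfterEach
       public void assertNoTmpFilesLeftOver() throws IOException {
           // Assumed convention: temporary schema files carry a ".tmp" marker.
           Path schemaDir = new Path(path, "schema");
           for (FileStatus status : schemaDir.getFileSystem().listStatus(schemaDir)) {
               assertThat(status.getPath().getName()).doesNotContain(".tmp");
           }
       }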





[GitHub] [flink-table-store] JingsongLi commented on a diff in pull request #101: [FLINK-27366] Record schema on filesystem path

Posted by GitBox <gi...@apache.org>.
JingsongLi commented on code in PR #101:
URL: https://github.com/apache/flink-table-store/pull/101#discussion_r874562026


##########
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/schema/SchemaManager.java:
##########
+            Path temp = toTmpSchemaPath(id);
+            Path finalFile = toSchemaPath(id);
+            FileUtils.writeFileUtf8(temp, schema.toString());
+
+            Boolean success = lock.runWithLock(() -> temp.getFileSystem().rename(temp, finalFile));
+            if (success) {
+                return schema;
+            } else {
+                // retry
+                FileUtils.deleteOrWarn(temp);

Review Comment:
   Yes, but we don't have another solution.





[GitHub] [flink-table-store] JingsongLi commented on a diff in pull request #101: [FLINK-27366] Record schema on filesystem path

Posted by GitBox <gi...@apache.org>.
JingsongLi commented on code in PR #101:
URL: https://github.com/apache/flink-table-store/pull/101#discussion_r875435428


##########
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/schema/SchemaManager.java:
##########
+    /** @return latest schema. */
+    public Optional<Schema> latest() {
+        try {
+            return listVersionedFiles(schemaDirectory(), SCHEMA_PREFIX)
+                    .reduce(Math::max)
+                    .map(this::schema);

Review Comment:
   Schema changes are very low frequency compared to snapshot generation.





[GitHub] [flink-table-store] tsreaper commented on a diff in pull request #101: [FLINK-27366] Record schema on filesystem path

Posted by GitBox <gi...@apache.org>.
tsreaper commented on code in PR #101:
URL: https://github.com/apache/flink-table-store/pull/101#discussion_r874605069


##########
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/schema/SchemaManager.java:
##########
+            Path temp = toTmpSchemaPath(id);
+            Path finalFile = toSchemaPath(id);
+            FileUtils.writeFileUtf8(temp, schema.toString());
+
+            Boolean success = lock.runWithLock(() -> temp.getFileSystem().rename(temp, finalFile));
+            if (success) {
+                return schema;
+            } else {
+                // retry
+                FileUtils.deleteOrWarn(temp);

Review Comment:
   You do have one: wrap the code in `try ... finally ...`.





[GitHub] [flink-table-store] JingsongLi commented on a diff in pull request #101: [FLINK-27366] Record schema on filesystem path

Posted by GitBox <gi...@apache.org>.
JingsongLi commented on code in PR #101:
URL: https://github.com/apache/flink-table-store/pull/101#discussion_r874580082


##########
flink-table-store-core/src/test/java/org/apache/flink/table/store/file/schema/SchemaManagerTest.java:
##########
+/** Test for {@link SchemaManager}. */
+public class SchemaManagerTest {

Review Comment:
   I'll add a `testConcurrentCommit` test.
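
   A hypothetical shape such a test could take; `manager` and `updateSchema` are assumed fixtures of this class (java.util.concurrent imports elided), and it relies on the rename in `commitNewVersion` being atomic so that every commit ends up with its own schema id:

       @Test
       public void testConcurrentCommit() throws Exception {
           int threads = 4;
           ExecutorService executor = Executors.newFixedThreadPool(threads);
           List<Future<Schema>> futures = new ArrayList<>();
           for (int i = 0; i < threads; i++) {
               // Each task retries inside commitNewVersion until its rename wins.
               futures.add(executor.submit(() -> manager.commitNewVersion(updateSchema)));
           }
           Set<Long> ids = new HashSet<>();
           for (Future<Schema> future : futures) {
               ids.add(future.get().id());
           }
           executor.shutdown();
           // No two successful commits may share a schema id.
           assertThat(ids).hasSize(threads);
       }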





[GitHub] [flink-table-store] JingsongLi merged pull request #101: [FLINK-27366] Record schema on filesystem path

Posted by GitBox <gi...@apache.org>.
JingsongLi merged PR #101:
URL: https://github.com/apache/flink-table-store/pull/101




[GitHub] [flink-table-store] tsreaper commented on a diff in pull request #101: [FLINK-27366] Record schema on filesystem path

Posted by GitBox <gi...@apache.org>.
tsreaper commented on code in PR #101:
URL: https://github.com/apache/flink-table-store/pull/101#discussion_r875508139


##########
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/schema/SchemaManager.java:
##########
@@ -111,12 +123,16 @@ public Schema commitNewVersion(UpdateSchema updateSchema) throws Exception {
             Path finalFile = toSchemaPath(id);
             FileUtils.writeFileUtf8(temp, schema.toString());
 
-            Boolean success = lock.runWithLock(() -> temp.getFileSystem().rename(temp, finalFile));
-            if (success) {
-                return schema;
-            } else {
-                // retry
-                FileUtils.deleteOrWarn(temp);
+            boolean success = false;
+            try {

Review Comment:
   The `try` should also cover the file write; a write may complete only partially.
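
   A minimal sketch of the suggested shape, using only calls that already appear in the quoted diff (the surrounding retry loop is assumed unchanged):

       boolean success = false;
       try {
           FileUtils.writeFileUtf8(temp, schema.toString());
           success = lock.runWithLock(() -> temp.getFileSystem().rename(temp, finalFile));
           if (success) {
               return schema;
           }
       } finally {
           if (!success) {
               // Cleans up after a failed rename, a partial write, or any exception.
               FileUtils.deleteOrWarn(temp);
           }
       }
       // not successful: fall through and retry the while loop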


