You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/08/10 08:35:31 UTC

[GitHub] [iceberg] lvyanquan opened a new pull request, #5486: Flink: revise unit test of FlinkUpsert so the table is partitioned by date

lvyanquan opened a new pull request, #5486:
URL: https://github.com/apache/iceberg/pull/5486

   as what stevenzwu said in https://github.com/apache/iceberg/pull/5050 , the table named 'test_upsert_query' in TestFlinkUpsert was not partitioned by date, which would bring about  confusion, needed to be refactored.
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] lvyanquan commented on a diff in pull request #5486: Flink: revise unit test of FlinkUpsert so the table is partitioned by date

Posted by GitBox <gi...@apache.org>.

lvyanquan commented on code in PR #5486:
URL: https://github.com/apache/iceberg/pull/5486#discussion_r950691131


##########
flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/TestFlinkUpsert.java:
##########
@@ -203,40 +205,40 @@ public void testPrimaryKeyFieldsAtBeginningOfSchema() {
     LocalDate dt = LocalDate.of(2022, 3, 1);
     try {
       sql(
-          "CREATE TABLE %s(data STRING NOT NULL, dt DATE NOT NULL, id INT, PRIMARY KEY(data,dt) NOT ENFORCED) "
-              + "PARTITIONED BY (data) WITH %s",
+          "CREATE TABLE %s(name STRING NOT NULL, dt DATE NOT NULL, id INT, PRIMARY KEY(name,dt) NOT ENFORCED) "

Review Comment:
   Merging these two methods into one will make this method look long because I can't reuse the rows, which may bring out confusion. So I still wonder should I do this.  
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] stevenzwu commented on pull request #5486: Flink: revise unit test of FlinkUpsert so the table is partitioned by date

Posted by GitBox <gi...@apache.org>.

stevenzwu commented on PR #5486:
URL: https://github.com/apache/iceberg/pull/5486#issuecomment-1278998230

   thanks @lvyanquan for the contribution and @hililiwei and @pvary for reviews


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] lvyanquan commented on a diff in pull request #5486: Flink: revise unit test of FlinkUpsert so the table is partitioned by date

Posted by GitBox <gi...@apache.org>.

lvyanquan commented on code in PR #5486:
URL: https://github.com/apache/iceberg/pull/5486#discussion_r950689430


##########
flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/TestFlinkUpsert.java:
##########
@@ -127,33 +127,32 @@ public void testUpsertAndQuery() {
     LocalDate dt20220302 = LocalDate.of(2022, 3, 2);
 
     sql(
-        "CREATE TABLE %s(id INT NOT NULL, province STRING NOT NULL, dt DATE, PRIMARY KEY(id,province) NOT ENFORCED) "
-            + "PARTITIONED BY (province) WITH %s",
+        "CREATE TABLE %s(id INT NOT NULL, name STRING NOT NULL, dt DATE, PRIMARY KEY(id,dt) NOT ENFORCED) "
+            + "PARTITIONED BY (dt) WITH %s",
         tableName, toWithClause(tableUpsertProps));
 
     try {
       sql(
           "INSERT INTO %s VALUES "
-              + "(1, 'a', DATE '2022-03-01'),"
-              + "(2, 'b', DATE '2022-03-01'),"
-              + "(1, 'b', DATE '2022-03-01')",
+              + "(1, 'Bill', DATE '2022-03-01'),"
+              + "(1, 'Jane', DATE '2022-03-01'),"
+              + "(2, 'Jane', DATE '2022-03-01')",
           tableName);
 
       sql(
           "INSERT INTO %s VALUES "
-              + "(4, 'a', DATE '2022-03-02'),"
-              + "(5, 'b', DATE '2022-03-02'),"
-              + "(1, 'b', DATE '2022-03-02')",
+              + "(2, 'Bill', DATE '2022-03-01'),"
+              + "(1, 'Jane', DATE '2022-03-02'),"
+              + "(2, 'Jane', DATE '2022-03-02')",
           tableName);
 
       List<Row> rowsOn20220301 =
-          Lists.newArrayList(Row.of(2, "b", dt20220301), Row.of(1, "a", dt20220301));
+          Lists.newArrayList(Row.of(1, "Jane", dt20220301), Row.of(2, "Bill", dt20220301));
       TestHelpers.assertRows(
           sql("SELECT * FROM %s WHERE dt < '2022-03-02'", tableName), rowsOn20220301);
 
       List<Row> rowsOn20220302 =
-          Lists.newArrayList(
-              Row.of(1, "b", dt20220302), Row.of(4, "a", dt20220302), Row.of(5, "b", dt20220302));
+          Lists.newArrayList(Row.of(1, "Jane", dt20220302), Row.of(2, "Jane", dt20220302));

Review Comment:
   include two situation:
   1)  update rows in one transaction or rows between two transaction if they have the same value in primary columns.
   2) don't update rows if they have the same value in only part primary columns.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] hililiwei commented on pull request #5486: Flink: revise unit test of FlinkUpsert so the table is partitioned by date

Posted by GitBox <gi...@apache.org>.

hililiwei commented on PR #5486:
URL: https://github.com/apache/iceberg/pull/5486#issuecomment-1210482344

   Thank you for your contribution. If you add `Close issuse id` to your body, it will automatically bound with the issuse.
   There are also some tables that are not partitioned by date, we can consider whether to modify them together. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] lvyanquan commented on a diff in pull request #5486: Flink: revise unit test of FlinkUpsert so the table is partitioned by date

Posted by GitBox <gi...@apache.org>.

lvyanquan commented on code in PR #5486:
URL: https://github.com/apache/iceberg/pull/5486#discussion_r950691579


##########
flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/TestFlinkUpsert.java:
##########
@@ -168,30 +167,31 @@ public void testUpsertAndQuery() {
   @Test
   public void testPrimaryKeyEqualToPartitionKey() {
     // This is an SQL based reproduction of TestFlinkIcebergSinkV2#testUpsertOnDataKey
-    String tableName = "upsert_on_data_key";
+    String tableName = "upsert_on_id_key";
     try {
       sql(
-          "CREATE TABLE %s(id INT NOT NULL, data STRING NOT NULL, PRIMARY KEY(data) NOT ENFORCED) "
-              + "PARTITIONED BY (data) WITH %s",
+          "CREATE TABLE %s(id INT NOT NULL, name STRING NOT NULL, PRIMARY KEY(id) NOT ENFORCED) "
+              + "PARTITIONED BY (id) WITH %s",
           tableName, toWithClause(tableUpsertProps));
 
-      sql("INSERT INTO %s VALUES " + "(1, 'aaa')," + "(2, 'aaa')," + "(3, 'bbb')", tableName);
+      sql("INSERT INTO %s VALUES " + "(1, 'Bill')," + "(1, 'Jane')," + "(2, 'Bill')", tableName);
 
       TestHelpers.assertRows(
           sql("SELECT * FROM %s", tableName),
-          Lists.newArrayList(Row.of(2, "aaa"), Row.of(3, "bbb")));
+          Lists.newArrayList(Row.of(1, "Jane"), Row.of(2, "Bill")));
 
-      sql("INSERT INTO %s VALUES " + "(4, 'aaa')," + "(5, 'bbb')", tableName);
+      sql("INSERT INTO %s VALUES " + "(1, 'Bill')," + "(2, 'Jane')", tableName);
 
       TestHelpers.assertRows(
           sql("SELECT * FROM %s", tableName),
-          Lists.newArrayList(Row.of(4, "aaa"), Row.of(5, "bbb")));
+          Lists.newArrayList(Row.of(1, "Bill"), Row.of(2, "Jane")));
 
-      sql("INSERT INTO %s VALUES " + "(6, 'aaa')," + "(7, 'bbb')", tableName);
+      sql("INSERT INTO %s VALUES " + "(3, 'Bill')," + "(4, 'Jane')", tableName);

Review Comment:
   I think these two lines have the same function as before lines, so I changed it to the case where the column values are not the same



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] lvyanquan commented on a diff in pull request #5486: Flink: revise unit test of FlinkUpsert so the table is partitioned by date

Posted by GitBox <gi...@apache.org>.

lvyanquan commented on code in PR #5486:
URL: https://github.com/apache/iceberg/pull/5486#discussion_r945108116


##########
flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/TestFlinkUpsert.java:
##########
@@ -127,8 +127,8 @@ public void testUpsertAndQuery() {
     LocalDate dt20220302 = LocalDate.of(2022, 3, 2);
 
     sql(
-        "CREATE TABLE %s(id INT NOT NULL, province STRING NOT NULL, dt DATE, PRIMARY KEY(id,province) NOT ENFORCED) "
-            + "PARTITIONED BY (province) WITH %s",
+        "CREATE TABLE %s(id INT NOT NULL, province STRING NOT NULL, dt DATE, PRIMARY KEY(id,dt) NOT ENFORCED) "

Review Comment:
   Thank you for your suggestion. I have re-submitted the code to make column name and value more understandable.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] pvary commented on pull request #5486: Flink: revise unit test of FlinkUpsert so the table is partitioned by date

Posted by GitBox <gi...@apache.org>.

pvary commented on PR #5486:
URL: https://github.com/apache/iceberg/pull/5486#issuecomment-1216800292

   I am fine with the change, but I would like to ask @hililiwei to review this if she has time for it.
   Thanks,
   Peter


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] stevenzwu merged pull request #5486: Flink: revise unit test of FlinkUpsert so the table is partitioned by date

Posted by GitBox <gi...@apache.org>.

stevenzwu merged PR #5486:
URL: https://github.com/apache/iceberg/pull/5486


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] stevenzwu commented on a diff in pull request #5486: Flink: revise unit test of FlinkUpsert so the table is partitioned by date

Posted by GitBox <gi...@apache.org>.

stevenzwu commented on code in PR #5486:
URL: https://github.com/apache/iceberg/pull/5486#discussion_r945413670


##########
flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/TestFlinkUpsert.java:
##########
@@ -203,40 +220,40 @@ public void testPrimaryKeyFieldsAtBeginningOfSchema() {
     LocalDate dt = LocalDate.of(2022, 3, 1);
     try {
       sql(
-          "CREATE TABLE %s(data STRING NOT NULL, dt DATE NOT NULL, id INT, PRIMARY KEY(data,dt) NOT ENFORCED) "
-              + "PARTITIONED BY (data) WITH %s",
+          "CREATE TABLE %s(id INT, name STRING NOT NULL, dt DATE NOT NULL, PRIMARY KEY(name,dt) NOT ENFORCED) "

Review Comment:
   this test is for primary key at beginning of schema. this changed the order of the fields. we probably should do `dt, name, id`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] lvyanquan commented on a diff in pull request #5486: Flink: revise unit test of FlinkUpsert so the table is partitioned by date

Posted by GitBox <gi...@apache.org>.

lvyanquan commented on code in PR #5486:
URL: https://github.com/apache/iceberg/pull/5486#discussion_r945586699


##########
flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/TestFlinkUpsert.java:
##########
@@ -203,40 +220,40 @@ public void testPrimaryKeyFieldsAtBeginningOfSchema() {
     LocalDate dt = LocalDate.of(2022, 3, 1);
     try {
       sql(
-          "CREATE TABLE %s(data STRING NOT NULL, dt DATE NOT NULL, id INT, PRIMARY KEY(data,dt) NOT ENFORCED) "
-              + "PARTITIONED BY (data) WITH %s",
+          "CREATE TABLE %s(id INT, name STRING NOT NULL, dt DATE NOT NULL, PRIMARY KEY(name,dt) NOT ENFORCED) "

Review Comment:
   Thanks for your suggestion, my understanding of this test has improved.
    I have re-submitted the code and you may review again to check what else needs to be changed when you are free.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] hililiwei commented on a diff in pull request #5486: Flink: revise unit test of FlinkUpsert so the table is partitioned by date

Posted by GitBox <gi...@apache.org>.

hililiwei commented on code in PR #5486:
URL: https://github.com/apache/iceberg/pull/5486#discussion_r942537731


##########
flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/TestFlinkUpsert.java:
##########
@@ -147,7 +147,7 @@ public void testUpsertAndQuery() {
           tableName);
 
       List<Row> rowsOn20220301 =
-          Lists.newArrayList(Row.of(2, "b", dt20220301), Row.of(1, "a", dt20220301));
+          Lists.newArrayList(Row.of(1, "b", dt20220301), Row.of(2, "b", dt20220301));

Review Comment:
   We may be need to modify the ut data to match the purpose of the test.
   Let the second sql `upsert` the data in the first.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] stevenzwu commented on a diff in pull request #5486: Flink: revise unit test of FlinkUpsert so the table is partitioned by date

Posted by GitBox <gi...@apache.org>.

stevenzwu commented on code in PR #5486:
URL: https://github.com/apache/iceberg/pull/5486#discussion_r944602570


##########
flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/TestFlinkUpsert.java:
##########
@@ -127,8 +127,8 @@ public void testUpsertAndQuery() {
     LocalDate dt20220302 = LocalDate.of(2022, 3, 2);
 
     sql(
-        "CREATE TABLE %s(id INT NOT NULL, province STRING NOT NULL, dt DATE, PRIMARY KEY(id,province) NOT ENFORCED) "
-            + "PARTITIONED BY (province) WITH %s",
+        "CREATE TABLE %s(id INT NOT NULL, province STRING NOT NULL, dt DATE, PRIMARY KEY(id,dt) NOT ENFORCED) "

Review Comment:
   thanks for making the test more readable.
   
   Can we also change `province` to sth like `name`? Then we can use names like `Jane`, `Joe`. might be slightly easier to follow than `a`, `b`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] lvyanquan commented on pull request #5486: Flink: revise unit test of FlinkUpsert so the table is partitioned by date

Posted by GitBox <gi...@apache.org>.

lvyanquan commented on PR #5486:
URL: https://github.com/apache/iceberg/pull/5486#issuecomment-1221308475

   @hililiwei  I resubmitted the code to  use 'id dt' as the primary key of tables, you may review it if you have time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] lvyanquan commented on a diff in pull request #5486: Flink: revise unit test of FlinkUpsert so the table is partitioned by date

Posted by GitBox <gi...@apache.org>.

lvyanquan commented on code in PR #5486:
URL: https://github.com/apache/iceberg/pull/5486#discussion_r942800417


##########
flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/TestFlinkUpsert.java:
##########
@@ -147,7 +147,7 @@ public void testUpsertAndQuery() {
           tableName);
 
       List<Row> rowsOn20220301 =
-          Lists.newArrayList(Row.of(2, "b", dt20220301), Row.of(1, "a", dt20220301));
+          Lists.newArrayList(Row.of(1, "b", dt20220301), Row.of(2, "b", dt20220301));

Review Comment:
   thanks for your suggestion. I have verified the order of row and re-submitted the code.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] hililiwei commented on a diff in pull request #5486: Flink: revise unit test of FlinkUpsert so the table is partitioned by date

Posted by GitBox <gi...@apache.org>.

hililiwei commented on code in PR #5486:
URL: https://github.com/apache/iceberg/pull/5486#discussion_r949038790


##########
flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/TestFlinkUpsert.java:
##########
@@ -203,40 +205,40 @@ public void testPrimaryKeyFieldsAtBeginningOfSchema() {
     LocalDate dt = LocalDate.of(2022, 3, 1);
     try {
       sql(
-          "CREATE TABLE %s(data STRING NOT NULL, dt DATE NOT NULL, id INT, PRIMARY KEY(data,dt) NOT ENFORCED) "
-              + "PARTITIONED BY (data) WITH %s",
+          "CREATE TABLE %s(name STRING NOT NULL, dt DATE NOT NULL, id INT, PRIMARY KEY(name,dt) NOT ENFORCED) "

Review Comment:
   Also, I'd like to ask @kbendick that if we use the following sql, this test will be very similar to `testUpsertAndQuery`.
   `"CREATE TABLE %s(id INT NOT NULL, dt DATE NOT NULL, name STRING, PRIMARY KEY(id,dt) NOT ENFORCED) "`
   
   Can we merge the two into one?
   
   Anything that cuts down the UT time, it's worth it.



##########
flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/TestFlinkUpsert.java:
##########
@@ -203,40 +205,40 @@ public void testPrimaryKeyFieldsAtBeginningOfSchema() {
     LocalDate dt = LocalDate.of(2022, 3, 1);
     try {
       sql(
-          "CREATE TABLE %s(data STRING NOT NULL, dt DATE NOT NULL, id INT, PRIMARY KEY(data,dt) NOT ENFORCED) "
-              + "PARTITIONED BY (data) WITH %s",
+          "CREATE TABLE %s(name STRING NOT NULL, dt DATE NOT NULL, id INT, PRIMARY KEY(name,dt) NOT ENFORCED) "

Review Comment:
   Also, I'd like to ask @kbendick that if we use the following sql, this test will be very similar to `testUpsertAndQuery`.
   
   `"CREATE TABLE %s(id INT NOT NULL, dt DATE NOT NULL, name STRING, PRIMARY KEY(id,dt) NOT ENFORCED) "`
   
   Can we merge the two into one?
   
   Anything that cuts down the UT time, it's worth it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] hililiwei commented on a diff in pull request #5486: Flink: revise unit test of FlinkUpsert so the table is partitioned by date

Posted by GitBox <gi...@apache.org>.

hililiwei commented on code in PR #5486:
URL: https://github.com/apache/iceberg/pull/5486#discussion_r949031169


##########
flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/TestFlinkUpsert.java:
##########
@@ -203,40 +205,40 @@ public void testPrimaryKeyFieldsAtBeginningOfSchema() {
     LocalDate dt = LocalDate.of(2022, 3, 1);
     try {
       sql(
-          "CREATE TABLE %s(data STRING NOT NULL, dt DATE NOT NULL, id INT, PRIMARY KEY(data,dt) NOT ENFORCED) "
-              + "PARTITIONED BY (data) WITH %s",
+          "CREATE TABLE %s(name STRING NOT NULL, dt DATE NOT NULL, id INT, PRIMARY KEY(name,dt) NOT ENFORCED) "

Review Comment:
   It still confusing to use `name dt` as the primary key. We can put `id` at the begining and use `id dt` as the primary key.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] hililiwei commented on a diff in pull request #5486: Flink: revise unit test of FlinkUpsert so the table is partitioned by date

Posted by GitBox <gi...@apache.org>.

hililiwei commented on code in PR #5486:
URL: https://github.com/apache/iceberg/pull/5486#discussion_r949031169


##########
flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/TestFlinkUpsert.java:
##########
@@ -203,40 +205,40 @@ public void testPrimaryKeyFieldsAtBeginningOfSchema() {
     LocalDate dt = LocalDate.of(2022, 3, 1);
     try {
       sql(
-          "CREATE TABLE %s(data STRING NOT NULL, dt DATE NOT NULL, id INT, PRIMARY KEY(data,dt) NOT ENFORCED) "
-              + "PARTITIONED BY (data) WITH %s",
+          "CREATE TABLE %s(name STRING NOT NULL, dt DATE NOT NULL, id INT, PRIMARY KEY(name,dt) NOT ENFORCED) "

Review Comment:
   It still confusing to use `name dt` as the primary key. We can put `id` at the begining and use it and use `id dt` as the primary key.



##########
flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/TestFlinkUpsert.java:
##########
@@ -251,40 +253,40 @@ public void testPrimaryKeyFieldsAtEndOfTableSchema() {
     LocalDate dt = LocalDate.of(2022, 3, 1);
     try {
       sql(
-          "CREATE TABLE %s(id INT, data STRING NOT NULL, dt DATE NOT NULL, PRIMARY KEY(data,dt) NOT ENFORCED) "
-              + "PARTITIONED BY (data) WITH %s",
+          "CREATE TABLE %s(id INT, name STRING NOT NULL, dt DATE NOT NULL, PRIMARY KEY(name,dt) NOT ENFORCED) "

Review Comment:
   Like the previous one, we can put `id` after `name` and use `id dt` as the primary key.
   



##########
flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/TestFlinkUpsert.java:
##########
@@ -127,33 +127,35 @@ public void testUpsertAndQuery() {
     LocalDate dt20220302 = LocalDate.of(2022, 3, 2);
 
     sql(
-        "CREATE TABLE %s(id INT NOT NULL, province STRING NOT NULL, dt DATE, PRIMARY KEY(id,province) NOT ENFORCED) "
-            + "PARTITIONED BY (province) WITH %s",
+        "CREATE TABLE %s(id INT NOT NULL, name STRING NOT NULL, dt DATE, PRIMARY KEY(id,dt) NOT ENFORCED) "
+            + "PARTITIONED BY (dt) WITH %s",
         tableName, toWithClause(tableUpsertProps));
 
     try {
       sql(
           "INSERT INTO %s VALUES "
-              + "(1, 'a', DATE '2022-03-01'),"
-              + "(2, 'b', DATE '2022-03-01'),"
-              + "(1, 'b', DATE '2022-03-01')",
+              + "(1, 'Bob', DATE '2022-03-01'),"
+              + "(2, 'Jane', DATE '2022-03-01'),"
+              + "(1, 'Jane', DATE '2022-03-01')",
           tableName);
 
       sql(
           "INSERT INTO %s VALUES "
-              + "(4, 'a', DATE '2022-03-02'),"
-              + "(5, 'b', DATE '2022-03-02'),"
-              + "(1, 'b', DATE '2022-03-02')",
+              + "(4, 'Bob', DATE '2022-03-02'),"
+              + "(5, 'Jane', DATE '2022-03-02'),"
+              + "(1, 'Jane', DATE '2022-03-02')",

Review Comment:
   It is not verified that the second SQL will upsert the data in the first. I think is incorrect.
   For example, adding "(1, 'Jane', DATE '2022-03-01')"
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] stevenzwu commented on a diff in pull request #5486: Flink: revise unit test of FlinkUpsert so the table is partitioned by date

Posted by GitBox <gi...@apache.org>.

stevenzwu commented on code in PR #5486:
URL: https://github.com/apache/iceberg/pull/5486#discussion_r945412786


##########
flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/TestFlinkUpsert.java:
##########
@@ -169,29 +171,44 @@ public void testUpsertAndQuery() {
   public void testPrimaryKeyEqualToPartitionKey() {

Review Comment:
   I feel we probably don't need to change this test method. there is no need to use the same schema as the previous method. there is an extra column for cognitive burden.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] lvyanquan commented on pull request #5486: Flink: revise unit test of FlinkUpsert so the table is partitioned by date

Posted by GitBox <gi...@apache.org>.

lvyanquan commented on PR #5486:
URL: https://github.com/apache/iceberg/pull/5486#issuecomment-1210571712

   @hililiwei  Got it. I wrote a new version of the code and committed it, but I wonder should I rewrite the sql of testPrimaryKeyEqualToPartitionKey to unify the partition key. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] stevenzwu commented on a diff in pull request #5486: Flink: revise unit test of FlinkUpsert so the table is partitioned by date

Posted by GitBox <gi...@apache.org>.

stevenzwu commented on code in PR #5486:
URL: https://github.com/apache/iceberg/pull/5486#discussion_r946322759


##########
flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/TestFlinkUpsert.java:
##########
@@ -171,27 +173,27 @@ public void testPrimaryKeyEqualToPartitionKey() {
     String tableName = "upsert_on_data_key";
     try {
       sql(
-          "CREATE TABLE %s(id INT NOT NULL, data STRING NOT NULL, PRIMARY KEY(data) NOT ENFORCED) "

Review Comment:
   I know I suggested `name` for the test method above. I am fine with expanding the scope to all test methods. 
   
   Let's also hear if other people have different take and think changing `data` to `name` is unnecessary here and below. I am fine either way. will approve the PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org