Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/06/25 08:43:23 UTC

[GitHub] [iceberg] openinx opened a new issue #2738: What's the correct semantic when projecting a required nested field from an optional struct ?

openinx opened a new issue #2738:
URL: https://github.com/apache/iceberg/issues/2738


   Let's say we have an Iceberg schema:
   
   ```java
       Schema schema = new Schema(
           Types.NestedField.required(0, "id", Types.LongType.get()),
           Types.NestedField.optional(3, "location", Types.StructType.of(
               Types.NestedField.required(1, "lat", Types.FloatType.get()),
               Types.NestedField.required(2, "long", Types.FloatType.get())
           ))
       );
   ```
   
   Suppose someone then projects a nested field using this projection schema:
   
   ```java
       Schema latOnly = new Schema(
           Types.NestedField.optional(3, "location", Types.StructType.of(
               Types.NestedField.required(1, "lat", Types.FloatType.get())
           ))
       );
   ```
   
   If the data row is:
   
   ```
   {
      "id": 10001,
      "location": null
   }
   ```
   
   Then what is the expected projected value for the projection schema `latOnly`? Should `location.lat` be set to null even though the field is declared `required` in `Types.NestedField.required(1, "lat", Types.FloatType.get())`?
   
   I think the current [StructProjection](https://github.com/apache/iceberg/blob/90225d6c9413016d611e2ce5eff37db1bc1b4fc5/api/src/main/java/org/apache/iceberg/util/StructProjection.java#L115) does not handle this case correctly: it throws a NullPointerException when projecting the nested required field if the parent struct is null.
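   
   A minimal sketch of the null-safe behavior in question, assuming that a null parent struct should yield null for every nested field. It uses plain `java.util.Map` rows rather than Iceberg's `StructLike`, and `NullSafeProjection` and `project` are hypothetical names for illustration, not the actual `StructProjection` code:
   
   ```java
   import java.util.Arrays;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;

   public class NullSafeProjection {
     // Hypothetical helper: follows a path of field names through nested maps
     // and returns null as soon as a parent value is null, instead of
     // dereferencing it and throwing a NullPointerException.
     @SuppressWarnings("unchecked")
     static Object project(Map<String, Object> row, List<String> path) {
       Object current = row;
       for (String name : path) {
         if (current == null) {
           return null; // the parent struct is null, so the nested value is null too
         }
         current = ((Map<String, Object>) current).get(name);
       }
       return current;
     }

     public static void main(String[] args) {
       // The row from the example above: location is null.
       Map<String, Object> row = new HashMap<>();
       row.put("id", 10001L);
       row.put("location", null);

       // Prints "null" rather than throwing.
       System.out.println(project(row, Arrays.asList("location", "lat")));
     }
   }
   ```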
   
   
   This is related to the broken unit tests from [this PR](https://github.com/apache/iceberg/pull/2731/files#diff-8b18817c3263d1283b5c4f0f98f2201b51bec5a94a7bc0b4885a447cdcd7ccdbR104-R106).
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue closed issue #2738: What's the correct semantic when projecting a required nested field from an optional struct ?

Posted by GitBox <gi...@apache.org>.
rdblue closed issue #2738:
URL: https://github.com/apache/iceberg/issues/2738








[GitHub] [iceberg] openinx commented on issue #2738: What's the correct semantic when projecting a required nested field from an optional struct ?

Posted by GitBox <gi...@apache.org>.
openinx commented on issue #2738:
URL: https://github.com/apache/iceberg/issues/2738#issuecomment-868349637


   I ran a test in Spark 3:
   
   ```sql
   CREATE TABLE loc(
       id       LONG NOT NULL,
       location STRUCT<lat:DOUBLE NOT NULL, long:DOUBLE NOT NULL>
   );
   
   INSERT INTO loc VALUES (1, null);
   
   > SELECT * FROM loc;
   1	NULL
   Time taken: 0.374 seconds, Fetched 1 row(s)
   
   > SELECT location.lat FROM loc;
   NULL
   Time taken: 0.22 seconds, Fetched 1 row(s)
   ```
   
   So Spark returns `NULL` when projecting the required `location.lat` from an optional `location` struct that is null.
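   
   One way to read this result is that a nested field is only guaranteed non-null when every field on the path to it is required. A minimal sketch of that rule, using a hypothetical `Field` class as a stand-in for a schema field (it is not Iceberg's `Types.NestedField`, and `effectivelyRequired` is an illustrative name):
   
   ```java
   import java.util.Arrays;
   import java.util.List;

   public class EffectiveNullability {
     // Hypothetical stand-in for a schema field with optional nested children.
     static final class Field {
       final String name;
       final boolean required;
       final List<Field> children;

       Field(String name, boolean required, Field... children) {
         this.name = name;
         this.required = required;
         this.children = Arrays.asList(children);
       }
     }

     // Returns true only if every field along the path is required.
     // A single optional ancestor makes the projected value nullable.
     static boolean effectivelyRequired(List<Field> fields, String... path) {
       List<Field> level = fields;
       for (String name : path) {
         Field match = null;
         for (Field field : level) {
           if (field.name.equals(name)) {
             match = field;
           }
         }
         if (match == null) {
           throw new IllegalArgumentException("Unknown field: " + name);
         }
         if (!match.required) {
           return false;
         }
         level = match.children;
       }
       return true;
     }

     public static void main(String[] args) {
       // Mirrors the schema from the issue: required id, optional location
       // struct with required lat and long.
       List<Field> schema = Arrays.asList(
           new Field("id", true),
           new Field("location", false,
               new Field("lat", true),
               new Field("long", true)));

       System.out.println(effectivelyRequired(schema, "id"));              // prints "true"
       System.out.println(effectivelyRequired(schema, "location", "lat")); // prints "false"
     }
   }
   ```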




[GitHub] [iceberg] rdblue commented on issue #2738: What's the correct semantic when projecting a required nested field from an optional struct ?

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #2738:
URL: https://github.com/apache/iceberg/issues/2738#issuecomment-959695524


   This was fixed in #3240.



