Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/08/29 09:58:49 UTC

[GitHub] [iceberg] lvyanquan opened a new pull request, #5662: Doc: Add doc to display the results of the table partitions query

lvyanquan opened a new pull request, #5662:
URL: https://github.com/apache/iceberg/pull/5662

   In https://github.com/apache/iceberg/pull/4516 we added specId (the spec_id column) to the partitions metadata table, but the query documentation (https://iceberg.apache.org/docs/latest/spark-queries/#partitions) has not been updated to reflect that.
   In addition, the result set structure differs between partitioned and unpartitioned tables.
   
   Query example:
   ```shell
   0: jdbc:hive2://host:10000> create table p (id int, age int) using iceberg partitioned by (id);
   0: jdbc:hive2://host:10000> create table np (id int, age int) using iceberg;
   0: jdbc:hive2://host:10000> insert into p(id, age) values (1,1),(2,2),(3,3);
   0: jdbc:hive2://host:10000> insert into np(id, age) values (1,1),(2,2),(3,3);
   0: jdbc:hive2://host:10000> select * from spark_catalog.kunni_db.p.partitions;
   +------------+---------------+-------------+----------+
   | partition  | record_count  | file_count  | spec_id  |
   +------------+---------------+-------------+----------+
   | {"id":2}   | 1             | 1           | 0        |
   | {"id":3}   | 1             | 1           | 0        |
   | {"id":1}   | 1             | 1           | 0        |
   +------------+---------------+-------------+----------+
   3 rows selected (0.185 seconds)
   0: jdbc:hive2://host:10000> select * from spark_catalog.kunni_db.np.partitions;
   +---------------+-------------+
   | record_count  | file_count  |
   +---------------+-------------+
   | 3             | 3           |
   +---------------+-------------+
   1 row selected (0.126 seconds)
   ```
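
The difference between the two result shapes above can be handled on the client side. The following is only a minimal sketch in plain Python, not part of Iceberg or Spark: the helper names `is_partitioned_result` and `normalize_row` are hypothetical, and the column names are copied from the query output quoted above.

```python
from typing import Any, Dict, Optional, Sequence

# Column names as they appear in the query output quoted above.
PARTITIONED_COLUMNS = ("partition", "record_count", "file_count", "spec_id")
UNPARTITIONED_COLUMNS = ("record_count", "file_count")


def is_partitioned_result(columns: Sequence[str]) -> bool:
    """Hypothetical helper: True when the result has a 'partition' column."""
    return "partition" in columns


def normalize_row(columns: Sequence[str], row: Sequence[Any]) -> Dict[str, Optional[Any]]:
    """Hypothetical helper: map a row to all four keys, using None where a column is absent."""
    record = dict(zip(columns, row))
    return {name: record.get(name) for name in PARTITIONED_COLUMNS}


# Rows taken from the example output for tables `p` and `np`.
print(normalize_row(PARTITIONED_COLUMNS, ('{"id":2}', 1, 1, 0)))
print(normalize_row(UNPARTITIONED_COLUMNS, (3, 3)))
```

With this shape, code that reads the partitions table can treat both kinds of tables uniformly and check for `None` in `partition` and `spec_id` when the source table is unpartitioned.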


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5662: Doc: Update doc to display the results of the table partitions query

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on code in PR #5662:
URL: https://github.com/apache/iceberg/pull/5662#discussion_r960061764


##########
docs/spark-queries.md:
##########
@@ -318,12 +318,15 @@ To show a table's current partitions:
 SELECT * FROM prod.db.table.partitions
 ```
 
-| partition | record_count | file_count |
-| -- | -- | -- |
-|  {20211001, 11}|           1|         1|
-|  {20211002, 11}|           1|         1|
-|  {20211001, 10}|           1|         1|
-|  {20211002, 10}|           1|         1|
+| partition | record_count | file_count | spec_id |
+| -- | -- | -- | -- |
+|  {20211001, 11}|           1|         1|         0|
+|  {20211002, 11}|           1|         1|         0|
+|  {20211001, 10}|           1|         1|         0|
+|  {20211002, 10}|           1|         1|         0|
+
+Note:
+If this table is non-partitioned, it will contain only the record_count and file_count columns.

Review Comment:
   Sorry about this: I re-read it, and could you make one more fix? I just realized that "this table... it" is wrong, because we mean two different tables (the data table and the metadata table).
   
   "If the table is unpartitioned, the partitions table will contain only the record_count and file_count columns."





[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5662: Doc: Update doc to display the results of the table partitions query

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on code in PR #5662:
URL: https://github.com/apache/iceberg/pull/5662#discussion_r957918372


##########
docs/spark-queries.md:
##########
@@ -318,12 +318,20 @@ To show a table's current partitions:
 SELECT * FROM prod.db.table.partitions
 ```
 
-| partition | record_count | file_count |
-| -- | -- | -- |
-|  {20211001, 11}|           1|         1|
-|  {20211002, 11}|           1|         1|
-|  {20211001, 10}|           1|         1|
-|  {20211002, 10}|           1|         1|
+If this table is not partitioned    

Review Comment:
   I'm not sure about adding a non-partitioned example for the "partitions" table. I think it's something few users would need, yet it sits right under the header, so most people would have to scroll past it to see what they really want (the schema). I'd suggest removing it for now, which would be consistent with the other tables, if that sounds reasonable to you?





[GitHub] [iceberg] szehon-ho merged pull request #5662: Doc: Update doc to display the results of the table partitions query

Posted by GitBox <gi...@apache.org>.
szehon-ho merged PR #5662:
URL: https://github.com/apache/iceberg/pull/5662




[GitHub] [iceberg] lvyanquan commented on a diff in pull request #5662: Doc: Update doc to display the results of the table partitions query

Posted by GitBox <gi...@apache.org>.
lvyanquan commented on code in PR #5662:
URL: https://github.com/apache/iceberg/pull/5662#discussion_r957969992


##########
docs/spark-queries.md:
##########
@@ -318,12 +318,20 @@ To show a table's current partitions:
 SELECT * FROM prod.db.table.partitions
 ```
 
-| partition | record_count | file_count |
-| -- | -- | -- |
-|  {20211001, 11}|           1|         1|
-|  {20211002, 11}|           1|         1|
-|  {20211001, 10}|           1|         1|
-|  {20211002, 10}|           1|         1|
+If this table is not partitioned    

Review Comment:
   Yeah, I kept the result for the partitioned table and now describe the result for the non-partitioned table in a note, to make it more understandable.





[GitHub] [iceberg] lvyanquan commented on a diff in pull request #5662: Doc: Update doc to display the results of the table partitions query

Posted by GitBox <gi...@apache.org>.
lvyanquan commented on code in PR #5662:
URL: https://github.com/apache/iceberg/pull/5662#discussion_r959213698


##########
docs/spark-queries.md:
##########
@@ -318,12 +318,15 @@ To show a table's current partitions:
 SELECT * FROM prod.db.table.partitions
 ```
 
-| partition | record_count | file_count |
-| -- | -- | -- |
-|  {20211001, 11}|           1|         1|
-|  {20211002, 11}|           1|         1|
-|  {20211001, 10}|           1|         1|
-|  {20211002, 10}|           1|         1|
+| partition | record_count | file_count | spec_id |
+| -- | -- | -- | -- |
+|  {20211001, 11}|           1|         1|         0|
+|  {20211002, 11}|           1|         1|         0|
+|  {20211001, 10}|           1|         1|         0|
+|  {20211002, 10}|           1|         1|         0|
+
+Note:
+If this table is non-partitioned, the resultSet will contain record_count and file_count only.

Review Comment:
   Thanks for your suggestion, addressed it.





[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5662: Doc: Update doc to display the results of the table partitions query

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on code in PR #5662:
URL: https://github.com/apache/iceberg/pull/5662#discussion_r960061764


##########
docs/spark-queries.md:
##########
@@ -318,12 +318,15 @@ To show a table's current partitions:
 SELECT * FROM prod.db.table.partitions
 ```
 
-| partition | record_count | file_count |
-| -- | -- | -- |
-|  {20211001, 11}|           1|         1|
-|  {20211002, 11}|           1|         1|
-|  {20211001, 10}|           1|         1|
-|  {20211002, 10}|           1|         1|
+| partition | record_count | file_count | spec_id |
+| -- | -- | -- | -- |
+|  {20211001, 11}|           1|         1|         0|
+|  {20211002, 11}|           1|         1|         0|
+|  {20211001, 10}|           1|         1|         0|
+|  {20211002, 10}|           1|         1|         0|
+
+Note:
+If this table is non-partitioned, it will contain only the record_count and file_count columns.

Review Comment:
   I just realized that "this table... it" is wrong, because we mean two different tables (the data table and the metadata table).
   
   How about this fix?  
   
   "For unpartitioned tables, the partitions table will contain only the record_count and file_count columns."





[GitHub] [iceberg] szehon-ho commented on pull request #5662: Doc: Update doc to display the results of the table partitions query

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on PR #5662:
URL: https://github.com/apache/iceberg/pull/5662#issuecomment-1235752395

   Merged, thanks @lvyanquan 




[GitHub] [iceberg] lvyanquan commented on a diff in pull request #5662: Doc: Update doc to display the results of the table partitions query

Posted by GitBox <gi...@apache.org>.
lvyanquan commented on code in PR #5662:
URL: https://github.com/apache/iceberg/pull/5662#discussion_r960181287


##########
docs/spark-queries.md:
##########
@@ -318,12 +318,15 @@ To show a table's current partitions:
 SELECT * FROM prod.db.table.partitions
 ```
 
-| partition | record_count | file_count |
-| -- | -- | -- |
-|  {20211001, 11}|           1|         1|
-|  {20211002, 11}|           1|         1|
-|  {20211001, 10}|           1|         1|
-|  {20211002, 10}|           1|         1|
+| partition | record_count | file_count | spec_id |
+| -- | -- | -- | -- |
+|  {20211001, 11}|           1|         1|         0|
+|  {20211002, 11}|           1|         1|         0|
+|  {20211001, 10}|           1|         1|         0|
+|  {20211002, 10}|           1|         1|         0|
+
+Note:
+If this table is non-partitioned, it will contain only the record_count and file_count columns.

Review Comment:
   Thanks, addressed it.





[GitHub] [iceberg] lvyanquan commented on a diff in pull request #5662: Doc: Update doc to display the results of the table partitions query

Posted by GitBox <gi...@apache.org>.
lvyanquan commented on code in PR #5662:
URL: https://github.com/apache/iceberg/pull/5662#discussion_r959097698


##########
docs/spark-queries.md:
##########
@@ -318,12 +318,20 @@ To show a table's current partitions:
 SELECT * FROM prod.db.table.partitions
 ```
 
-| partition | record_count | file_count |
-| -- | -- | -- |
-|  {20211001, 11}|           1|         1|
-|  {20211002, 11}|           1|         1|
-|  {20211001, 10}|           1|         1|
-|  {20211002, 10}|           1|         1|
+If this table is not partitioned    

Review Comment:
   Addressed it.





[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5662: Doc: Update doc to display the results of the table partitions query

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on code in PR #5662:
URL: https://github.com/apache/iceberg/pull/5662#discussion_r957918372


##########
docs/spark-queries.md:
##########
@@ -318,12 +318,20 @@ To show a table's current partitions:
 SELECT * FROM prod.db.table.partitions
 ```
 
-| partition | record_count | file_count |
-| -- | -- | -- |
-|  {20211001, 11}|           1|         1|
-|  {20211002, 11}|           1|         1|
-|  {20211001, 10}|           1|         1|
-|  {20211002, 10}|           1|         1|
+If this table is not partitioned    

Review Comment:
   I'm not sure about adding a non-partitioned example for the "partitions" table. I think few users would think to do this, yet it sits right under the header, so most people would have to scroll past it to see what they really want (the schema). I'd suggest removing it for now, which would be consistent with the other tables, if that sounds reasonable to you?





[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5662: Doc: Update doc to display the results of the table partitions query

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on code in PR #5662:
URL: https://github.com/apache/iceberg/pull/5662#discussion_r960061764


##########
docs/spark-queries.md:
##########
@@ -318,12 +318,15 @@ To show a table's current partitions:
 SELECT * FROM prod.db.table.partitions
 ```
 
-| partition | record_count | file_count |
-| -- | -- | -- |
-|  {20211001, 11}|           1|         1|
-|  {20211002, 11}|           1|         1|
-|  {20211001, 10}|           1|         1|
-|  {20211002, 10}|           1|         1|
+| partition | record_count | file_count | spec_id |
+| -- | -- | -- | -- |
+|  {20211001, 11}|           1|         1|         0|
+|  {20211002, 11}|           1|         1|         0|
+|  {20211001, 10}|           1|         1|         0|
+|  {20211002, 10}|           1|         1|         0|
+
+Note:
+If this table is non-partitioned, it will contain only the record_count and file_count columns.

Review Comment:
   I just realized that "this table... it" is wrong, because we mean two different tables (the data table and the metadata table).
   
   "If the table is unpartitioned, the partitions table will contain only the record_count and file_count columns."





[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5662: Doc: Update doc to display the results of the table partitions query

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on code in PR #5662:
URL: https://github.com/apache/iceberg/pull/5662#discussion_r959194801


##########
docs/spark-queries.md:
##########
@@ -318,12 +318,15 @@ To show a table's current partitions:
 SELECT * FROM prod.db.table.partitions
 ```
 
-| partition | record_count | file_count |
-| -- | -- | -- |
-|  {20211001, 11}|           1|         1|
-|  {20211002, 11}|           1|         1|
-|  {20211001, 10}|           1|         1|
-|  {20211002, 10}|           1|         1|
+| partition | record_count | file_count | spec_id |
+| -- | -- | -- | -- |
+|  {20211001, 11}|           1|         1|         0|
+|  {20211002, 11}|           1|         1|         0|
+|  {20211001, 10}|           1|         1|         0|
+|  {20211002, 10}|           1|         1|         0|
+
+Note:
+If this table is non-partitioned, the resultSet will contain record_count and file_count only.

Review Comment:
   Nit: "resultSet" is rather specific. Since this is just about a table, can we omit it, like:
   "..., it will contain only the record_count and file_count columns."


