You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/11/22 17:05:00 UTC

[GitHub] [iceberg] ajantha-bhat opened a new pull request, #6250: Docs: Remove redundant configuration from spark docs

ajantha-bhat opened a new pull request, #6250:
URL: https://github.com/apache/iceberg/pull/6250

   The section description mentions that we are creating a catalog called `local`, but it also includes redundant configuration for `spark_catalog` of type hive. Also, hive URI is not mentioned for that hive catalog configuration. 
   So, it is a dummy configuration which confuses the users.  Hence, removing it. 
   
   P.S: This is on the getting started front page. So, please verify once more before merging. 
   I did verify and all the examples worked for me without this configuration too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #6250: Docs: Remove redundant configuration from spark docs

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #6250:
URL: https://github.com/apache/iceberg/pull/6250#issuecomment-1356908254

   I don't think this is a good idea and is pretty minor either way. I'm going to close it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6250: Docs: Remove redundant configuration from spark docs

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on code in PR #6250:
URL: https://github.com/apache/iceberg/pull/6250#discussion_r1039705908


##########
docs/spark-getting-started.md:
##########
@@ -57,8 +57,6 @@ This command creates a path-based catalog named `local` for tables under `$PWD/w
 ```sh
 spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:{{% icebergVersion %}}\
     --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
-    --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \

Review Comment:
   This makes line 55 Incorrect



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6250: Docs: Remove redundant configuration from spark docs

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on code in PR #6250:
URL: https://github.com/apache/iceberg/pull/6250#discussion_r1039836905


##########
docs/spark-getting-started.md:
##########
@@ -57,8 +57,6 @@ This command creates a path-based catalog named `local` for tables under `$PWD/w
 ```sh
 spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:{{% icebergVersion %}}\
     --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
-    --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \

Review Comment:
   "spark_catalog" is the built in catalog for Spark. The one you get without specifying anything else. This configuration would work for any user who has already defined a HMS for Spark and would like it to also work with Iceberg Tables. So I would think this would be useful to most folks who are using Iceberg with an existing Spark Setup.
   
   I think it's fine to skip "spark_catalog" in this particular usage though since all the examples use a separate "local" SparkCatalog. If we remove it we should probably add some more details SparkSessionCatalog in some different section perhaps? Or maybe add another example.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6250: Docs: Remove redundant configuration from spark docs

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on code in PR #6250:
URL: https://github.com/apache/iceberg/pull/6250#discussion_r1039836905


##########
docs/spark-getting-started.md:
##########
@@ -57,8 +57,6 @@ This command creates a path-based catalog named `local` for tables under `$PWD/w
 ```sh
 spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:{{% icebergVersion %}}\
     --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
-    --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \

Review Comment:
   "spark_catalog" is the built in catalog for Spark. The one you get without specifying anything else. This configuration would work for any user who has already defined a HMS for Spark and would like it to also work with Iceberg Tables.
   
   I think it's fine to skip this usage though since all the examples use a separate "SparkCatalog"
   
   In this case I think we should probably add some more details SparkSessionCatalog in some different section perhaps?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6250: Docs: Remove redundant configuration from spark docs

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #6250:
URL: https://github.com/apache/iceberg/pull/6250#discussion_r1039848442


##########
docs/spark-getting-started.md:
##########
@@ -57,8 +57,6 @@ This command creates a path-based catalog named `local` for tables under `$PWD/w
 ```sh
 spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:{{% icebergVersion %}}\
     --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
-    --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \

Review Comment:
   https://iceberg.apache.org/docs/latest/spark-configuration/#catalog-configuration already talks about the `SparkSessionCatalog`.  So, we can remove it from here?
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #6250: Docs: Remove redundant configuration from spark docs

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on PR #6250:
URL: https://github.com/apache/iceberg/pull/6250#issuecomment-1337013120

   cc: @pvary 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6250: Docs: Remove redundant configuration from spark docs

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #6250:
URL: https://github.com/apache/iceberg/pull/6250#discussion_r1039751700


##########
docs/spark-getting-started.md:
##########
@@ -57,8 +57,6 @@ This command creates a path-based catalog named `local` for tables under `$PWD/w
 ```sh
 spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:{{% icebergVersion %}}\
     --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
-    --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \

Review Comment:
   Do you mean `adds support for Iceberg tables to Spark's built-in catalog` was achieved by adding a `SparkSessionCatalog`?
   
   But then it is of type hive and URI is not mentioned. Which may confuse the users.
   Also `spark_catalog` was not used in any example snippets. 
   
   So, do you have any suggestions here? 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6250: Docs: Remove redundant configuration from spark docs

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6250:
URL: https://github.com/apache/iceberg/pull/6250#discussion_r1051689729


##########
docs/spark-getting-started.md:
##########
@@ -57,8 +57,6 @@ This command creates a path-based catalog named `local` for tables under `$PWD/w
 ```sh
 spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:{{% icebergVersion %}}\
     --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
-    --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \

Review Comment:
   I would rather leave it here. This is useful to anyone starting with an existing Spark deployment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue closed pull request #6250: Docs: Remove redundant configuration from spark docs

Posted by GitBox <gi...@apache.org>.
rdblue closed pull request #6250: Docs: Remove redundant configuration from spark docs
URL: https://github.com/apache/iceberg/pull/6250


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6250: Docs: Remove redundant configuration from spark docs

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #6250:
URL: https://github.com/apache/iceberg/pull/6250#discussion_r1039831769


##########
docs/spark-getting-started.md:
##########
@@ -57,8 +57,6 @@ This command creates a path-based catalog named `local` for tables under `$PWD/w
 ```sh
 spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:{{% icebergVersion %}}\
     --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
-    --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \

Review Comment:
   which one can be better?
   a) Remove "built-in catalog" mentioned from the line 55 
   b) or change  `spark_catalog` type also to `Hadoop`
   c) or add some hive URI = xxxxxx in the example snippet 
   
   I don't think we can leave it as it is because the catalog-type is hive but without URI will just confuse the users. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #6250: Docs: Remove redundant configuration from spark docs

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on PR #6250:
URL: https://github.com/apache/iceberg/pull/6250#issuecomment-1323991146

   cc: @Fokko, @RussellSpitzer  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org