Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/01/21 21:13:51 UTC

[GitHub] [iceberg] findinpath opened a new issue #3951: Getting started with Iceberg & Spark

findinpath opened a new issue #3951:
URL: https://github.com/apache/iceberg/issues/3951


   As a newcomer to Apache Iceberg, I am eager to try out the functionality the framework offers.
   
   Setting up an Iceberg environment on Spark is not straightforward.
   After downloading the Spark 3.1.2 distribution, I configured `spark-defaults.conf` as follows:
   
   ```
   spark.jars.packages                    org.apache.iceberg:iceberg-spark3-runtime:0.12.1
   spark.sql.extensions                   org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
   spark.sql.catalog.demo                 org.apache.iceberg.spark.SparkCatalog
   spark.sql.catalog.demo.catalog-impl    org.apache.iceberg.jdbc.JdbcCatalog
   spark.sql.catalog.demo.uri             jdbc:postgresql://postgres:5432/demo_catalog
   spark.sql.catalog.demo.jdbc.user       admin
   spark.sql.catalog.demo.jdbc.password   password
   spark.sql.catalog.demo.io-impl         org.apache.iceberg.hadoop.HadoopFileIO
   spark.sql.catalog.demo.warehouse       /home/iceberg/warehouse
   spark.sql.defaultCatalog               demo
   ```
   
   Afterwards, I set up Postgres to run in a Docker container:
   
   ```
   docker run --name iceberg-spark-postgres -e POSTGRES_USER=admin -e POSTGRES_PASSWORD=password -e POSTGRES_DB=demo_catalog -p 5432:5432 -d postgres
   ```
   
   While trying out the scenarios on the maintenance page, https://iceberg.apache.org/#maintenance/, I noticed that the code snippets all start from:
   
   ```
   Table table = ...
   ```
   
   Getting the Iceberg table from a Spark catalog is not that straightforward.
   After digging through the Iceberg source code, I stitched together this snippet to obtain the table:
   
   ```
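   import org.apache.spark.sql.connector.catalog.Identifier
   
   // The current catalog is the Iceberg SparkCatalog because spark.sql.defaultCatalog is set to demo
   val sparkCatalog = spark.sessionState.catalogManager.currentCatalog.asInstanceOf[org.apache.iceberg.spark.SparkCatalog]
   
   // Load the Spark-side table wrapper by namespace and table name
   val sparkTableTest1 = sparkCatalog.loadTable(Identifier.of(Array[String](""), "test1"))
   
   // Unwrap the underlying Iceberg table
   val icebergTableTest1 = sparkTableTest1.table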
   ```
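   
   With the Iceberg table in hand, the maintenance snippets from the page above can be applied to it. A minimal sketch, assuming the `icebergTableTest1` value from the snippet above and an arbitrary one-day retention window; `cutoffMillis` and `expireOldSnapshots` are hypothetical helper names, not part of Iceberg, while `expireSnapshots()`, `expireOlderThan(...)` and `commit()` come from the Iceberg `Table` API:
   
   ```scala
   import java.util.concurrent.TimeUnit
   import org.apache.iceberg.Table
   
   // Hypothetical helper: epoch-millisecond timestamp before which snapshots count as expired.
   def cutoffMillis(nowMillis: Long, retentionDays: Long): Long =
     nowMillis - TimeUnit.DAYS.toMillis(retentionDays)
   
   // Hypothetical helper: expires snapshots older than the retention window.
   def expireOldSnapshots(table: Table, retentionDays: Long): Unit = {
     val cutoff = cutoffMillis(System.currentTimeMillis(), retentionDays)
     table.expireSnapshots().expireOlderThan(cutoff).commit()
   }
   
   // In spark-shell, with the table obtained above (the retention value is illustrative):
   // expireOldSnapshots(icebergTableTest1, 1L)
   ```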
   
   
   What I'd like to have, as a newcomer to Iceberg, is a Docker image or Docker Compose setup for getting started with Spark. Having everything packaged together and ready to use makes it much easier to get started.
   
   For the code samples, I'd also very much appreciate `SparkCatalog` examples in Java/Scala/Python covering general usage scenarios that the Iceberg SQL commands do not cover.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] hililiwei commented on issue #3951: Getting started with Iceberg & Spark

Posted by GitBox <gi...@apache.org>.

hililiwei commented on issue #3951:
URL: https://github.com/apache/iceberg/issues/3951#issuecomment-1034667620


   Try this: [Docker, Spark, and Iceberg: The Fastest Way to Try Iceberg!](https://tabular.io/blog/docker-spark-and-iceberg/)



[GitHub] [iceberg] findinpath commented on issue #3951: Getting started with Iceberg & Spark

Posted by GitBox <gi...@apache.org>.

findinpath commented on issue #3951:
URL: https://github.com/apache/iceberg/issues/3951#issuecomment-1018885168


   @RussellSpitzer indeed. I came across the `loadTable` method of `Spark3Util`, but since I am currently trying things out on Spark 3.1.2 (AFAIK Iceberg 0.12.1 doesn't fully work with Spark 3.2 at the moment), this method was not available to me in the `spark-shell`.
   
   In any case, simple examples that just work (this comes obviously with a maintenance cost) would be a definite win to grow the community around Iceberg.
   
   Thank you for the feedback.



[GitHub] [iceberg] RussellSpitzer commented on issue #3951: Getting started with Iceberg & Spark

Posted by GitBox <gi...@apache.org>.

RussellSpitzer commented on issue #3951:
URL: https://github.com/apache/iceberg/issues/3951#issuecomment-1018867083


   We actually have a helper for getting the underlying table; see
   https://github.com/apache/iceberg/blob/master/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/Spark3Util.java#L644
   
   But I believe we would probably recommend using the procedure API from Spark:
   https://iceberg.apache.org/spark-procedures/#_top
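   
   The procedure API is invoked through Spark SQL `CALL` statements, so no Iceberg `Table` handle is needed. A minimal sketch, assuming the catalog is named `demo` as in the configuration from the issue; the table name `db.test1`, the timestamp, and the helper `expireSnapshotsSql` are illustrative and not part of Iceberg:
   
   ```scala
   // Hypothetical helper: builds the CALL statement for the expire_snapshots procedure.
   def expireSnapshotsSql(catalog: String, table: String, olderThan: String): String =
     s"CALL $catalog.system.expire_snapshots(table => '$table', older_than => TIMESTAMP '$olderThan')"
   
   // Catalog, table, and timestamp are assumptions chosen for illustration.
   val stmt = expireSnapshotsSql("demo", "db.test1", "2022-01-01 00:00:00.000")
   
   // In spark-shell, where `spark` is the active SparkSession with the Iceberg extensions enabled:
   // spark.sql(stmt).show()
   ```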
   
   That said, it would be great if we could have better samples and a more complete Docker image.
   



[GitHub] [iceberg] flyrain commented on issue #3951: Getting started with Iceberg & Spark

Posted by GitBox <gi...@apache.org>.

flyrain commented on issue #3951:
URL: https://github.com/apache/iceberg/issues/3951#issuecomment-1019594662


   > In any case, simple examples that just work (this comes obviously with a maintenance cost) would be a definite win to grow the community around Iceberg.
   
   I couldn't agree more. The community is working on the Docker image, and it will be released pretty soon. Cc @samredai and @kbendick for more details.
   

