Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/01/21 21:13:51 UTC
[GitHub] [iceberg] findinpath opened a new issue #3951: Getting started with Iceberg & Spark
findinpath opened a new issue #3951:
URL: https://github.com/apache/iceberg/issues/3951
As a newcomer to Apache Iceberg, I am eager to try out the functionality the framework offers.
Setting up an Iceberg environment on Spark is not straightforward.
After downloading the Spark 3.1.2 distribution, I configured spark-defaults.conf:
```
spark.jars.packages org.apache.iceberg:iceberg-spark3-runtime:0.12.1
spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.demo org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.demo.catalog-impl org.apache.iceberg.jdbc.JdbcCatalog
spark.sql.catalog.demo.uri jdbc:postgresql://postgres:5432/demo_catalog
spark.sql.catalog.demo.jdbc.user admin
spark.sql.catalog.demo.jdbc.password password
spark.sql.catalog.demo.io-impl org.apache.iceberg.hadoop.HadoopFileIO
spark.sql.catalog.demo.warehouse /home/iceberg/warehouse
spark.sql.defaultCatalog demo
```
Afterwards I set up Postgres to run in a Docker container:
```
docker run --name iceberg-spark-postgres -e POSTGRES_USER=admin -e POSTGRES_PASSWORD=password -e POSTGRES_DB=demo_catalog -p 5432:5432 -d postgres
```
While trying out the scenarios on the page https://iceberg.apache.org/#maintenance/
I noticed that the code snippets start with:
```
Table table = ...
```
Getting the Iceberg table from a Spark catalog is not that straightforward.
After digging through the Iceberg source code, I stitched together this snippet for obtaining the table:
```
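import org.apache.spark.sql.connector.catalog.Identifier
// The cast only succeeds when the current (default) catalog is an Iceberg SparkCatalog
val sparkCatalog = spark.sessionState.catalogManager.currentCatalog.asInstanceOf[org.apache.iceberg.spark.SparkCatalog]
// Look up the table "test1" through the Spark catalog
val sparkTableTest1 = sparkCatalog.loadTable(Identifier.of(Array[String](""), "test1"))
// Unwrap the underlying Iceberg table from the Spark table wrapper
val icebergTableTest1 = sparkTableTest1.table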
```
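For illustration, here is a minimal sketch of how the table obtained above could be fed into the kind of maintenance call the docs describe. It assumes the `icebergTableTest1` value from the previous snippet is still in scope in `spark-shell`. The helper name `expireOlderThanDays` is hypothetical and introduced only for this example; `expireSnapshots`, `expireOlderThan`, and `commit` are methods of the Iceberg `Table` API.

```scala
import org.apache.iceberg.Table

// Hypothetical helper (not part of Iceberg): expire snapshots older than `days` days
// on an Iceberg table that has already been loaded.
def expireOlderThanDays(table: Table, days: Int): Unit = {
  // Cutoff timestamp in milliseconds since the epoch
  val cutoffMillis = System.currentTimeMillis() - days.toLong * 24 * 60 * 60 * 1000
  table.expireSnapshots()
    .expireOlderThan(cutoffMillis)
    .commit()
}

// Assumes icebergTableTest1 was obtained as in the snippet above
expireOlderThanDays(icebergTableTest1, 1)
```

Expiring snapshots removes metadata and data files that are no longer referenced, so this is best tried on a throwaway table first.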
What I'd like to have (as a newcomer) is a Docker image / Docker Compose setup for getting started with Iceberg on Spark. Having everything packaged together and ready to use makes it much easier for a newcomer to get started.
For the code samples, I'd very much appreciate java/scala/python examples that use `SparkCatalog` for general usage scenarios that are not covered by Iceberg's SQL commands.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] hililiwei commented on issue #3951: Getting started with Iceberg & Spark
Posted by GitBox <gi...@apache.org>.
hililiwei commented on issue #3951:
URL: https://github.com/apache/iceberg/issues/3951#issuecomment-1034667620
Try this: [Docker, Spark, and Iceberg: The Fastest Way to Try Iceberg!](https://tabular.io/blog/docker-spark-and-iceberg/)
[GitHub] [iceberg] findinpath commented on issue #3951: Getting started with Iceberg & Spark
Posted by GitBox <gi...@apache.org>.
findinpath commented on issue #3951:
URL: https://github.com/apache/iceberg/issues/3951#issuecomment-1018885168
@RussellSpitzer indeed. I came across the `loadTable` method of `Spark3Util`, but since I am currently trying things out on Spark 3.1.2 (AFAIK Iceberg 0.12.1 doesn't fully work with Spark 3.2 at the moment), this method was not available in the `spark-shell`.
In any case, simple examples that just work (which obviously comes with a maintenance cost) would be a definite win for growing the community around Iceberg.
Thank you for the feedback.
[GitHub] [iceberg] RussellSpitzer commented on issue #3951: Getting started with Iceberg & Spark
Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on issue #3951:
URL: https://github.com/apache/iceberg/issues/3951#issuecomment-1018867083
We actually have a helper for getting the underlying table, see
https://github.com/apache/iceberg/blob/master/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/Spark3Util.java#L644
But I believe we would probably recommend using the procedure API
https://iceberg.apache.org/spark-procedures/#_top
from Spark.
That said, it would be great if we could have some better samples and a more complete Docker image.
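
As a rough sketch of the two routes mentioned above, something like the following could be run in `spark-shell`. Several things here are assumptions rather than details from this thread: the table identifier `demo.db.test1` is hypothetical, the helper is assumed to be `Spark3Util.loadIcebergTable` (which is only present in Iceberg runtimes newer than 0.12.1), and `expire_snapshots` is picked as just one example of a stored procedure.

```scala
import org.apache.iceberg.spark.Spark3Util

// Route 1: load the underlying Iceberg table through the helper.
// Assumption: the runtime on the classpath ships Spark3Util.loadIcebergTable
// and a table exists at the hypothetical identifier demo.db.test1.
val icebergTable = Spark3Util.loadIcebergTable(spark, "demo.db.test1")
println(icebergTable.location())

// Route 2: call a stored procedure from Spark SQL; no Table handle is needed.
// Without an older_than argument the procedure falls back to its default cutoff.
spark.sql("CALL demo.system.expire_snapshots(table => 'db.test1')").show()
```

The procedure route needs the `IcebergSparkSessionExtensions` setting from the configuration in the original post, since `CALL` is provided by the Iceberg SQL extensions.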
[GitHub] [iceberg] flyrain commented on issue #3951: Getting started with Iceberg & Spark
Posted by GitBox <gi...@apache.org>.
flyrain commented on issue #3951:
URL: https://github.com/apache/iceberg/issues/3951#issuecomment-1019594662
> In any case, simple examples that just work (this comes obviously with a maintenance cost) would be a definite win to grow the community around Iceberg.
Cannot agree more. The community is working on the Docker image, and it will be released pretty soon. Cc @samredai and @kbendick for more details.