Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/06/04 10:25:29 UTC

[GitHub] [iceberg] kbendick opened a new issue #2676: Create a docker based demo environment for user testing and community onboarding

kbendick opened a new issue #2676:
URL: https://github.com/apache/iceberg/issues/2676


   We currently lack any kind of docker based demo environment.
   
   We have some sample notebooks, and some JMH tests, but we don't have a playground to provide to potential users to easily download and evaluate Iceberg.
   
   Having a docker demo or testing environment has become something of a standard for many open source projects. Especially given how many parts are involved in using Iceberg (at least one distributed computing environment, a catalog, somewhere to store data, etc.), the barrier to entry can seem high, especially for people who don't have a Hive metastore at all (which is probably more common than many people think: there are a lot of data engineers out there who just work with files on S3 and maintain them as tables themselves in whatever ad-hoc fashion).
   
   I've opened this ticket as a follow up to https://github.com/apache/iceberg/issues/1081, since that issue is very old and has not seen any progress in a long time.
   
   I have a pretty decent, basic docker-compose environment with Spark, a Hive metastore, and HDFS that I can push.
   
   I know many people have expressed interest in this. Given that my environment is already usable and pretty complete, I would like to push it as a starting point to get the discussion going. People can then continue to work on bringing in other frameworks, helper shell scripts (I have some, though I'll probably keep it simple to start), and other catalogs (I know the Nessie folks are likely very interested in integrating Nessie into the demo environment).
   
   Since my initial work on this is mostly done, I will try to push by early next week and then we can go from there!
   
   @rymurr mentioned that he and some of the Nessie folks currently have a pretty decent setup for a notebook environment that can run on Google Colab without extra infrastructure.
   
   My approach is admittedly more bare-bones / lowest common denominator and uses a local environment, so I will push what I have by early next week and then we can collaborate from there!
   
   cc @rdblue @rymurr @flyrain @RussellSpitzer @aokolnychyi  @nastra @jasonhughes248


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kingeasternsun commented on issue #2676: Create a docker based demo environment for user testing and community onboarding

Posted by GitBox <gi...@apache.org>.
kingeasternsun commented on issue #2676:
URL: https://github.com/apache/iceberg/issues/2676#issuecomment-958774636


   Hello everyone, is there a basic docker-compose file we can try?




[GitHub] [iceberg] rymurr commented on issue #2676: Create a docker based demo environment for user testing and community onboarding

Posted by GitBox <gi...@apache.org>.
rymurr commented on issue #2676:
URL: https://github.com/apache/iceberg/issues/2676#issuecomment-854605694


   Thanks for starting this @kbendick! Super useful! You can find a short description of what we do in this comment: https://github.com/apache/iceberg/pull/2674#issuecomment-854568624 
   
   Our goal with our demos was to have zero extra setup. So a notebook is hosted on Google Colab and it starts Spark and the needed infra locally. Admittedly it's not distributed, and it wouldn't be easy to get Hive started that way. @snazy also [added testing](https://github.com/projectnessie/nessie-demos/tree/main/notebook-tests) to our demo notebooks, which has turned into a great pre-release smoke test.
   
   I've seen some pretty involved docker and docker-compose demos that are cool but require a dozen docker images to be started locally. My request is that a new user not have to start up half the world just to see how Iceberg works, though maybe I am unique in finding those types of demos frustrating. Maybe there is room for both too: quick, short, focused demos and a bigger integrated one to get a feel for what it's like to run in prod.




[GitHub] [iceberg] vajaw commented on issue #2676: Create a docker based demo environment for user testing and community onboarding

Posted by GitBox <gi...@apache.org>.
vajaw commented on issue #2676:
URL: https://github.com/apache/iceberg/issues/2676#issuecomment-993298612


   Hello everyone, is there a basic docker-compose file we can try?




[GitHub] [iceberg] dremio-brock commented on issue #2676: Create a docker based demo environment for user testing and community onboarding

Posted by GitBox <gi...@apache.org>.
dremio-brock commented on issue #2676:
URL: https://github.com/apache/iceberg/issues/2676#issuecomment-993948840


   If you want to test out my docker-compose script, it is at https://github.com/dremio-brock/IcebergDremio and will spin up a Hive metastore, MinIO, Jupyter with Spark, and Dremio. You should be able to use a variety of supported catalogs and storage platforms.





[GitHub] [iceberg] bambrow commented on issue #2676: Create a docker based demo environment for user testing and community onboarding

Posted by GitBox <gi...@apache.org>.
bambrow commented on issue #2676:
URL: https://github.com/apache/iceberg/issues/2676#issuecomment-870350621


   Hello everyone, do we have an update or a working branch on this? I'm currently trying to dockerize this, and would like to know if there's already a docker demo in progress. Thanks.





[GitHub] [iceberg] kbendick commented on issue #2676: Create a docker based demo environment for user testing and community onboarding

Posted by GitBox <gi...@apache.org>.
kbendick commented on issue #2676:
URL: https://github.com/apache/iceberg/issues/2676#issuecomment-854604651


   Also cc @raptond and @holdenk who might have some interest as well 🙂.






[GitHub] [iceberg] kbendick edited a comment on issue #2676: Create a docker based demo environment for user testing and community onboarding

Posted by GitBox <gi...@apache.org>.
kbendick edited a comment on issue #2676:
URL: https://github.com/apache/iceberg/issues/2676#issuecomment-854918867


   > Ive seen some pretty involved docker and docker-compose demos that are cool but require a dozen docker images to be started locally. My request is that a new user don't have to start up half the world just to see how iceberg works, however maybe I am unique in that I find those types of demos frustrating. Maybe there is room for both too: quick, short, focused demos and a bigger integrated one to get a feel for waht its like to run in prod
   
   I agree. When @rdblue and I were discussing this a while back, we wanted to keep things as simple as possible, particularly for the demo. So ideally just the bare minimum (a runtime, even if not distributed, and a catalog, ideally the simplest possible).
   
   I believe there are issues when using Derby in memory (though possibly those have been corrected and I only recently re-encountered them when using an older Iceberg version).
   
   The setup I have is definitely more of a dev environment in that sense (3 or 4 images), but I'll see what I can do about getting it pared down. Once I've pushed what I have early next week, we can go from there. I'm by no means married to my current setup, but I do think it would be a good starting point.
   
   I'll also take a look at the smoke tests etc. in yours and see what can be done to help simplify mine.
   
   I agree that it might make sense to have both: something simple that people can just stand up without any nonsense and then run the basic Iceberg commands to get a feel for what it's all about, as well as something a bit more full-featured (e.g. with Hive) that people can look to for an understanding of a fuller setup. It would also allow for showing people the various catalogs in a more practical sense (e.g. something that runs with the Glue Catalog as well).
   
   Perhaps some of it might be better suited for the wiki too, where a base setup that's slightly more complex might be desired.





[GitHub] [iceberg] flyrain commented on issue #2676: Create a docker based demo environment for user testing and community onboarding

Posted by GitBox <gi...@apache.org>.
flyrain commented on issue #2676:
URL: https://github.com/apache/iceberg/issues/2676#issuecomment-860884196


   Hi Kyle, thanks for bringing this up. It would be super helpful for beginners and anyone who wants to try Iceberg for the first time. I agree with starting with a simple one. Just to brainstorm ideas:
   1. Even just a runtime without any catalog would still be helpful for people running Iceberg for the first time. 
   2. A runtime with a catalog would be closer to the real use case. 
   3. A runtime with a catalog and a notebook.
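   As a rough illustration of how the first two tiers differ in configuration, here is a minimal Python sketch that builds Spark `--conf` flags for an Iceberg catalog. It assumes the Iceberg Spark runtime jar is already on Spark's classpath, and it reads "without any catalog" as "without a catalog *service*": tier 1 uses Iceberg's file-system-based Hadoop catalog, which only needs a warehouse path. The catalog name `demo`, the warehouse path, and the metastore host `hive-metastore` are hypothetical placeholders; the `spark.sql.extensions` and `spark.sql.catalog.*` keys follow Iceberg's Spark configuration docs.

```python
"""Hypothetical sketch: Spark --conf flags for the demo tiers discussed above."""

# Iceberg's Spark SQL extensions and catalog implementation classes.
ICEBERG_EXTENSIONS = "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
ICEBERG_SPARK_CATALOG = "org.apache.iceberg.spark.SparkCatalog"


def iceberg_spark_conf(catalog_name: str, tier: int,
                       warehouse: str = "/tmp/iceberg-warehouse",
                       metastore_uri: str = "thrift://hive-metastore:9083") -> dict:
    """Return Spark properties for one demo tier.

    Tier 1: Hadoop (file-system) catalog, no metastore service needed.
    Tiers 2 and 3: Hive catalog backed by a metastore; tier 3 only adds a
    notebook container, so its Spark settings match tier 2.
    The default warehouse path and metastore URI are hypothetical placeholders.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    conf = {
        "spark.sql.extensions": ICEBERG_EXTENSIONS,
        prefix: ICEBERG_SPARK_CATALOG,
    }
    if tier == 1:
        # Table metadata is tracked as files under the warehouse path.
        conf[f"{prefix}.type"] = "hadoop"
        conf[f"{prefix}.warehouse"] = warehouse
    elif tier in (2, 3):
        # Table pointers live in the Hive metastore, reached over Thrift.
        conf[f"{prefix}.type"] = "hive"
        conf[f"{prefix}.uri"] = metastore_uri
    else:
        raise ValueError(f"unknown demo tier: {tier}")
    return conf


def to_spark_args(conf: dict) -> list:
    """Flatten properties into ['--conf', 'key=value', ...] for spark-sql/spark-submit."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_spark_args(iceberg_spark_conf("demo", tier=1))))
```

   The resulting flags could be passed to `spark-sql`, after which tables are addressed as `demo.<db>.<table>`. Tier 3 only puts a notebook container in front of the same Spark configuration, so the sketch treats it like tier 2.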




[GitHub] [iceberg] kbendick edited a comment on issue #2676: Create a docker based demo environment for user testing and community onboarding

Posted by GitBox <gi...@apache.org>.
kbendick edited a comment on issue #2676:
URL: https://github.com/apache/iceberg/issues/2676#issuecomment-854604098


   @rymurr and I had some initial discussion in this PR https://github.com/apache/iceberg/pull/2674, which also links to some of what he and some of the Nessie folks have put together: a notebook that runs through the basics.
   
   So there will likely be some overlap, but at least on a first pass, there's less overlap than one might think. Mine runs entirely locally via docker-compose (HDFS, Hive Metastore backed by Postgres, Spark, Zookeeper, and WIP on Kafka for streaming) and has been really useful for mounting in my local Iceberg codebase to test things without having to publish jars beyond the local Maven repository. Theirs also uses docker-compose, but runs via notebooks (which I think is great). Theirs is more Nessie-focused, but has the added advantage of being able to run on Google Colab without the need for your own infra, as local docker development can definitely slow down a laptop.
   
   Please feel free to comment here, and I will prioritize pushing what I have (I just need to remove some internal stuff) so that we can then iterate from there or regroup and come up with a better design. Given the large number of components, I'm sure this will be an ongoing effort. I would love to champion this, as I already do a lot of container work.
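   To make the "half the world" concern concrete, the following Python sketch (3.9+, for `graphlib`) models a service graph loosely based on the components listed above and computes a start order plus the subset actually needed to run Spark. The service names and dependency edges are assumptions for illustration, not the contents of the actual compose file. It mirrors docker-compose in spirit: `depends_on` controls start order, and starting one service also starts the services it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical service graph: each service maps to the services it depends on,
# analogous to docker-compose `depends_on`. Edges are illustrative assumptions.
SERVICES = {
    "postgres": set(),
    "zookeeper": set(),
    "hdfs-namenode": set(),
    "hdfs-datanode": {"hdfs-namenode"},
    "hive-metastore": {"postgres", "hdfs-namenode"},
    "spark": {"hive-metastore", "hdfs-datanode"},
    "kafka": {"zookeeper"},  # streaming piece, still WIP in the described setup
}


def start_order(services: dict) -> list:
    """Dependencies first: a valid order in which to bring services up."""
    return list(TopologicalSorter(services).static_order())


def required_for(target: str, services: dict) -> set:
    """Transitive dependencies of `target`, plus `target` itself."""
    needed, stack = set(), [target]
    while stack:
        service = stack.pop()
        if service not in needed:
            needed.add(service)
            stack.extend(services[service])
    return needed


if __name__ == "__main__":
    print("start order:", start_order(SERVICES))
    print("needed for spark:", sorted(required_for("spark", SERVICES)))
```

   In this toy graph, a Spark-only demo never needs Zookeeper or Kafka, which is the kind of pruning a minimal onboarding environment would aim for.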

