Posted to dev@iceberg.apache.org by Christine Mathiesen <t-...@hotels.com.INVALID> on 2019/11/19 13:18:45 UTC

'Examples' code contribution

Hello!

Recently, I’ve been researching Iceberg with the goal of developing some simple code exemplifying how to use the Iceberg Java API. The goal was to share this internally with developers along with information we’ve gained about Iceberg to start discussions on whether we could use Iceberg in our systems. On reviewing the documentation and code we thought this could be useful for anyone interested in learning more about Iceberg so we would like to open source it.  We noticed that Iceberg has a folder for examples (https://github.com/apache/incubator-iceberg/tree/master/examples) - there isn’t much there right now but it could be a good location for our examples and documentation.

Our project is currently structured as many small JUnit tests that target different areas of Iceberg functionality (such as reading and writing partitioned and unpartitioned tables, schema evolution, time travel, etc.). We went for this approach so we could use it as a sort of quickstart guide to using Iceberg with different use cases in mind.

The code we have currently focuses mainly on using HadoopTables with Spark (in Java) and contains tests that follow this sort of pattern:

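@Test
public void writeToTableFromFile() {
    // Load the sample employee records from a JSON file.
    Dataset<Row> df = spark.read().json(dataLocation + "/employees.json");

    // Append the selected columns to the Iceberg table at tableLocation.
    df.select("name", "salary").write()
      .format("iceberg")
      .mode("append")
      .save(tableLocation.toString());

    // Reload the table metadata so the new snapshot is visible.
    table.refresh();

    // Expose the source DataFrame to Spark SQL and check its row count.
    df.createOrReplaceTempView("table");

    Dataset<Row> sqlDF = spark.sql("select * from table");
    // JUnit expects the expected value first, then the actual value.
    assertEquals(10, sqlDF.count());
}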

Could the developers on the project let us know if they think the above would be a useful contribution and if so, what the next steps would be? We’re happy to answer any questions and provide more info etc.

Thank you and all the best,

Christine Mathiesen
Software Development Intern
BDP – Hotels.com
Expedia Group

Re: 'Examples' code contribution

Posted by Christine Mathiesen <t-...@hotels.com.INVALID>.

Hey Ryan,

That’s great, I’ll get started on a PR right away!
Thanks 😊

From: Ryan Blue <rb...@netflix.com.INVALID>
Reply to: "dev@iceberg.apache.org" <de...@iceberg.apache.org>, "rblue@netflix.com" <rb...@netflix.com>
Date: Tuesday, 19 November 2019 at 19:00
To: Iceberg Dev List <de...@iceberg.apache.org>
Subject: Re: 'Examples' code contribution

Hi Christine,

It would be great for you to submit your code examples! I think that would be really helpful for other people as well.

For some things, it might also be a good idea to update the documentation on the ASF site, iceberg.apache.org. The source for the site is in the `site` folder in github, if you think there are missing examples that would be beneficial to have on the site.

--
Ryan Blue
Software Engineer
Netflix

Re: 'Examples' code contribution

Posted by Christine Mathiesen <t-...@hotels.com.INVALID>.

Hi!

Just wanted to share that we’ve opened up a PR for adding the examples mentioned in this thread: https://github.com/apache/incubator-iceberg/pull/678 :)

Have a great day!
Christine

Re: 'Examples' code contribution

Posted by Ryan Blue <rb...@netflix.com.INVALID>.

Hi Christine,

It would be great for you to submit your code examples! I think that would
be really helpful for other people as well.

For some things, it might also be a good idea to update the documentation
on the ASF site, iceberg.apache.org. The source for the site is in the
`site` folder in github, if you think there are missing examples that would
be beneficial to have on the site.

-- 
Ryan Blue
Software Engineer
Netflix