Posted to dev@iceberg.apache.org by Saulius Pakalka <sa...@oxylabs.io.INVALID> on 2022/05/27 11:32:10 UTC
Problem with partitioned table creation in Scala
Hi,
I am trying to create a partitioned Iceberg table using the Scala code below, based on an example in the docs.
df_c.writeTo(output_table)
.partitionBy(days(col("last_updated")))
.createOrReplace()
However, this code does not compile; the compiler reports two errors:
value partitionBy is not a member of org.apache.spark.sql.DataFrameWriterV2[org.apache.spark.sql.Row]
[error] possible cause: maybe a semicolon is missing before `value partitionBy'?
[error] .partitionBy(days(col("last_updated")))
[error] ^
[error] not found: value days
[error] .partitionBy(days(col("last_updated")))
[error] ^
[error] two errors found
I am not sure where to look for the problem. Any advice appreciated.
Best regards,
Saulius Pakalka
Re: Problem with partitioned table creation in Scala
Posted by Saulius Pakalka <sa...@oxylabs.io.INVALID>.
Thanks. The latest example clarifies a few things.
Saulius Pakalka
Re: Problem with partitioned table creation in Scala
Posted by Wing Yew Poon <wy...@cloudera.com.INVALID>.
The partitionedBy typo in the doc is already fixed in the master branch of
the Iceberg repo.
I filed a PR to add `using("iceberg")` to the `writeTo` examples for
creating a table (if you want to create an *Iceberg* table).
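Putting both fixes together — partitionedBy instead of partitionBy, plus using("iceberg") — the original snippet should look roughly like this (a sketch assuming Spark 3.x, where the days transform lives in org.apache.spark.sql.functions, with the Iceberg runtime on the classpath; df_c and output_table as in the original post):

import org.apache.spark.sql.functions.{col, days}

df_c.writeTo(output_table)
  .using("iceberg")                          // create an Iceberg table
  .partitionedBy(days(col("last_updated")))  // partitionedBy, not partitionBy
  .createOrReplace()

The missing import also accounts for the second compile error ("not found: value days").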
Re: Problem with partitioned table creation in Scala
Posted by Wing Yew Poon <wy...@cloudera.com.INVALID>.
One other note:
When creating the table, you need `using("iceberg")`. The example should
read
data.writeTo("prod.db.table")
  .using("iceberg")
  .tableProperty("write.format.default", "orc")
  .partitionedBy($"level", days($"ts"))
  .createOrReplace()
- Wing Yew
Re: Problem with partitioned table creation in Scala
Posted by Wing Yew Poon <wy...@cloudera.com.INVALID>.
That is a typo in the sample code. The doc itself (https://iceberg.apache.org/docs/latest/spark-writes/#creating-tables) says:
"Create and replace operations support table configuration methods, like
partitionedBy and tableProperty"
You could also have looked up the API in Spark documentation:
https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameWriterV2.html
There you would have found that the method is partitionedBy, not
partitionBy.
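As for the second error, "not found: value days" — the days partition transform is an ordinary function in org.apache.spark.sql.functions (Spark 3.x), so it needs to be imported alongside col, e.g. (sketch):

import org.apache.spark.sql.functions.{col, days}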
- Wing Yew