Posted to dev@iceberg.apache.org by Saulius Pakalka <sa...@oxylabs.io.INVALID> on 2022/05/27 11:32:10 UTC

Problem with partitioned table creation in scala

Hi,

I am trying to create a partitioned Iceberg table using the Scala code below, based on an example in the docs.
df_c.writeTo(output_table)
  .partitionBy(days(col("last_updated")))
  .createOrReplace()
However, this code does not compile and produces two errors:

value partitionBy is not a member of org.apache.spark.sql.DataFrameWriterV2[org.apache.spark.sql.Row]
[error] possible cause: maybe a semicolon is missing before `value partitionBy'?
[error]       .partitionBy(days(col("last_updated")))
[error]        ^
[error]  not found: value days
[error]       .partitionBy(days(col("last_updated")))
[error]                    ^
[error] two errors found

I am not sure where to look for the problem. Any advice is appreciated.

Best regards,

Saulius Pakalka


Re: Problem with partitioned table creation in scala

Posted by Saulius Pakalka <sa...@oxylabs.io.INVALID>.
Thanks. The latest example clarifies a few things. 

Saulius Pakalka


Re: Problem with partitioned table creation in scala

Posted by Wing Yew Poon <wy...@cloudera.com.INVALID>.
The partitionedBy typo in the doc is already fixed in the master branch of
the Iceberg repo.
I filed a PR to add `using("iceberg")` to the `writeTo` examples for
creating a table (if you want to create an *Iceberg* table).
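A side note on what `using("iceberg")` presupposes: the Spark session must have the Iceberg runtime on its classpath and a catalog (such as `prod` in the docs example) configured. A hypothetical spark-shell invocation, assuming Spark 3.2 with Scala 2.12 and a Hive metastore; the runtime artifact version and the catalog name are placeholders to adapt:

```shell
# Assumption: Spark 3.2 / Scala 2.12 -- pick the iceberg-spark-runtime
# artifact matching your own Spark and Scala versions.
spark-shell \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.1 \
  --conf spark.sql.catalog.prod=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.prod.type=hive
```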


Re: Problem with partitioned table creation in scala

Posted by Wing Yew Poon <wy...@cloudera.com.INVALID>.
One other note:
When creating the table, you need `using("iceberg")`. The example should
read

data.writeTo("prod.db.table")
    .using("iceberg")
    .tableProperty("write.format.default", "orc")
    .partitionedBy($"level", days($"ts"))
    .createOrReplace()

- Wing Yew
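Putting both fixes together: the corrected example above still assumes `days` is in scope, and the missing import was the second compile error in the original post (`not found: value days`). A minimal sketch, with the Spark calls shown as comments since they need a live session with the Iceberg runtime; the `icebergDailyDdl` helper below is purely illustrative, not part of any API:

```scala
// Fix 1: DataFrameWriterV2's method is partitionedBy, not partitionBy.
// Fix 2: the partition transform functions (years, months, days, hours,
//        bucket) live in org.apache.spark.sql.functions (Spark 3.0+), so
//        "not found: value days" means that import is missing.
//
// In a Spark session with the Iceberg runtime on the classpath:
//
//   import org.apache.spark.sql.functions.{col, days}
//
//   df_c.writeTo(output_table)
//     .using("iceberg")                         // create an Iceberg table
//     .partitionedBy(days(col("last_updated"))) // one partition per day
//     .createOrReplace()
//
// The same transform is spelled days(last_updated) in Spark SQL DDL; a
// small, purely illustrative helper that builds the equivalent statement:
def icebergDailyDdl(table: String, tsCol: String, source: String): String =
  s"CREATE OR REPLACE TABLE $table USING iceberg " +
    s"PARTITIONED BY (days($tsCol)) AS SELECT * FROM $source"

println(icebergDailyDdl("prod.db.table", "last_updated", "staging"))
```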



Re: Problem with partitioned table creation in scala

Posted by Wing Yew Poon <wy...@cloudera.com.INVALID>.
That is a typo in the sample code. The doc itself (
https://iceberg.apache.org/docs/latest/spark-writes/#creating-tables) says:
"Create and replace operations support table configuration methods, like
partitionedBy and tableProperty"
You could also have looked up the API in Spark documentation:
https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameWriterV2.html
There you would have found that the method is partitionedBy, not
partitionBy.

- Wing Yew

