Posted to issues@phoenix.apache.org by "Istvan Toth (Jira)" <ji...@apache.org> on 2022/09/14 08:50:00 UTC

[jira] [Commented] (PHOENIX-6667) Spark3 connector requires that all columns are specified when writing

    [ https://issues.apache.org/jira/browse/PHOENIX-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17603967#comment-17603967 ] 

Istvan Toth commented on PHOENIX-6667:
--------------------------------------

Additional information:

This is the test that works on Spark2, but not on Spark3:
[https://github.com/apache/phoenix-connectors/blob/7e2c40f672d6ee5203dc48fc9adf95e66eea6938/phoenix5-spark3-it/src/it/scala/org/apache/phoenix/spark/PhoenixSparkIT.scala#L367]

And this is the same test, modified to work around the Spark3 limitation:
[https://github.com/apache/phoenix-connectors/blob/7e2c40f672d6ee5203dc48fc9adf95e66eea6938/phoenix5-spark3-it/src/it/scala/org/apache/phoenix/spark/PhoenixSparkIT.scala#L395]
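For illustration, a minimal sketch of the failing pattern (table name, column names, and connection options are hypothetical, not taken from the issue; this assumes a running Spark session and a Phoenix cluster):

```scala
import org.apache.spark.sql.SaveMode

// Hypothetical table OUTPUT_TEST_TABLE with columns ID, COL1, COL2.
// The DataFrame deliberately omits COL2.
val df = spark
  .createDataFrame(Seq((1L, "a"), (2L, "b")))
  .toDF("ID", "COL1")

df.write
  .format("phoenix")
  .mode(SaveMode.Overwrite)
  .option("table", "OUTPUT_TEST_TABLE")
  .option("zkUrl", "localhost:2181")   // placeholder connection info
  .save()
```

On Spark2 the connector passed this through as an UPSERT on the listed columns only; Spark3's stricter schema validation rejects the write because the DataFrame schema does not cover every table column.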

> Spark3 connector requires that all columns are specified when writing
> ---------------------------------------------------------------------
>
>                 Key: PHOENIX-6667
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6667
>             Project: Phoenix
>          Issue Type: Bug
>          Components: connectors, spark-connector
>    Affects Versions: connectors-6.0.0
>            Reporter: Istvan Toth
>            Priority: Major
>
> For Spark 2 it was possible to omit some columns from the DataFrame, just as it is not mandatory to specify every column when upserting via SQL.
> Spark3 has added new checks which require that every SQL column is specified in the DataFrame.
> Consequently, when using the current API, writing will fail unless all columns are specified.
> This is a loss of functionality WRT Phoenix (and other SQL datastores) compared to Spark2.
> I don't think that we can do anything from the Phoenix side, just documenting the regression here.
> Maybe future Spark versions will make this configurable.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)