Posted to issues@iceberg.apache.org by "haydenflinner (via GitHub)" <gi...@apache.org> on 2023/02/02 22:15:26 UTC

[GitHub] [iceberg] haydenflinner commented on issue #2040: Partial data ingestion to Iceberg in failing with Spark 3.0.x

haydenflinner commented on issue #2040:
URL: https://github.com/apache/iceberg/issues/2040#issuecomment-1414446899

   Same thing here: it happens whether I use INSERT INTO or the DataFrame API. Is there really no solution besides changing the DataFrame schema so that it has the same columns as the Iceberg table? That seems like an annoying way to have to evolve the table.
   
   ```
       spark.sql(
           f"""CREATE OR REPLACE TEMPORARY VIEW myview USING parquet
                   OPTIONS (path "{path}")"""
       )
       log.info("calling-insert")
       spark.sql(f"INSERT INTO {tablename}({', '.join(df.columns)}) SELECT * FROM myview")
   
   --> leads to:
       Table columns: 'server_name', 'backed_up_path', 'backed_up_filesize', 'num_lines', 'backed_up_ts', 'start_ts', 'end_ts', 'first_x_ts', 'last_x_ts'
       Data columns: 'server_name', 'backed_up_path', 'backed_up_filesize'
   ```
   or
   ```
       df.to_parquet(path, index=False, allow_truncated_timestamps=True, coerce_timestamps='us')
       spark = _get_spark()
       spark.sql("use dev_catalog")
       sdf = spark.read.parquet(path)
       sdf.writeTo(f"dev_catalog.{tablename}").append()
   --> leads to:
   AnalysisException: Cannot write incompatible data to table 'dev_catalog.logfiles':
   - Cannot write nullable values to non-null column 'backed_up_path'
   - Cannot find data for output column 'num_lines'
   - Cannot find data for output column 'backed_up_ts'
   - Cannot find data for output column 'start_ts'
   - Cannot find data for output column 'end_ts'
   - Cannot find data for output column 'first_x_ts'
   - Cannot find data for output column 'last_x_ts'
   
   ```
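
For reference, the workaround mentioned above (making the data match the table's column list) could be sketched roughly as follows. This is only an illustration, not something Iceberg or Spark provides: `build_padded_insert` is a hypothetical helper, the column names come from the error output above, and the SQL types in `LOGFILES_SCHEMA` are assumptions because the real table definition is not shown. It also assumes the missing columns are optional in the table, and it does not address the separate "Cannot write nullable values to non-null column" error.

```python
from typing import Iterable, Sequence, Tuple


def build_padded_insert(
    table: str,
    source_view: str,
    table_schema: Sequence[Tuple[str, str]],
    data_columns: Iterable[str],
) -> str:
    """Build an INSERT that supplies a typed NULL for every table column
    that is missing from the incoming data (hypothetical helper)."""
    present = set(data_columns)
    known = {name for name, _ in table_schema}
    unknown = present - known
    if unknown:
        raise ValueError(f"data columns not in table: {sorted(unknown)}")

    select_items = []
    for name, sql_type in table_schema:  # keep the table's column order
        if name in present:
            select_items.append(name)
        else:
            # Column absent from the data: fill it with a NULL of the right type.
            select_items.append(f"CAST(NULL AS {sql_type}) AS {name}")
    return f"INSERT INTO {table} SELECT {', '.join(select_items)} FROM {source_view}"


# Column names are taken from the error message; the types are assumed.
LOGFILES_SCHEMA = [
    ("server_name", "STRING"),
    ("backed_up_path", "STRING"),
    ("backed_up_filesize", "BIGINT"),
    ("num_lines", "BIGINT"),
    ("backed_up_ts", "TIMESTAMP"),
    ("start_ts", "TIMESTAMP"),
    ("end_ts", "TIMESTAMP"),
    ("first_x_ts", "TIMESTAMP"),
    ("last_x_ts", "TIMESTAMP"),
]

sql = build_padded_insert(
    "dev_catalog.logfiles",
    "myview",
    LOGFILES_SCHEMA,
    ["server_name", "backed_up_path", "backed_up_filesize"],
)
print(sql)
```

The resulting string would then be passed to `spark.sql(sql)` in place of the plain `INSERT INTO ... SELECT *`. The downside is the one raised above: every writer has to know the full table schema, so adding a column to the table means touching each ingestion job.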


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

