Posted to github@beam.apache.org by "MaxShevchenkoIKEA (via GitHub)" <gi...@apache.org> on 2023/03/13 13:35:49 UTC

[GitHub] [beam] MaxShevchenkoIKEA opened a new issue, #25819: [Bug]: JdbcIO.readWithPartitions() performs checkStateNotNull on nullable numeric column

MaxShevchenkoIKEA opened a new issue, #25819:
URL: https://github.com/apache/beam/issues/25819

   ### What happened?
   
   We have been trying out Dataflow as an option for ingesting data from a JDBC source (a Cloud SQL SQL Server instance) into BigQuery, and stumbled upon some rather odd behaviour that looks like a bug. As soon as I started reading a table that contains a nullable column of type numeric(20,0), the job failed with the following exception:
   ```
   Caused by: java.lang.IllegalStateException
       org.apache.beam.sdk.util.Preconditions.checkStateNotNull(Preconditions.java:452)
       org.apache.beam.sdk.io.jdbc.SchemaUtil.lambda$createLogicalTypeExtractor$83184fac$1(SchemaUtil.java:307)
       org.apache.beam.sdk.io.jdbc.SchemaUtil$BeamRowMapper.mapRow(SchemaUtil.java:380)
       org.apache.beam.sdk.io.jdbc.SchemaUtil$BeamRowMapper.mapRow(SchemaUtil.java:358)
       org.apache.beam.sdk.io.jdbc.JdbcIO$ReadFn.processElement(JdbcIO.java:1498)
   ```
   
   I am sure that this column is the cause: I tried querying a view of the table that omits the column, and it worked perfectly well. So it seems that the default row mapper, when extracting the value of a nullable numeric column, for some reason calls checkStateNotNull on it. I am still learning Apache Beam, so I may well be getting something wrong about the pipeline setup or anything else; please feel free to comment and suggest changes or educational materials to read and watch. Here is sample code for my main pipeline logic:
   ```
   // Step 1: read from SQL Server over JDBC
   PCollection<GenericRecord> records = p.apply(
       JdbcIO.<GenericRecord>readWithPartitions()
           .withDataSourceProviderFn(JdbcIO.PoolableDataSourceProvider.of(
               JdbcIO.DataSourceConfiguration.create("com.microsoft.sqlserver.jdbc.SQLServerDriver", jdbc)
                   .withUsername(username)
                   .withPassword(password)
                   .withConnectionProperties(properties)
           ))
           .withTable("test")
           .withPartitionColumn("ID")
           .withRowOutput()
   );
   System.out.println(records.getSchema());

   // Step 2: write to BigQuery
   records.apply(
       BigQueryIO.<GenericRecord>write()
           .to(destination)
           .withExtendedErrorInfo()
           .useBeamSchema()
           .withSchemaUpdateOptions(schema_options)
           .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
           .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
   );
   p.run();
   ```
   
   
   
   ### Issue Priority
   
   Priority: 2 (default / most bugs should be filed as P2)
   
   ### Issue Components
   
   - [ ] Component: Python SDK
   - [X] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [X] Component: IO connector
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] MaxShevchenkoIKEA commented on issue #25819: [Bug]: JdbcIO.readWithPartitions() performs checkStateNotNull on nullable numeric column

Posted by "MaxShevchenkoIKEA (via GitHub)" <gi...@apache.org>.
MaxShevchenkoIKEA commented on issue #25819:
URL: https://github.com/apache/beam/issues/25819#issuecomment-1469563693

   Thanks!


[GitHub] [beam] aromanenko-dev commented on issue #25819: [Bug]: JdbcIO.readWithPartitions() performs checkStateNotNull on nullable numeric column

Posted by "aromanenko-dev (via GitHub)" <gi...@apache.org>.
aromanenko-dev commented on issue #25819:
URL: https://github.com/apache/beam/issues/25819#issuecomment-1466268965

   What is the output of
   ```
   System.out.println(records.getSchema());
   ```
   and what is the actual SQL schema of that table?


[GitHub] [beam] Abacn closed issue #25819: [Bug]: JdbcIO.readWithPartitions() performs checkStateNotNull on nullable numeric column

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn closed issue #25819: [Bug]: JdbcIO.readWithPartitions() performs checkStateNotNull on nullable numeric column
URL: https://github.com/apache/beam/issues/25819


[GitHub] [beam] MaxShevchenkoIKEA commented on issue #25819: [Bug]: JdbcIO.readWithPartitions() performs checkStateNotNull on nullable numeric column

Posted by "MaxShevchenkoIKEA (via GitHub)" <gi...@apache.org>.
MaxShevchenkoIKEA commented on issue #25819:
URL: https://github.com/apache/beam/issues/25819#issuecomment-1466163695

   The issue also seems to occur when working with the timestamp type.


[GitHub] [beam] Abacn commented on issue #25819: [Bug]: JdbcIO.readWithPartitions() performs checkStateNotNull on nullable numeric column

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #25819:
URL: https://github.com/apache/beam/issues/25819#issuecomment-1466861902

   This looks like a bug; however, I cannot reproduce it on my side, because Java alone treats the decimal type as a primitive field type DECIMAL (not LogicalType.FixedPrecisionNumeric), so the affected code is not run in my test.
   
   Given that the other cases in SchemaUtil.createLogicalTypeExtractor all handle nulls:
   
   https://github.com/apache/beam/blob/0d515df652067f54b5a34d03b276cb4ba94d0e1c/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/SchemaUtil.java#L292
   
   I think this is a valid fix. Would you be interested in adding a PR?
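
The difference between a strict and a null-tolerant extractor can be sketched in plain Java. This is only an illustration of the pattern discussed above, not Beam's actual code: the class name, the `Map`-based row, and both extractor helpers are hypothetical, and the local `checkStateNotNull` merely imitates the behaviour seen in the stack trace (an `IllegalStateException` on a null value).

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class NullSafeExtractorSketch {

    // Hypothetical stand-in for a precondition helper: throws on null.
    static <T> T checkStateNotNull(T value) {
        if (value == null) {
            throw new IllegalStateException("value was null");
        }
        return value;
    }

    // Strict extractor: fails when the column holds NULL, like the reported symptom.
    static Function<Map<String, Object>, BigDecimal> strictExtractor(String column) {
        return row -> checkStateNotNull((BigDecimal) row.get(column));
    }

    // Null-safe extractor: passes NULL through so a nullable field can carry it.
    static Function<Map<String, Object>, BigDecimal> nullSafeExtractor(String column) {
        return row -> (BigDecimal) row.get(column);
    }

    public static void main(String[] args) {
        // A row shaped like the test table: ID is set, NULLABLE_ID is NULL.
        Map<String, Object> row = new HashMap<>();
        row.put("ID", new BigDecimal("1"));
        row.put("NULLABLE_ID", null);

        System.out.println(nullSafeExtractor("NULLABLE_ID").apply(row));
        try {
            strictExtractor("NULLABLE_ID").apply(row);
        } catch (IllegalStateException e) {
            System.out.println("strict extractor failed: " + e.getMessage());
        }
    }
}
```

A fix along these lines would only be appropriate for fields whose schema is nullable; a NOT NULL column can reasonably keep the strict check.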


[GitHub] [beam] MaxShevchenkoIKEA commented on issue #25819: [Bug]: JdbcIO.readWithPartitions() performs checkStateNotNull on nullable numeric column

Posted by "MaxShevchenkoIKEA (via GitHub)" <gi...@apache.org>.
MaxShevchenkoIKEA commented on issue #25819:
URL: https://github.com/apache/beam/issues/25819#issuecomment-1467593841

   Here is the CREATE TABLE statement:
   ```
   SET ANSI_NULLS ON
   GO
   SET QUOTED_IDENTIFIER ON
   GO
   CREATE TABLE [dbo].[test](
   	[ID] [numeric](19, 0) NOT NULL,
   	[NULLABLE_ID] [numeric](20, 0) NULL
   ) ON [PRIMARY]
   GO
   ALTER TABLE [dbo].[test] ADD  CONSTRAINT [PK_test] PRIMARY KEY NONCLUSTERED 
   (
   	[ID] ASC
   )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
   GO
   ```
   
   And here is the output of the schema print:
   
   ```
   Fields:
   Field{name=ID, description=, type=LOGICAL_TYPE<beam:logical_type:fixed_decimal:v1> NOT NULL, options={{}}}
   Field{name=NULLABLE_ID, description=, type=LOGICAL_TYPE<beam:logical_type:fixed_decimal:v1>, options={{}}}
   ```


[GitHub] [beam] MaxShevchenkoIKEA commented on issue #25819: [Bug]: JdbcIO.readWithPartitions() performs checkStateNotNull on nullable numeric column

Posted by "MaxShevchenkoIKEA (via GitHub)" <gi...@apache.org>.
MaxShevchenkoIKEA commented on issue #25819:
URL: https://github.com/apache/beam/issues/25819#issuecomment-1469743672

   Just found out that the issue also seems to affect nvarchar types.


[GitHub] [beam] MaxShevchenkoIKEA commented on issue #25819: [Bug]: JdbcIO.readWithPartitions() performs checkStateNotNull on nullable numeric column

Posted by "MaxShevchenkoIKEA (via GitHub)" <gi...@apache.org>.
MaxShevchenkoIKEA commented on issue #25819:
URL: https://github.com/apache/beam/issues/25819#issuecomment-1469862381

   For anybody else struggling with the same issue: a workaround while waiting for the fix is to cast the problematic column to varchar and replace NULLs with empty strings (or any other placeholder value).
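
A minimal sketch of that workaround, assuming the cast is done in a SQL Server subquery. The class and helper names are hypothetical, the `varchar(64)` width is an arbitrary choice, and the code only builds the query text. Whether `withTable(...)` accepts such a parenthesized subquery in your Beam version is an assumption to verify against the JdbcIO documentation; a database view with the same SELECT is an alternative.

```java
import java.util.ArrayList;
import java.util.List;

public class CastWorkaroundSketch {

    // Builds a derived-table expression that keeps plainColumns as they are and
    // casts each nullable column to varchar, replacing NULL with an empty string.
    static String buildSubquery(String table, List<String> plainColumns, List<String> nullableColumns) {
        List<String> selectList = new ArrayList<>();
        for (String column : plainColumns) {
            selectList.add("[" + column + "]");
        }
        for (String column : nullableColumns) {
            selectList.add("COALESCE(CAST([" + column + "] AS varchar(64)), '') AS [" + column + "]");
        }
        return "(SELECT " + String.join(", ", selectList) + " FROM " + table + ") AS t";
    }

    public static void main(String[] args) {
        // Columns taken from the test table in this thread.
        System.out.println(buildSubquery("[dbo].[test]", List.of("ID"), List.of("NULLABLE_ID")));
    }
}
```

Note that the placeholder changes the column's type and meaning downstream: the destination sees a string column in which an empty string stands for NULL.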


[GitHub] [beam] MaxShevchenkoIKEA commented on issue #25819: [Bug]: JdbcIO.readWithPartitions() performs checkStateNotNull on nullable numeric column

Posted by "MaxShevchenkoIKEA (via GitHub)" <gi...@apache.org>.
MaxShevchenkoIKEA commented on issue #25819:
URL: https://github.com/apache/beam/issues/25819#issuecomment-1467595590

   <img width="268" alt="Screenshot 2023-03-14 at 09 06 18" src="https://user-images.githubusercontent.com/114060985/224935603-c29f5de6-18b9-4fdf-9720-57d7fba2563b.png">
   And here is what is currently in the test table.


[GitHub] [beam] Abacn commented on issue #25819: [Bug]: JdbcIO.readWithPartitions() performs checkStateNotNull on nullable numeric column

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #25819:
URL: https://github.com/apache/beam/issues/25819#issuecomment-1468670232

   Thanks, entered #25819 for the bug fix


[GitHub] [beam] aromanenko-dev commented on issue #25819: [Bug]: JdbcIO.readWithPartitions() performs checkStateNotNull on nullable numeric column

Posted by "aromanenko-dev (via GitHub)" <gi...@apache.org>.
aromanenko-dev commented on issue #25819:
URL: https://github.com/apache/beam/issues/25819#issuecomment-1471521776

   @Abacn Thank you for taking and fixing this issue!

