You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@gobblin.apache.org by "Kengo Seki (Jira)" <ji...@apache.org> on 2021/01/20 22:57:00 UTC

[jira] [Created] (GOBBLIN-1367) Fix PostgresqlExtractor's unnecessary string replacement

Kengo Seki created GOBBLIN-1367:
-----------------------------------

             Summary: Fix PostgresqlExtractor's unnecessary string replacement
                 Key: GOBBLIN-1367
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1367
             Project: Apache Gobblin
          Issue Type: Bug
          Components: gobblin-sql
            Reporter: Kengo Seki
            Assignee: Shirshanka Das


I tried to run PostgreSQL Source/Extractor with the following minimal config,

{code}
$ psql
psql (12.5 (Ubuntu 12.5-0ubuntu0.20.04.1))
Type "help" for help.

sekikn=# \c
You are now connected to database "sekikn" as user "sekikn".
sekikn=# \d
       List of relations
 Schema | Name | Type  | Owner  
--------+------+-------+--------
 public | t    | table | sekikn
(1 row)
{code}

{code}
$ cat ~/postgres_to_console.job 
source.class=org.apache.gobblin.source.extractor.extract.jdbc.PostgresqlSource

source.conn.driver=org.postgresql.Driver
source.conn.username=sekikn
source.conn.password=********
source.conn.host=localhost
source.conn.port=5432
source.conn.database=sekikn
source.querybased.schema=public
source.entity=t

extract.table.type=SNAPSHOT_APPEND

writer.builder.class=org.apache.gobblin.writer.ConsoleWriterBuilder

data.publisher.type=org.apache.gobblin.publisher.NoopPublisher
{code}

but came across the following error.

{code}
$ bin/gobblin cli run -jobName postgres_to_console -jobFile ~/postgres_to_console.job

(snip)

2021-01-21 07:45:40 JST INFO  [gobblin-instance-driver] org.apache.gobblin.source.jdbc.JdbcExtractor [public_t_1611182740226_0] - Extract metadata using JDBC
2021-01-21 07:45:40 JST INFO  [gobblin-instance-driver] org.apache.gobblin.source.jdbc.JdbcExtractor [public_t_1611182740226_0] - Executing query:select col.column_name, col.data_type, case when CHARACTER_OCTET_LENGTH is null then 0 else 0 end as length, case when NUMERIC_PRECISION is null then 0 else NUMERIC_PRECISION end as precesion, case when NUMERIC_SCALE is null then 0 else NUMERIC_SCALE end as scale, case when is_nullable='NO' then 'false' else 'true' end as nullable, '' as format, '' as comment from information_schema.COLUMNS col WHERE upper(col.table_name)=upper(?) AND upper(col.table_schema)=upper(?) order by col.ORDINAL_POSITION
2021-01-21 07:45:40 JST INFO  [gobblin-instance-driver] org.apache.gobblin.source.jdbc.JdbcExtractor [public_t_1611182740226_0] - Schema:[{"columnName":"c","dataType":{"type":"string"},"isWaterMark":false,"primaryKey":0,"length":0,"precision":0,"scale":0,"isNullable":true,"format":"","comment":"","isUnique":false}]
2021-01-21 07:45:40 JST INFO  [gobblin-instance-driver] org.apache.gobblin.source.jdbc.JdbcExtractor [public_t_1611182740226_0] - Extract query: SELECT c FROM public.t where ('$WATERMARK')
2021-01-21 07:45:40 JST INFO  [gobblin-instance-driver] org.apache.gobblin.source.jdbc.JdbcExtractor [public_t_1611182740226_0] - Get source record count using JDBC
2021-01-21 07:45:40 JST INFO  [gobblin-instance-driver] org.apache.gobblin.source.jdbc.JdbcExtractor [public_t_1611182740226_0] - Executing query:SELECT COUNT(1) FROM publiCOUNT(1).t where (1=1)
2021-01-21 07:45:40 JST ERROR [gobblin-instance-driver] org.apache.gobblin.source.jdbc.JdbcExtractor [public_t_1611182740226_0] - Failed to execute sql:SELECT COUNT(1) FROM publiCOUNT(1).t where (1=1) ;error-ERROR: syntax error at or near "."
  Position: 35
org.postgresql.util.PSQLException: ERROR: syntax error at or near "."
  Position: 35
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2477)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2190)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:300)
	at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428)
	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354)
	at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:301)
	at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:287)
	at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:264)
	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:260)
	at org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264)
	at org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264)
	at org.apache.gobblin.source.jdbc.JdbcExtractor.executeSql(JdbcExtractor.java:666)
	at org.apache.gobblin.source.jdbc.JdbcExtractor.getSourceCount(JdbcExtractor.java:797)
	at org.apache.gobblin.source.extractor.extract.QueryBasedExtractor.build(QueryBasedExtractor.java:326)
	at org.apache.gobblin.source.extractor.extract.jdbc.PostgresqlSource.getExtractor(PostgresqlSource.java:47)
	at org.apache.gobblin.runtime.TaskContext.getExtractor(TaskContext.java:120)
	at org.apache.gobblin.runtime.Task.<init>(Task.java:196)
	at org.apache.gobblin.runtime.GobblinMultiTaskAttempt.createTaskRunnable(GobblinMultiTaskAttempt.java:485)
	at org.apache.gobblin.runtime.GobblinMultiTaskAttempt.access$100(GobblinMultiTaskAttempt.java:86)
	at org.apache.gobblin.runtime.GobblinMultiTaskAttempt$2.call(GobblinMultiTaskAttempt.java:511)
	at org.apache.gobblin.runtime.GobblinMultiTaskAttempt$2.call(GobblinMultiTaskAttempt.java:505)
	at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78)
	at com.github.rholder.retry.Retryer.call(Retryer.java:160)
	at org.apache.gobblin.runtime.GobblinMultiTaskAttempt.createTaskWithRetry(GobblinMultiTaskAttempt.java:505)
	at org.apache.gobblin.runtime.GobblinMultiTaskAttempt.runWorkUnits(GobblinMultiTaskAttempt.java:421)
	at org.apache.gobblin.runtime.GobblinMultiTaskAttempt.run(GobblinMultiTaskAttempt.java:157)
	at org.apache.gobblin.runtime.GobblinMultiTaskAttempt.runAndOptionallyCommitTaskAttempt(GobblinMultiTaskAttempt.java:521)
	at org.apache.gobblin.runtime.GobblinMultiTaskAttempt.runWorkUnits(GobblinMultiTaskAttempt.java:563)
	at org.apache.gobblin.runtime.local.LocalJobLauncher.runWorkUnitStream(LocalJobLauncher.java:156)
	at org.apache.gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:508)
	at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$DriverRunnable.call(JobLauncherExecutionDriver.java:164)
	at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$DriverRunnable.call(JobLauncherExecutionDriver.java:152)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)
2021-01-21 07:45:40 JST ERROR [gobblin-instance-driver] org.apache.gobblin.source.extractor.extract.jdbc.PostgresqlSource [public_t_1611182740226_0] - Failed to prepare extractor: error - Failed to get record count; error - Failed to get source record count using JDBC; error - Failed to get source record count from database; error - null
{code}

This is because the target schema and table (publi"c".t) contains the name of column (c), so it was replaced wrongly. PostgresqlExtractor should replace only the first occurrence of the target substring here.
https://github.com/apache/incubator-gobblin/blob/release-0.15.0/gobblin-modules/gobblin-sql/src/main/java/org/apache/gobblin/source/jdbc/PostgresqlExtractor.java#L113




--
This message was sent by Atlassian Jira
(v8.3.4#803005)