You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@gobblin.apache.org by Abhishek Tiwari <ab...@apache.org> on 2018/03/23 18:41:22 UTC

Re: Query about using gobblin.

I think Sudarshan answered this on Gitter. Still tagging him.

tldr: this was a known bug.

Abhishek

On Mon, Jan 29, 2018 at 9:54 AM, Rohit Kalhans <ro...@gmail.com>
wrote:

> *Hello,*
>
> *We have recently started evaluating gobblin for ingestion purpose. As it
> turns out, we specifically hit these road-blocks.*
>
> *1. When using Kafka to Kafka streaming, I keep hitting this error post
> which the ingestion streaming stops. *
>
> ERROR  [22:54:35.306] [kafka-producer-network-thread | gobblin]
> o.a.k.c.u.KafkaThread [KafkaThread.java:30]  -  Uncaught exception in
> kafka-producer-network-thread | gobblin:
> java.lang.AssertionError: The acknowledgement counter for this watermark
> went negative. Please file a bug!
> at org.apache.gobblin.writer.AcknowledgableWatermark.ack(Acknow
> ledgableWatermark.java:42)
> at org.apache.gobblin.stream.StreamEntity.ack(StreamEntity.java:82)
> at org.apache.gobblin.writer.AsyncWriterManager$1.onSuccess(
> AsyncWriterManager.java:321)
> at org.apache.gobblin.writer.AsyncWriterManager$1.onSuccess(
> AsyncWriterManager.java:316)
> at org.apache.gobblin.kafka.writer.Kafka09DataWriter$2.onComple
> tion(Kafka09DataWriter.java:124)
> at org.apache.kafka.clients.producer.internals.RecordBatch.
> done(RecordBatch.java:97)
> at org.apache.kafka.clients.producer.internals.Sender.completeB
> atch(Sender.java:299)
> at org.apache.kafka.clients.producer.internals.Sender.handlePro
> duceResponse(Sender.java:260)
> at org.apache.kafka.clients.producer.internals.Sender.access$
> 100(Sender.java:56)
> at org.apache.kafka.clients.producer.internals.Sender$1.onCompl
> ete(Sender.java:342)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128)
> at java.lang.Thread.run(Thread.java:748)
>
>
> *Here are the key points:*
> *1. Gobblin is running in the embedded mode.  It is a part of a  bigger
> application.*
> *2. here is the config for the job in Json format. the json is converted
> to properties internally*
>    "config":{
>   "job.lock.enabled": "false",
>   "task.executionMode": "STREAMING",
>   "gobblin.streaming.kafka.topic.key.deserializer":
> "org.apache.kafka.common.serialization.StringDeserializer",
>   "gobblin.streaming.kafka.topic.value.deserializer":
> "org.apache.kafka.common.serialization.ByteArrayDeserializer",
>   "source.class": "org.apache.gobblin.source.ext
> ractor.extract.kafka.KafkaSimpleStreamingSource",
>   "gobblin.streaming.kafka.topic.singleton": "archive",
>   "kafka.brokers": "localhost:9092",
>   "streaming.watermarkStateStore.type": "mysql",
>   "state.store.db.url": "jdbc:mysql://localhost:3306/Saboo",
>   "state.store.db.user": "saboo_root",
>   "state.store.db.password": "sabootest",
>   "streaming.watermarkStateStore.config.state.store.zk.connectString":
> "localhost:2181",
>   "streaming.watermark.commitIntervalMillis": "2000",
>   "converter.classes": "org.apache.gobblin.converter.SamplingConverter",
>   "converter.sample.ratio": "0.10",
>   "writer.builder.class": "org.apache.gobblin.kafka.writ
> er.KafkaDataWriterBuilder",
>   "writer.kafka.topic": "archive2",
>   "writer.kafka.producerConfig.bootstrap.servers": "localhost:9092",
>   "writer.kafka.producerConfig.value.serializer":
> "org.apache.kafka.common.serialization.ByteArraySerializer",
>   "data.publisher.type": "org.apache.gobblin.publisher.NoopPublisher"
> }
>
>
> *2. When using zk for state store, I keep getting this error which
> terminates the job*
>
> o.a.g.r.AbstractJobLauncher [AbstractJobLauncher.java:468]  -  Failed to
> launch and run job job_kafkaStreaming_1517248081731:
> java.lang.NoClassDefFoundError: Could not initialize class
> org.apache.log4j.LogManager
> java.lang.NoClassDefFoundError: Could not initialize class
> org.apache.log4j.LogManager
> at org.apache.log4j.Logger.getLogger(Logger.java:117)
> at org.apache.helix.manager.zk.ZkCacheBaseDataAccessor.<clinit>
> (ZkCacheBaseDataAccessor.java:50)
> at org.apache.gobblin.metastore.ZkStateStore.<init>(ZkStateStore.java:89)
> at org.apache.gobblin.metastore.ZkStateStoreFactory.createState
> Store(ZkStateStoreFactory.java:38)
> at org.apache.gobblin.runtime.StateStoreBasedWatermarkStorage.<
> init>(StateStoreBasedWatermarkStorage.java:101)
> at org.apache.gobblin.runtime.TaskContext.getWatermarkStorage(T
> askContext.java:389)
> at org.apache.gobblin.runtime.Task.<init>(Task.java:234)
> at org.apache.gobblin.runtime.GobblinMultiTaskAttempt.createTas
> kRunnable(GobblinMultiTaskAttempt.java:363)
> at org.apache.gobblin.runtime.GobblinMultiTaskAttempt.runWorkUn
> its(GobblinMultiTaskAttempt.java:344)
> at org.apache.gobblin.runtime.GobblinMultiTaskAttempt.run(Gobbl
> inMultiTaskAttempt.java:134)
> at org.apache.gobblin.runtime.GobblinMultiTaskAttempt.runAndOpt
> ionallyCommitTaskAttempt(GobblinMultiTaskAttempt.java:369)
> at org.apache.gobblin.runtime.GobblinMultiTaskAttempt.runWorkUn
> its(GobblinMultiTaskAttempt.java:391)
> at org.apache.gobblin.runtime.local.LocalJobLauncher.runWorkUni
> tStream(LocalJobLauncher.java:142)
> at org.apache.gobblin.runtime.AbstractJobLauncher.launchJob(Abs
> tractJobLauncher.java:443)
> at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriv
> er$DriverRunnable.call(JobLauncherExecutionDriver.java:159)
> at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriv
> er$DriverRunnable.call(JobLauncherExecutionDriver.java:147)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.lang.Thread.run(Thread.java:748)
>
>
>
> --
> Cheerio!
>
> *Rohit*
>
>
>
> --
> Cheerio!
>
> *Rohit*
>