Posted to dev@plc4x.apache.org by Christofer Dutz <ch...@c-ware.de> on 2019/08/01 13:45:45 UTC

Re: [KAFKA] Refactoring the Kafka Connect plugin?

Hi Kai,

that document is exactly the one I'm currently using.
What I'm currently working on is updating the plugin so it no longer schedules and handles the connections manually, but uses the scraper component of PLC4X instead.
Also, the current configuration is not production-ready, and I'll be working on making it more easily usable.

But it definitely won't hurt to have a Kafka pro take a look at what we did and propose improvements. After all, we want the thing to be rock-solid :-)

Chris
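A scraper-driven Kafka Connect source usually decouples the scrape callbacks from Connect's polling loop: the scraper pushes finished records into a queue, and the task's poll() simply drains that queue. Below is a minimal, hypothetical sketch of that pattern using only the public Kafka Connect API; the class name, timeout and comments are illustrative and not taken from the actual PLC4X connector.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.TimeUnit;

    import org.apache.kafka.connect.source.SourceRecord;
    import org.apache.kafka.connect.source.SourceTask;

    // Hypothetical sketch: scraper callbacks offer ready-made SourceRecords into a
    // queue, and Kafka Connect drains that queue whenever it calls poll().
    public class QueueDrainingSourceTask extends SourceTask {

        private final BlockingQueue<SourceRecord> buffer = new LinkedBlockingQueue<>();

        @Override
        public String version() {
            return "0.0.1";
        }

        @Override
        public void start(Map<String, String> props) {
            // A real task would start the PLC4X scraper here and register a callback
            // that converts each scrape result into a SourceRecord and offers it:
            //   buffer.offer(record);
        }

        @Override
        public List<SourceRecord> poll() throws InterruptedException {
            // Wait briefly for the first record, then take whatever else is queued.
            SourceRecord first = buffer.poll(100, TimeUnit.MILLISECONDS);
            if (first == null) {
                return null; // nothing was scraped since the last poll
            }
            List<SourceRecord> records = new ArrayList<>();
            records.add(first);
            buffer.drainTo(records);
            return records;
        }

        @Override
        public void stop() {
            // Stop the scraper here.
        }
    }

With this shape, the scraper's own scheduling stays independent of Connect's polling, which is the point of handing the scheduling over to the scraper component instead of doing it by hand.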



On 31.07.19, 17:03, "Kai Wähner" <me...@gmail.com> wrote:

    Hi Chris,
    
    great that you will work on the connector.
    
    I am not deeply technical, but if you need guidance from Kafka Connect
    experts, I can connect you to a Confluent colleague who can help with best
    practices for building the connector.
    
    For example, we have implemented a wildcard option in our MQTT Connector
    to map MQTT topics to Kafka topics in a more flexible way (e.g. 1000s of
    cars from different MQTT topics can be routed into one Kafka topic). This
    might also be interesting for this connector, as you expect to connect to various PLCs.
    
    This guide might also help:
    https://www.confluent.io/wp-content/uploads/Verification-Guide-Confluent-Platform-Connectors-Integrations.pdf
    
    
    
    On Wed, Jul 31, 2019 at 4:39 PM Christofer Dutz <ch...@c-ware.de>
    wrote:
    
    > Hi all,
    >
    > I am currently planning on cleaning up the Kafka Connect adapter a little,
    > as this was implemented as part of a proof of concept and is still in a
    > state I wouldn’t use in production ;-)
    > But a lot has happened since then and I’m planning on making it a really
    > usable tool in the next few days.
    >
    > A lot has changed since we created the integration module in Q3 2018, and I
    > would like to refactor it to use the Scraper for the heavy lifting.
    >
    > Currently a user has to provide a parameter “query” which contains a
    > comma-separated list of connection strings with appended addresses. This is
    > simply unmanageable.
    >
    > I would like to make it configurable via a JSON or YAML file.
    >
    > I think it would make sense to define groups of fields that are collected
    > from one device at the same rate. So it’s pretty similar to the scraper
    > example; however, I would like to not specify the source in the job, but the
    > other way around.
    > When specifying the “sources” I would also specify which jobs should run
    > on a given connection.
    > As the connector was initially showcased in a scenario where data had to
    > be collected from a big number of PLCs with identical specs,
    > I think this is probably the most important use-case, and here it is
    > also probably more common to add new devices on which to collect standard data
    > than the other way around.
    >
    > We should also provide the means to set, per connection, to which
    > kafka-topic the data should be sent.
    > We could provide the means to set a default and make the per-connection setting optional.
    > When posting to a topic we also need to provide a means for partitioning, so
    > I would give sources an optional “name”.
    > Each message would contain not only the requested data, but also the
    > source-url, source-name and job-name, along with a timestamp.
    >
    > So I guess it would look something like this:
    >
    > #
    > ----------------------------------------------------------------------------
    > # Licensed to the Apache Software Foundation (ASF) under one
    > # or more contributor license agreements.  See the NOTICE file
    > # distributed with this work for additional information
    > # regarding copyright ownership.  The ASF licenses this file
    > # to you under the Apache License, Version 2.0 (the
    > # "License"); you may not use this file except in compliance
    > # with the License.  You may obtain a copy of the License at
    > #
    > #    http://www.apache.org/licenses/LICENSE-2.0
    > #
    > # Unless required by applicable law or agreed to in writing,
    > # software distributed under the License is distributed on an
    > # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    > # KIND, either express or implied.  See the License for the
    > # specific language governing permissions and limitations
    > # under the License.
    > #
    > ----------------------------------------------------------------------------
    > ---
    > # Defaults used throughout all collections
    > defaults:
    >   # If not specified, all data goes to this topic (optional)
    >   default-topic: some/default
    >
    > # Defines connections to PLCs
    > sources:
    >   # Connection to an S7 device
    >   - name: machineA
    >     # PLC4X connection URL
    >     url: s7://1.2.3.4/1/1
    >     jobs:
    >       # Just references the job "s7-dashboard". All data will be published
    > to the default topic
    >       - name: s7-dashboard
    >       # References the job "s7-heartbeat", however it configures the
    > output to go to the topic "heartbeat"
    >       - name: s7-heartbeat
    >         topic: heartbeat
    >
    >   # Connection to a second S7 device
    >   - name: machineB
    >     url: s7://10.20.30.40/1/1
    >     # Sets the default topic for this connection. All jobs data will go to
    > "heartbeat"
    >     topic: heartbeat
    >     jobs:
    >       - s7-heartbeat
    >
    >   # Connection to a Beckhoff device
    >   - name: machineC
    >     url: ads://1.2.3.4.5.6
    >     topic: heartbeat
    >     jobs:
    >       - ads-heartbeat
    >
    > # Defines what should be collected and how often
    > jobs:
    >   # Defines a job to collect a set of fields on s7 devices every 500ms
    >   - name: s7-dashboard
    >     scrapeRate: 500
    >     fields:
    >       # The key will be used in the Kafka message to identify this field,
    > the value here contains the PLC4X address
    >       inputPressure: %DB.DB1.4:INT
    >       outputPressure: %Q1:BYTE
    >       temperature: %I3:INT
    >
    >   # Defines a second job to collect a set of fields on s7 devices every
    > 1000ms
    >   - name: s7-heartbeat
    >     scrapeRate: 1000
    >     fields:
    >       active: %I0.2:BOOL
    >
    >   # Defines a third job that collects data on Beckhoff devices
    >   - name: ads-heartbeat
    >     scrapeRate: 1000
    >     fields:
    >       active: Main.running
    >
    > I think it should be self-explanatory with my comments inline.
    >
    > What do you think?
    >
    > Chris
    >
    >
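The message layout proposed in the mail above (the requested field values plus source-url, source-name and job-name with a timestamp, and the source name doubling as the record key for partitioning) could map onto a Kafka Connect record roughly as in the following sketch. The schema name, the string-typed field map and the offset handling are assumptions made for illustration, not the connector's actual wire format.

    import java.util.Collections;
    import java.util.Map;

    import org.apache.kafka.connect.data.Schema;
    import org.apache.kafka.connect.data.SchemaBuilder;
    import org.apache.kafka.connect.data.Struct;
    import org.apache.kafka.connect.source.SourceRecord;

    // Hypothetical record layout: one record per scrape, envelope metadata plus a
    // map of field name -> value. Values are kept as strings here for simplicity; a
    // real connector would map PLC4X data types onto proper Connect schemas.
    public class ScrapeRecordSketch {

        private static final Schema VALUE_SCHEMA = SchemaBuilder.struct()
                .name("ScrapeResult")                                  // illustrative name
                .field("sourceName", Schema.STRING_SCHEMA)
                .field("sourceUrl", Schema.STRING_SCHEMA)
                .field("jobName", Schema.STRING_SCHEMA)
                .field("timestamp", Schema.INT64_SCHEMA)
                .field("fields",
                        SchemaBuilder.map(Schema.STRING_SCHEMA, Schema.STRING_SCHEMA).build())
                .build();

        public static SourceRecord toRecord(String topic, String sourceName, String sourceUrl,
                                            String jobName, Map<String, String> fields) {
            long now = System.currentTimeMillis();
            Struct value = new Struct(VALUE_SCHEMA)
                    .put("sourceName", sourceName)
                    .put("sourceUrl", sourceUrl)
                    .put("jobName", jobName)
                    .put("timestamp", now)
                    .put("fields", fields);

            // Keying by source name keeps all messages of one PLC in the same partition.
            return new SourceRecord(
                    Collections.singletonMap("url", sourceUrl),        // source partition
                    Collections.singletonMap("timestamp", now),        // source offset
                    topic,
                    Schema.STRING_SCHEMA, sourceName,                  // record key
                    VALUE_SCHEMA, value);
        }
    }

With a layout like this, the grouping of fields per job discussed later in the thread falls out naturally: one record per scrape rather than one record per field.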
    


Re: [KAFKA] Refactoring the Kafka Connect plugin?

Posted by Christofer Dutz <ch...@c-ware.de>.
Hi all,

today I managed to find the last major issues and fix them ... now my scraper-based Kafka Connect adapter is successfully scraping data from a PLC and pumping it to Kafka.
I still have to work out some quirks, however. Currently, if the scrape rate is too short, there seems to be a surge of connect requests going out, and the PLC simply hangs up.
Also, currently every field is posted in its own Kafka message. This isn't ideal, as we intentionally grouped fields by job and would like to send one map of fields instead of each field on its own.

But that's just fine-tuning now. I really hope to have it working by Thursday.

Chris
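One plausible explanation for the surge of connect requests is a fresh connection being opened for every scrape cycle. A common mitigation is to cache one connection per source URL and reuse it until it drops. The sketch below illustrates the idea only; the PLC4X class and package names are written from memory for a 2019-era API and should be treated as assumptions, and error handling is reduced to the minimum.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    import org.apache.plc4x.java.PlcDriverManager;
    import org.apache.plc4x.java.api.PlcConnection;
    import org.apache.plc4x.java.api.exceptions.PlcConnectionException;

    // Hypothetical connection cache: one long-lived PlcConnection per connection URL,
    // so repeated scrapes do not hit the PLC with a new connect request every cycle.
    public class ConnectionCache {

        private final PlcDriverManager driverManager = new PlcDriverManager();
        private final Map<String, PlcConnection> connections = new ConcurrentHashMap<>();

        public synchronized PlcConnection get(String url) throws PlcConnectionException {
            PlcConnection connection = connections.get(url);
            if (connection == null || !connection.isConnected()) {
                connection = driverManager.getConnection(url);
                connections.put(url, connection);
            }
            return connection;
        }

        public synchronized void closeAll() {
            for (PlcConnection connection : connections.values()) {
                try {
                    connection.close();
                } catch (Exception e) {
                    // Ignore failures while shutting down.
                }
            }
            connections.clear();
        }
    }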



On 15.08.19, 19:12, "Bjoern Hoeper" <ho...@ltsoft.de> wrote:

    Hi Chris,
    No, we are not using any existing solution, so you can proceed as you outlined.
    It was just a remark about future direction.
    Greetings (now from Sylt)
    Björn
    
    Download Outlook for Android<https://aka.ms/ghei36>
    
    ________________________________
    From: Christofer Dutz <ch...@c-ware.de>
    Sent: Thursday, August 15, 2019 5:26:13 PM
    To: dev@plc4x.apache.org <de...@plc4x.apache.org>
    Subject: Re: [KAFKA] Refactoring the Kafka Connect plugin?
    
    Hi Björn,
    
    But you weren't using the existing solution, were you? So me removing it wouldn't cause you any pain, but us adding one later on would make you happy?
    
    Chris
    
    Get Outlook for Android<https://aka.ms/ghei36>
    
    ________________________________
    From: Bjoern Hoeper <ho...@ltsoft.de>
    Sent: Thursday, August 15, 2019 11:30:34 AM
    To: dev@plc4x.apache.org <de...@plc4x.apache.org>
    Subject: Re: [KAFKA] Refactoring the Kafka Connect plugin?
    
    Hi everyone,
    I am on holiday currently so just a short note:
    We have a use case at a customer that would need the sink functionality to exchange data between distinct DCSes controlling different areas. In the current implementation that provides this exchange, the receiving DCS is an active pull component that gets the data from the remote DCS.
    I see two possible options for a sink implementation: either a proxy component that actively throttles writing to the DCS/PLC, or the DCS pulling the data itself as a custom Kafka consumer. The only option that could be implemented independently of the control vendor being addressed is the first one.
    After my holiday I will ask our customer if I can share some details.
    Greetings from the Harbor of Rømø
    Björn
    
    Download Outlook for Android<https://aka.ms/ghei36>
    
    ________________________________
    From: Julian Feinauer <j....@pragmaticminds.de>
    Sent: Tuesday, August 13, 2019 9:05:26 PM
    To: dev@plc4x.apache.org <de...@plc4x.apache.org>; megachucky@gmail.com <me...@gmail.com>
    Subject: AW: [KAFKA] Refactoring the Kafka Connect plugin?
    
    Hi,
    
    I agree with what you say.
    We use Kafka and plc4x exactly the same way in prod :)
    
    Perhaps Björn can also comment here, as he also runs a Kafka-based PLC communication layer.
    
    Julian
    
    Sent from my mobile phone
    
    
    -------- Original message --------
    Subject: Re: [KAFKA] Refactoring the Kafka Connect plugin?
    From: Kai Wähner
    To: dev@plc4x.apache.org
    Cc:
    
    Hi Chris,
    
    here are a few thoughts:
    
    *I haven't seen a single use case of someone writing to a PLC with PLC4X *
    => This sounds similar to most IoT MQTT scenarios I see today. The main
    challenge is to get data out of machines to analyze and use it. Writing
    back (e.g. to control the systems) is the second step which comes later,
    sometimes years later - or even never if not needed.
    
    * I doubt such a scenario is ideal for the Kafka setup*
    => If you don't see it in other PLC4X projects, then I would not worry
    about it too much for Kafka integration today.
    However, I would not think too much about Kafka here. Either you see many
    use cases for Sinks in general today for PLC4X, or you don't see them yet.
    Some use Kafka, some use Java, some use NiFi or whatever.
    
    *A PLC is not able to consume a lot of data until it chokes and stops
    working*
    => This is an argument we hear often in the Kafka community. Often it is *not*
    valid (and we see more and more use cases where Kafka is used just for
    transactional data like bank payments, which are very low volume).
    Therefore, Kafka is not just used for big data sets. In IoT scenarios (like
    connected cars), we see more and more use cases where people also want to
    send information back to the device (e.g. alerts or recommendations to the
    driver, though, this is just a few messages, not comparable to sensor
    egress messages).
    In IoT scenarios, in almost all cases, you ingest a lot of IoT data from
    the devices / machines to other clusters (kafka, spark, cloud, AI,
    whatever), but if you send data back to the device / machine, it is just
    control data or other limited, small data sets. So a high throughput egress
    from IoT devices does not imply a high throughput ingress. Therefore, I
    recommend to not use this argument.
    
    * So in a first round I'll concentrate on implementing a robust
    Scraper-based PLC4X Kafka Connect Source, but drop the Sink entirely. If we
    find out that a Sink makes sense we'll be able to add it in a later version*
    I would recommend exactly the same. Focus on the important direction, and
    implement the other direction later (if people ask for it and share their
    use cases). Confluent did exactly the same for many other connectors, e.g.
    AWS S3 Sink or MQTT Source. Then we also built MQTT Sink and S3 Source
    later because we saw more demand for these, too.
    
    Best regards,
    Kai Waehner
    
    On Tue, Aug 13, 2019 at 3:52 PM Christofer Dutz <ch...@c-ware.de>
    wrote:
    
    > Hi all,
    >
    > so while I was working on the Kafka Sink, I sort of started wondering whether
    > such a thing is a good idea to have at all.
    > A Kafka system would be able to provide a vast stream of data that would
    > have to be routed to PLCs.
    > On the one hand, I haven't seen a single use case of someone writing to a
    > PLC with PLC4X, and I doubt such a scenario is ideal for the Kafka setup.
    > A PLC is not able to consume a lot of data until it chokes and stops
    > working.
    >
    > So in a first round I'll concentrate on implementing a robust
    > Scraper-based PLC4X Kafka Connect Source, but drop the Sink entirely.
    >
    > If we find out that a Sink makes sense we'll be able to add it in a later
    > version.
    >
    > What do you think?
    >
    > Chris
    >
    >
    > On 02.08.19, 11:24, "Julian Feinauer" <
    > j.feinauer@pragmaticminds.de> wrote:
    >
    >     Hi,
    >
    >     I think that's the best way then.
    >
    >     I agree with your suggestion.
    >
    >     Julian
    >
    >     Sent from my mobile phone
    >
    >
    >     -------- Original message --------
    >     Subject: Re: [KAFKA] Refactoring the Kafka Connect plugin?
    >     From: Christofer Dutz
    >     To: dev@plc4x.apache.org
    >     Cc:
    >
    >     Hi all,
    >
    >     Ok so it seems that Kafka only supports configuration via simple
    > unstructured maps.
    >     So if we want some hierarchical configuration like the proposed one,
    > we'll have to do it log4j.properties-style:
    >
    >     name=plc-source-test
    >     connector.class=org.apache.plc4x.kafka.Plc4xSourceConnector
    >
    >     defaults.default-topic=some/default
    >
    >     sources.machineA.url=s7://1.2.3.4/1/1
    >     sources.machineA.jobs.s7-dashboard.enabled=true
    >     sources.machineA.jobs.s7-heartbeat.enabled=true
    >     sources.machineA.jobs.s7-heartbeat.topic=heartbeat
    >
    >     sources.machineB.url=s7://10.20.30.40/1/1
    >     sources.machineB.topic=heartbeat
    >     sources.machineB.jobs.s7-heartbeat.enabled=true
    >
    >     sources.machineC.url=ads://1.2.3.4.5.6
    >     sources.machineC.topic=heartbeat
    >     sources.machineC.jobs.ads-heartbeat.enabled=true
    >
    >     jobs.s7-dashboard.scrapeRate=500
    >     jobs.s7-dashboard.fields.inputPressure=%DB.DB1.4:INT
    >     jobs.s7-dashboard.fields.outputPressure=%Q1:BYTE
    >     jobs.s7-dashboard.fields.temperature=%I3:INT
    >
    >     jobs.s7-heartbeat.scrapeRate=1000
    >     jobs.s7-heartbeat.fields.active=%I0.2:BOOL
    >
    >     jobs.ads-heartbeat.scrapeRate=1000
    >     jobs.ads-heartbeat.fields.active=Main.running
    >
    >
    >     Chris
    >
    >
    >
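Since Kafka Connect hands the connector nothing but a flat String-to-String map, the hierarchical keys sketched in the mail above have to be regrouped in code. A small, hypothetical helper in plain Java (no assumptions about the actual connector classes) that splits keys such as sources.machineA.url into one map of settings per source:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical helper: turns flat keys such as
    //   sources.machineA.url=s7://1.2.3.4/1/1
    //   sources.machineA.jobs.s7-heartbeat.topic=heartbeat
    // into one map of settings per source name. The same helper works for "jobs".
    public final class FlatConfigGrouper {

        private FlatConfigGrouper() {
        }

        public static Map<String, Map<String, String>> group(Map<String, String> props,
                                                             String prefix) {
            Map<String, Map<String, String>> grouped = new HashMap<>();
            String start = prefix + ".";
            for (Map.Entry<String, String> entry : props.entrySet()) {
                String key = entry.getKey();
                if (!key.startsWith(start)) {
                    continue;
                }
                String remainder = key.substring(start.length());      // e.g. "machineA.url"
                int dot = remainder.indexOf('.');
                if (dot < 0) {
                    continue;                                           // no setting name given
                }
                String name = remainder.substring(0, dot);              // e.g. "machineA"
                String setting = remainder.substring(dot + 1);          // e.g. "url"
                grouped.computeIfAbsent(name, n -> new HashMap<>()).put(setting, entry.getValue());
            }
            return grouped;
        }
    }

Calling group(props, "sources") on the properties above would yield machineA mapped to {url=s7://1.2.3.4/1/1, jobs.s7-dashboard.enabled=true, ...}, which can then be parsed further per source.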
    


Re: [KAFKA] Refactoring the Kafka Connect plugin?

Posted by Bjoern Hoeper <ho...@ltsoft.de>.
Hi Chris,
No, we are not using any existing solution, so you can proceed as you outlined.
It was just a remark about future direction.
Greetings (now from Sylt)
Björn

Download Outlook for Android<https://aka.ms/ghei36>


Re: [KAFKA] Refactoring the Kafka Connect plugin?

Posted by Christofer Dutz <ch...@c-ware.de>.
Hi Björn,

But you weren't using the existing solution, were you? So me removing it wouldn't cause you any pain, but us adding one later on would make you happy?

Chris

Get Outlook for Android<https://aka.ms/ghei36>


Re: [KAFKA] Refactoring the Kafka Connect plugin?

Posted by Bjoern Hoeper <ho...@ltsoft.de>.
Hi everyone,
I am on holiday currently so just a short note:
We have a use case at a customer that would need the sink functionality to exchange data between distinct DCSes controlling different areas. In the current implementation that provides this exchange, the receiving DCS is an active pull component that gets the data from the remote DCS.
I see two possible options for a sink implementation: either a proxy component that actively throttles writing to the DCS/PLC, or the DCS pulling the data itself as a custom Kafka consumer. The only option that could be implemented independently of the control vendor being addressed is the first one.
After my holiday I will ask our customer if I can share some details.
Greetings from the Harbor of Rømø
Björn
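The first option described above, a proxy component that throttles writes towards the DCS/PLC, would naturally live in a Kafka Connect sink task. Below is a very rough, hypothetical sketch of the throttling part only; the actual PLC4X write is left as a placeholder, since no such sink exists yet, and the configuration property name is made up for illustration.

    import java.util.Collection;
    import java.util.Map;

    import org.apache.kafka.connect.sink.SinkRecord;
    import org.apache.kafka.connect.sink.SinkTask;

    // Hypothetical sketch: enforce a minimum interval between writes so a slow
    // PLC/DCS is never flooded, no matter how fast records arrive from Kafka.
    public class ThrottledPlcSinkTask extends SinkTask {

        private long minIntervalMs;
        private long lastWriteMs = 0L;

        @Override
        public String version() {
            return "0.0.1";
        }

        @Override
        public void start(Map<String, String> props) {
            // "write.min.interval.ms" is an illustrative property name.
            minIntervalMs = Long.parseLong(props.getOrDefault("write.min.interval.ms", "100"));
        }

        @Override
        public void put(Collection<SinkRecord> records) {
            for (SinkRecord record : records) {
                long wait = lastWriteMs + minIntervalMs - System.currentTimeMillis();
                if (wait > 0) {
                    try {
                        Thread.sleep(wait);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
                // Placeholder: hand the record's value to a PLC4X write request here.
                lastWriteMs = System.currentTimeMillis();
            }
        }

        @Override
        public void stop() {
        }
    }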

Download Outlook for Android<https://aka.ms/ghei36>


AW: [KAFKA] Refactoring the Kafka Connect plugin?

Posted by Julian Feinauer <j....@pragmaticminds.de>.
Hi,

I agree with what you say.
We use Kafka and plc4x exactly the same way in prod :)

Perhaps Björn can also comment here, as he also maintains a Kafka-based PLC communication layer.

Julian

Sent from my mobile phone



Re: [KAFKA] Refactoring the Kafka Connect plugin?

Posted by Kai Wähner <me...@gmail.com>.
Hi Chris,

here are a few thoughts:

*I haven't seen a single use case of someone writing to a PLC with PLC4X *
=> This sounds similar to most IoT MQTT scenarios I see today. The main
challenge is to get data out of machines to analyze and use it. Writing
back (e.g. to control the systems) is the second step which comes later,
sometimes years later - or even never if not needed.

* I doubt such a scenario is ideal for the Kafka setup*
=> If you don't see it in other PLC4X projects, then I would not worry
about it too much for Kafka integration today.
However, I would not think too much about Kafka here. Either you see many
use cases for Sinks in general today for PLC4X, or you don't see them yet.
Some use Kafka, some use Java, some use Nifi or whatever.

*A PLC can only consume so much data before it chokes and stops working*
=> This is an argument we hear often in the Kafka community. Often it is *not*
valid (and we see more and more use cases where Kafka is just used for
transactional data like bank payments which are very low volume).
Therefore, Kafka is not just used for big data sets. In IoT scenarios (like
connected cars), we see more and more use cases where people also want to
send information back to the device (e.g. alerts or recommendations to the
driver, though, this is just a few messages, not comparable to sensor
egress messages).
In IoT scenarios, in almost all cases, you ingest a lot of IoT data from
the devices / machines to other clusters (kafka, spark, cloud, AI,
whatever), but if you send data back to the device / machine, it is just
control data or other limited, small data sets. So a high throughput egress
from IoT devices does not imply a high throughput ingress. Therefore, I
recommend not using this argument.

* So in a first round I'll concentrate on implementing a robust
Scraper-based PLC4X Kafka Connect Source, but drop the Sink entirely. If we
find out that a Sink makes sense we'll be able to add it in a later version*
I would recommend exactly the same. Focus on the important direction, and
implement the other direction later (if people ask for it and share their
use cases). Confluent did exactly the same for many other connectors, e.g.
AWS S3 Sink or MQTT Source. Then we also built MQTT Sink and S3 Source
later because we saw more demand for these, too.

Best regards,
Kai Waehner


Re: [KAFKA] Refactoring the Kafka Connect plugin?

Posted by Christofer Dutz <ch...@c-ware.de>.
Hi all,

so while I was working on the Kafka Sink I started wondering whether such a thing is a good idea to have at all.
A Kafka system would be able to provide a vast stream of data that would have to be routed to PLCs.
On the one hand, I haven't seen a single use case of someone writing to a PLC with PLC4X and I doubt such a scenario is ideal for the Kafka setup.
A PLC can only consume so much data before it chokes and stops working.

So in a first round I'll concentrate on implementing a robust Scraper-based PLC4X Kafka Connect Source, but drop the Sink entirely.

If we find out that a Sink makes sense we'll be able to add it in a later version.

What do you think?

Chris


On 02.08.19, 11:24, "Julian Feinauer" <j....@pragmaticminds.de> wrote:

    Hi,
    
    I think that's the best way then.
    
    I agree with your suggestion.
    
    Julian
    
    Sent from my mobile phone
    
    
    -------- Original message --------
    Subject: Re: [KAFKA] Refactoring the Kafka Connect plugin?
    From: Christofer Dutz
    To: dev@plc4x.apache.org
    Cc:
    
    Hi all,
    
    Ok so it seems that Kafka only supports configuration via simple unstructured maps.
    So if we want some hierarchical configuration like the proposed one, we'll have to do it log4j.properties-style:
    
    name=plc-source-test
    connector.class=org.apache.plc4x.kafka.Plc4xSourceConnector
    
    defaults.default-topic=some/default
    
    sources.machineA.url=s7://1.2.3.4/1/1
    sources.machineA.jobs.s7-dashboard.enabled=true
    sources.machineA.jobs.s7-heartbeat.enabled=true
    sources.machineA.jobs.s7-heartbeat.topic=heartbeat
    
    sources.machineB.url=s7://10.20.30.40/1/1
    sources.machineB.topic=heartbeat
    sources.machineB.jobs.s7-heartbeat.enabled=true
    
    sources.machineC.url=ads://1.2.3.4.5.6
    sources.machineC.topic=heartbeat
    sources.machineC.jobs.ads-heartbeat.enabled=true
    
    jobs.s7-dashboard.scrapeRate=500
    jobs.s7-dashboard.fields.inputPressure=%DB.DB1.4:INT
    jobs.s7-dashboard.fields.outputPressure=%Q1:BYTE
    jobs.s7-dashboard.fields.temperature=%I3:INT
    
    jobs.s7-heartbeat.scrapeRate=1000
    jobs.s7-heartbeat.fields.active=%I0.2:BOOL
    
    jobs.ads-heartbeat.scrapeRate=1000
    jobs.ads-heartbeat.fields.active=Main.running
    
    
    Chris
    
    
    
    On 01.08.19, 15:46, "Christofer Dutz" <ch...@c-ware.de> wrote:
    
        Hi Kai,
    
        that document is exactly the one I'm currently using.
        What I'm currently working on is updating the current plugin to not schedule and handle the connection stuff manually, but use the scraper component of PLC4X.
        Also is the current configuration not production ready and I'll be working on to make it more easily usable.
    
        But it will definitely not hurt to have some Kafka Pro have a look at what we did and propose improvements. After all we want the thing to be rock-solid :-)
    
        Chris
    
    
    
        On 31.07.19, 17:03, "Kai Wähner" <me...@gmail.com> wrote:
    
            Hi Chris,
    
            great that you will work on the connector.
    
            I am not deep technical, but if you need guidance from Kafka Connect
            experts, I can connect you to a Confluent colleague who can help with best
            practices for building the connector.
    
            For example, we have implemented a wildcard option into our MQTT Connector
            to map MQTT Topics to Kafka Topics in a more flexible way (e.g. 1000s of
            cars from different MQTT Topics can be routed into 1 Kafka Topic). This
            might also be interesting for this connector as you expect to connect to various PLCs.
    
            This guide might also help:
            https://www.confluent.io/wp-content/uploads/Verification-Guide-Confluent-Platform-Connectors-Integrations.pdf
    
    
    
            On Wed, Jul 31, 2019 at 4:39 PM Christofer Dutz <ch...@c-ware.de>
            wrote:
    
            > Hi all,
            >
            > I am currently planning on cleaning up the Kafka Connect adapter a little
            > as this was implemented as part of a proof of concept and is still in a
            > state I wouldn’t use in production ;-)
            > But a lot has happened since then and I’m planning on making it a really
            > usable tool in the next few days.
            >
            > A lot has changed since we created the integration module QT3 2018 and I
            > would like to refactor it to use the Scraper for the heavy lifting.
            >
            > Currently a user has to provide a parameter “query” which contains a
            > comma-separated list of connection-strings with appended address. This is
            > purely unmanageable.
            >
            > I would like to make it configurable via JSON or Yaml file.
            >
            > I think it would make sense to define groups of fields that are collected
            > on one device at an equal rate. So it’s pretty similar to the scraper
            > example, however I would like to not specify the source in the job, but the
            > other way around.
            > When specifying the “sources” I would also provide which jobs should run
            > on a given collection.
            > As the connector was initially showcased in a scenario where data had to
            > be collected on a big number of PLCs with equal specs,
            > I think this is the probably most important use-case and in this it is
            > also probably more common to add new devices to collect standard data on
            > than the other way around.
            >
            > Also should we provide the means to also set per connection to which
            > kafka-topic the data should be sent to.
            > We could provide the means to set a default and make it optional however.
            > When posting to a topic we also need to provide means for partitioning, so
            > I would provide sources with an optional “name”.
            > Each message would not only have the data requested, but also the
            > source-url, source-name and the job-name with a timestamp.
            >
            > So I guess it would look something like this:
            >
            > #
            > ----------------------------------------------------------------------------
            > # Licensed to the Apache Software Foundation (ASF) under one
            > # or more contributor license agreements.  See the NOTICE file
            > # distributed with this work for additional information
            > # regarding copyright ownership.  The ASF licenses this file
            > # to you under the Apache License, Version 2.0 (the
            > # "License"); you may not use this file except in compliance
            > # with the License.  You may obtain a copy of the License at
            > #
            > #    http://www.apache.org/licenses/LICENSE-2.0
            > #
            > # Unless required by applicable law or agreed to in writing,
            > # software distributed under the License is distributed on an
            > # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
            > # KIND, either express or implied.  See the License for the
            > # specific language governing permissions and limitations
            > # under the License.
            > #
            > ----------------------------------------------------------------------------
            > ---
            > # Defaults used throughout all collections
            > defaults:
            >   # If not specified, all data goes to this topic (optional)
            >   default-topic: some/default
            >
            > # Defines connections to PLCs
            > sources:
            >   # Connection to a S7 device
            >   - name: machineA
            >     # PLC4X connection URL
            >     url: s7://1.2.3.4/1/1
            >     jobs:
            >       # Just references the job "s7-dashboard". All data will be published
            > to the default topic
            >       - name: s7-dashboard
            >       # References the job "s7-heartbeat", however it configures the
            > output to go to the topic "heartbeat"
            >       - name: s7-heartbeat
            >         topic: heartbeat
            >
            >   # Connection to a second S7 device
            >   - name: machineB
            >     url: s7://10.20.30.40/1/1
            >     # Sets the default topic for this connection. All jobs data will go to
            > "heartbeat"
            >     topic: heartbeat
            >     jobs:
            >       - s7-heartbeat
            >
            >   # Connection to a Beckhoff device
            >   - name: machineC
            >     url: ads://1.2.3.4.5.6
            >     topic: heartbeat
            >     jobs:
            >       - ads-heartbeat
            >
            > # Defines what should be collected how often
            > jobs:
            >   # Defines a job to collect a set of fields on s7 devices every 500ms
            >   - name: s7-dashboard
            >     scrapeRate: 500
            >     fields:
            >       # The key will be used in the Kafka message to identify this field,
            > the value here contains the PLC4X address
            >       inputPressure: %DB.DB1.4:INT
            >       outputPressure: %Q1:BYTE
            >       temperature: %I3:INT
            >
            >   # Defines a second job to collect a set of fields on s7 devices every
            > 1000ms
            >   - name: s7-heartbeat
            >     scrapeRate: 1000
            >     fields:
            >       active: %I0.2:BOOL
            >
            >   # Defines a third job that collects data on Beckhoff devices
            >   - name: ads-heartbeat
            >     scrapeRate: 1000
            >     fields:
            >       active: Main.running
            >
            > I think it should be self-explanatory with my comments inline.
            >
            > What do you think?
            >
            > Chris
            >
            >
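
(For illustration, a record produced by the "s7-heartbeat" job on "machineB" could then carry roughly the following value; the envelope fields follow the proposal quoted above, while the concrete key names and the JSON layout are just an assumption:)

{
  "source-name": "machineB",
  "source-url": "s7://10.20.30.40/1/1",
  "job-name": "s7-heartbeat",
  "timestamp": "2019-08-13T15:52:00Z",
  "fields": {
    "active": true
  }
}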
    
    
    
    
    


AW: [KAFKA] Refactoring the Kafka Connect plugin?

Posted by Julian Feinauer <j....@pragmaticminds.de>.
Hi,

I think that's the best way then.

I agree with your suggestion.

Julian

Sent from my mobile phone


-------- Ursprüngliche Nachricht --------
Betreff: Re: [KAFKA] Refactoring the Kafka Connect plugin?
Von: Christofer Dutz
An: dev@plc4x.apache.org
Cc:

Hi all,

Ok so it seems that Kafka only supports configuration via simple unstructured maps.
So if we want some hierarchical configuration like the proposed one, we'll have to do it log4j.properties-style:

name=plc-source-test
connector.class=org.apache.plc4x.kafka.Plc4xSourceConnector

defaults.default-topic=some/default

sources.machineA.url=s7://1.2.3.4/1/1
sources.machineA.jobs.s7-dashboard.enabled=true
sources.machineA.jobs.s7-heartbeat.enabled=true
sources.machineA.jobs.s7-heartbeat.topic=heartbeat

sources.machineB.url=s7://10.20.30.40/1/1
sources.machineB.topic=heartbeat
sources.machineB.jobs.s7-heartbeat.enabled=true

sources.machineC.url=ads://1.2.3.4.5.6
sources.machineC.topic=heartbeat
sources.machineC.jobs.ads-heartbeat.enabled=true

jobs.s7-dashboard.scrapeRate=500
jobs.s7-dashboard.fields.inputPreasure=%DB.DB1.4:INT
jobs.s7-dashboard.fields.outputPreasure=%Q1:BYT
jobs.s7-dashboard.fields.temperature=%I3:INT

jobs.s7-heartbeat.scrapeRate=1000
jobs.s7-heartbeat.fields.active=%I0.2:BOOL

jobs.ads-heartbeat.scrapeRate=1000
jobs.ads-heartbeat.active=Main.running


Chris



Am 01.08.19, 15:46 schrieb "Christofer Dutz" <ch...@c-ware.de>:

    Hi Kai,

    that document is exactly the one I'm currently using.
    What I'm currently working on is updating the current plugin to not schedule and handle the connection stuff manually, but use the scraper component of PLC4X.
    Also is the current configuration not production ready and I'll be working on to make it more easily usable.

    But it will definitely not hurt to have some Kafka Pro have a look at what we did and propose improvements. After all we want the thing to be rock-solid :-)

    Chris



    Am 31.07.19, 17:03 schrieb "Kai Wähner" <me...@gmail.com>:

        Hi Chris,

        great that you will work on the connector.

        I am not deep technical, but if you need guidance from Kafka Connect
        experts, I can connect you to a Confluent colleague to can help with best
        practices for building the connector.

        For example, we have implemented a wildcard option into our MQTT Connector
        to map MQTT Topics to Kafka Topics in a more flexible way (e.g. 1000s of
        cars from different MQTT Topics can be routed into 1 Kafka Topic). This
        might also be interesting for this connector as you expect to various PLCs.

        This guide might also help:
        https://www.confluent.io/wp-content/uploads/Verification-Guide-Confluent-Platform-Connectors-Integrations.pdf



        On Wed, Jul 31, 2019 at 4:39 PM Christofer Dutz <ch...@c-ware.de>
        wrote:

        > Hi all,
        >
        > I am currently planning on cleaning up the Kafka Connect adapter a little
        > as this was implemented as part of a proof of concept and is still I a
        > state I wouldn’t use in production ;-)
        > But a lot has happened since then and I’m planning on making it a really
        > usable tool in the next few days.
        >
        > A lot has changed since we created the integration module QT3 2018 and I
        > would like to refactor it to use the Scraper for the heavy lifting.
        >
        > Currently a user has to provide a parameter “query” which contains a
        > comma-separated list of connection-strings with appended address. This is
        > purely unmanageable.
        >
        > I would like to make it configurable via JSON or Yaml file.
        >
        > I think it would make sense to define groups of fields that are collected
        > on one device at an equal rate. So it’s pretty similar to the scraper
        > example, however I would like to not specify the source in the job, but the
        > other way around.
        > When specifying the “sources” I would also provide which jobs should run
        > on a given collection.
        > As the connector was initially showcased in a scenario where data had to
        > be collected on a big number of PLCs with equal specs,
        > I think this is the probably most important use-case and in this it is
        > also probably more common to add new devices to collect standard data on
        > than the other way around.
        >
        > Also should we provide the means to also set per connection to which
        > kafka-topic the data should be sent to.
        > We could provide the means to set a default and make it optional however.
        > When posting to a topic we also need to provide means for partitioning, so
        > I would provide sources with an optional “name”.
        > Each message would not only have the data requested, but also the
        > source-url, source-name and the job-name with a timestamp.
        >
        > So I guess it would look something like this:
        >
        > #
        > ----------------------------------------------------------------------------
        > # Licensed to the Apache Software Foundation (ASF) under one
        > # or more contributor license agreements.  See the NOTICE file
        > # distributed with this work for additional information
        > # regarding copyright ownership.  The ASF licenses this file
        > # to you under the Apache License, Version 2.0 (the
        > # "License"); you may not use this file except in compliance
        > # with the License.  You may obtain a copy of the License at
        > #
        > #    http://www.apache.org/licenses/LICENSE-2.0
        > #
        > # Unless required by applicable law or agreed to in writing,
        > # software distributed under the License is distributed on an
        > # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
        > # KIND, either express or implied.  See the License for the
        > # specific language governing permissions and limitations
        > # under the License.
        > #
        > ----------------------------------------------------------------------------
        > ---
        > # Defaults used throughout all collections
        > defaults:
        >   # If not specified, all data goes to this topic (optional)
        >   default-topic: some/default
        >
        > # Defines connections to PLCs
        > sources:
        >   # Connection to a S7 device
        >  - name: machineA
        >     # PLC4X connection URL
        >     url: s7://1.2.3.4/1/1
        >     jobs:
        >       # Just references the job "s7-dashboard". All data will be published
        >       # to the default topic
        >       - name: s7-dashboard
        >       # References the job "s7-heartbeat", however it configures the
        >       # output to go to the topic "heartbeat"
        >       - name: s7-heartbeat
        >         topic: heartbeat
        >
        >   # Connection to a second S7 device
        >   - name: machineB
        >     url: s7://10.20.30.40/1/1
        >     # Sets the default topic for this connection. All job data will go to
        >     # "heartbeat"
        >     topic: heartbeat
        >     jobs:
        >       - s7-heartbeat
        >
        >   # Connection to a Beckhoff device
        >   - name: machineC
        >     url: ads://1.2.3.4.5.6
        >     topic: heartbeat
        >     jobs:
        >       - ads-heartbeat
        >
        > # Defines what should be collected how often
        > jobs:
        >   # Defines a job to collect a set of fields on s7 devices every 500ms
        >   - name: s7-dashboard
        >     scrapeRate: 500
        >     fields:
        >       # The key will be used in the Kafka message to identify this field,
        >       # the value here contains the PLC4X address
        >       inputPressure: "%DB.DB1.4:INT"
        >       outputPressure: "%Q1:BYTE"
        >       temperature: "%I3:INT"
        >
        >   # Defines a second job to collect a set of fields on s7 devices every 1000ms
        >   - name: s7-heartbeat
        >     scrapeRate: 1000
        >     fields:
        >       active: "%I0.2:BOOL"
        >
        >   # Defines a third job that collects data on Beckhoff devices
        >   - name: ads-heartbeat
        >     scrapeRate: 1000
        >     fields:
        >       active: Main.running
        >
        > I think it should be self-explanatory with my comments inline.
        >
        > What do you think?
        >
        > Chris
        >
        >





Re: [KAFKA] Refactoring the Kafka Connect plugin?

Posted by Christofer Dutz <ch...@c-ware.de>.
Hi all,

Ok, so it seems that Kafka Connect only supports connector configuration via simple, flat key/value maps. 
So if we want a hierarchical configuration like the proposed one, we'll have to express it log4j.properties-style:

name=plc-source-test
connector.class=org.apache.plc4x.kafka.Plc4xSourceConnector

defaults.default-topic=some.default

sources.machineA.url=s7://1.2.3.4/1/1
sources.machineA.jobs.s7-dashboard.enabled=true
sources.machineA.jobs.s7-heartbeat.enabled=true
sources.machineA.jobs.s7-heartbeat.topic=heartbeat

sources.machineB.url=s7://10.20.30.40/1/1
sources.machineB.topic=heartbeat
sources.machineB.jobs.s7-heartbeat.enabled=true

sources.machineC.url=ads://1.2.3.4.5.6
sources.machineC.topic=heartbeat
sources.machineC.jobs.ads-heartbeat.enabled=true

jobs.s7-dashboard.scrapeRate=500
jobs.s7-dashboard.fields.inputPressure=%DB.DB1.4:INT
jobs.s7-dashboard.fields.outputPressure=%Q1:BYTE
jobs.s7-dashboard.fields.temperature=%I3:INT

jobs.s7-heartbeat.scrapeRate=1000
jobs.s7-heartbeat.fields.active=%I0.2:BOOL

jobs.ads-heartbeat.scrapeRate=1000
jobs.ads-heartbeat.fields.active=Main.running

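Just to sketch how that flat structure could be put back together inside the connector (illustrative code only, not the actual plugin code; the class and method names below are made up), grouping the keys by their first segment after the "sources." or "jobs." prefix would look roughly like this:

import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Rough sketch: turn flat keys like "sources.machineA.url" back into
// one map per source (or per job). Names are made up for illustration.
public class FlatConfigSketch {

    // Returns e.g. "machineA" -> {"url": "s7://...", "jobs.s7-heartbeat.topic": "heartbeat"}
    static Map<String, Map<String, String>> groupByPrefix(Map<String, String> props, String prefix) {
        Map<String, Map<String, String>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : props.entrySet()) {
            String key = entry.getKey();
            if (!key.startsWith(prefix + ".")) {
                continue; // not part of this group
            }
            String rest = key.substring(prefix.length() + 1);
            int dot = rest.indexOf('.');
            if (dot < 0) {
                continue; // no sub-key present
            }
            String name = rest.substring(0, dot);     // e.g. "machineA"
            String subKey = rest.substring(dot + 1);  // e.g. "url" or "jobs.s7-heartbeat.topic"
            grouped.computeIfAbsent(name, n -> new LinkedHashMap<>()).put(subKey, entry.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("sources.machineA.url", "s7://1.2.3.4/1/1");
        props.put("sources.machineA.jobs.s7-heartbeat.enabled", "true");
        props.put("sources.machineA.jobs.s7-heartbeat.topic", "heartbeat");
        props.put("jobs.s7-heartbeat.scrapeRate", "1000");
        props.put("jobs.s7-heartbeat.fields.active", "%I0.2:BOOL");

        System.out.println(groupByPrefix(props, "sources"));
        System.out.println(groupByPrefix(props, "jobs"));
    }
}

In the real connector this would probably rather be expressed via Kafka's ConfigDef / AbstractConfig (originalsWithPrefix() already takes care of stripping a common prefix), but the key naming scheme itself would stay as shown above. To try it out, the properties above could go into a file (say, plc4x-source.properties) and be passed to the standalone worker: bin/connect-standalone.sh config/connect-standalone.properties plc4x-source.properties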

Chris


