Posted to users@kafka.apache.org by George <ge...@gmail.com> on 2020/01/15 03:42:51 UTC

ingesting web server logs, or log4j log files from a JBoss server

Hi all.

Please advise, a real noob here still, unpacking how the stack
works...

Say I have a MySQL server, or a web server, or a 2-node JBoss cluster.

If I want to use the MySQL connector to connect to the MySQL DB and pull
data using CDC... do I then need to install the Kafka stack on the DB
server? I understand that this would be a standalone install, presumably
with no ZooKeeper involved.

Similarly for the Apache web server and the 2 JBoss servers.

G

-- 
You have the obligation to inform one honestly of the risk, and as a person
you are committed to educate yourself to the total risk in any activity!

Once informed & totally aware of the risk,
every fool has the right to kill or injure themselves as they see fit!

Re: ingesting web server logs, or log4j log files from a JBoss server

Posted by George <ge...@gmail.com>.
thanks

G

-- 
You have the obligation to inform one honestly of the risk, and as a person
you are committed to educate yourself to the total risk in any activity!

Once informed & totally aware of the risk,
every fool has the right to kill or injure themselves as they see fit!

Re: ingesting web server logs, or log4j log files from a JBoss server

Posted by Robin Moffatt <ro...@confluent.io>.
If SpoolDir doesn't suit, there's also
https://github.com/streamthoughts/kafka-connect-file-pulse to check out.
Also bear in mind that tools like Filebeat from Elastic support Kafka as a
target.
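
As a rough sketch, a SpoolDir source for plain log lines can be configured
along these lines (this assumes the jcustenborder spooldir connector; the
topic name and directory paths are placeholders):

name=weblog-spooldir
connector.class=com.github.jcustenborder.kafka.connect.spooldir.SpoolDirLineDelimitedSourceConnector
topic=apache_http_log
# directory the connector polls for new files
input.path=/var/log/apache/incoming
# processed files are moved here; failed files go to error.path
finished.path=/var/log/apache/finished
error.path=/var/log/apache/error
input.file.pattern=http\.log.*

Each line of each matching file becomes one record on the topic.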


-- 

Robin Moffatt | Senior Developer Advocate | robin@confluent.io | @rmoff



Re: ingesting web server logs, or log4j log files from a JBoss server

Posted by George <ge...@gmail.com>.
Hi Tom

Will do. For now I have five specific integrations I need:

1. reading Apache web server log files (http.log)
2. reading in our custom log files
3. reading in log4j log files
4. MySQL connection, as a source
5. Cassandra connection, as a sink

I cannot NFS-mount the source file system onto the Connect cluster;
we don't allow NFS.

I'm hoping to pull #1-#3 in with each line as the value field of a JSON
message, then maybe use stream processing or KSQL to unpack it into a
second message which can then be consumed, analysed, etc.
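
As a rough sketch of that unpacking in KSQL (the topic, stream, and field
names here are placeholders, assuming each record is JSON with a single
"line" field and a KSQL version that supports this syntax):

-- declare the raw stream over the ingested topic
CREATE STREAM weblog_raw (line VARCHAR)
  WITH (KAFKA_TOPIC='apache_http_log', VALUE_FORMAT='JSON');

-- derive a second, narrower stream, e.g. server errors only
CREATE STREAM weblog_errors AS
  SELECT line
  FROM weblog_raw
  WHERE line LIKE '% 500 %';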

A bit amazed there is not a predefined connector for HTTP log files, though.

G


-- 
You have the obligation to inform one honestly of the risk, and as a person
you are committed to educate yourself to the total risk in any activity!

Once informed & totally aware of the risk,
every fool has the right to kill or injure themselves as they see fit!

Re: ingesting web server logs, or log4j log files from a JBoss server

Posted by Tom Bentley <tb...@redhat.com>.
Hi George,

Since you mentioned CDC specifically, you might want to check out Debezium
(https://debezium.io/), which operates as a connector of the sort Robin
referred to and does CDC for MySQL and others.
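
As a rough sketch, a Debezium MySQL source is registered with the Connect
REST API using a JSON config along these lines (hostnames, credentials, and
names are placeholders; property names are from the Debezium 1.x MySQL
connector):

curl -X POST -H "Content-Type: application/json" \
  http://connect-worker:8083/connectors -d '{
    "name": "mysql-cdc",
    "config": {
      "connector.class": "io.debezium.connector.mysql.MySqlConnector",
      "database.hostname": "mysql.example.com",
      "database.port": "3306",
      "database.user": "debezium",
      "database.password": "secret",
      "database.server.id": "184054",
      "database.server.name": "dbserver1",
      "database.history.kafka.bootstrap.servers": "kafka:9092",
      "database.history.kafka.topic": "schema-changes.dbserver1"
    }
  }'

The connector reads the MySQL binlog over the network, so it runs on the
Connect worker rather than on the database host.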

Cheers,

Tom


Re: ingesting web server logs, or log4j log files from a JBoss server

Posted by George <ge...@gmail.com>.
Hi Robin

OK, I've been reading and asking some more questions. As far as something
like SpoolDir is concerned, as a connector I would need to NFS-mount the
directory where the log files are (be that our own custom files or Apache
http.log files) onto the Connect cluster.

Yes, a connector like a Cassandra connector can work remotely (and so can
syslog, as it can push the syslog information to the Connect cluster;
similarly log4j can push towards a cluster), but connectors that ingest
files need local access to those files.
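
For the log4j push, Kafka ships a log4j appender that can point straight at
the brokers. A minimal sketch, assuming log4j 1.x with the
kafka-log4j-appender jar on the classpath (broker list and topic name are
placeholders):

# log4j.properties on the JBoss / application host
log4j.rootLogger=INFO, kafka
log4j.appender.kafka=org.apache.kafka.log4jappender.KafkaLog4jAppender
log4j.appender.kafka.brokerList=kafka1:9092,kafka2:9092
log4j.appender.kafka.topic=app_log4j_logs
log4j.appender.kafka.layout=org.apache.log4j.PatternLayout
log4j.appender.kafka.layout.ConversionPattern=%d %p %c - %m%n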

G

-- 
You have the obligation to inform one honestly of the risk, and as a person
you are committed to educate yourself to the total risk in any activity!

Once informed & totally aware of the risk,
every fool has the right to kill or injure themselves as they see fit!

Re: ingesting web server logs, or log4j log files from a JBoss server

Posted by Robin Moffatt <ro...@confluent.io>.
The integration part of Apache Kafka that you're talking about is
called Kafka Connect. Kafka Connect runs as its own process, known as
a Kafka Connect Worker, either on its own or as part of a cluster. Kafka
Connect will usually be deployed on a separate instance from the Kafka
brokers.

Kafka Connect connectors will usually connect to the external system over
the network if that makes sense (e.g. a database), but not always (e.g. if
it's acting as a syslog endpoint, or maybe processing local files).

You can learn more about Kafka Connect and its deployment model here:
https://rmoff.dev/crunch19-zero-to-hero-kafka-connect
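
As a sketch of that deployment model, a distributed worker is just its own
JVM process started with a properties file pointing at the brokers (the
names below are placeholders; the three storage topics are the standard
distributed-mode settings):

# connect-distributed.properties, on the Connect host
bootstrap.servers=kafka1:9092,kafka2:9092
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
status.storage.topic=connect-status

Start it with bin/connect-distributed.sh and then submit connectors to its
REST API (port 8083 by default).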


-- 

Robin Moffatt | Senior Developer Advocate | robin@confluent.io | @rmoff

