Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2020/06/09 16:33:15 UTC

[GitHub] [kafka] tombentley opened a new pull request #8839: KIP-585: Documentation

tombentley opened a new pull request #8839:
URL: https://github.com/apache/kafka/pull/8839


   Add documentation for using transformation predicates.
   Add `PredicateDoc` for generating predicate config docs, following the style of `TransformationDoc`.
   Fix the header depth mismatch.
   Avoid generating HTML ids based purely on the config name since there
   are very likely to conflict (e.g. #name). Instead allow passing a function
   which can be used to generate an id from a config key.
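
The id-generation change described above can be illustrated with a small sketch. This is not the code from the pull request: the class `IdSketch`, the method `htmlAnchor`, and the prefix-based id scheme are all hypothetical names chosen for illustration. It only shows the general idea, namely that a caller-supplied function maps each config key to an HTML id, so two documented classes that both have a key called `name` no longer produce the same `#name` anchor.

```java
import java.util.function.Function;

public class IdSketch {
    // Hypothetical helper: renders an anchor for one config key, asking the
    // caller-supplied function for the HTML id instead of using the key itself.
    static String htmlAnchor(String configKey, Function<String, String> idGenerator) {
        String id = idGenerator.apply(configKey);
        return "<a id=\"" + id + "\" href=\"#" + id + "\">" + configKey + "</a>";
    }

    public static void main(String[] args) {
        // Assumed scheme: prefix the key with the name of the class being documented.
        Function<String, String> forFilter = key -> "Filter_" + key;
        Function<String, String> forRouter = key -> "RegexRouter_" + key;
        System.out.println(htmlAnchor("name", forFilter));
        System.out.println(htmlAnchor("name", forRouter));
    }
}
```

Passing the function in, rather than hard-coding a prefix, leaves the choice of id scheme to whichever doc generator calls the helper.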


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] kkonstantine commented on pull request #8839: MINOR: Documentation for KIP-585

Posted by GitBox <gi...@apache.org>.
kkonstantine commented on pull request #8839:
URL: https://github.com/apache/kafka/pull/8839#issuecomment-644537352


   Merged to `trunk` and `2.6`





[GitHub] [kafka] tombentley commented on pull request #8839: KIP-585: Documentation

Posted by GitBox <gi...@apache.org>.
tombentley commented on pull request #8839:
URL: https://github.com/apache/kafka/pull/8839#issuecomment-643123172


   @kkonstantine thanks for those, and I can confirm that I _did_ test the docs locally.





[GitHub] [kafka] tombentley commented on pull request #8839: KIP-585: Documentation

Posted by GitBox <gi...@apache.org>.
tombentley commented on pull request #8839:
URL: https://github.com/apache/kafka/pull/8839#issuecomment-641201269


   @kkonstantine please could you review this, thanks





[GitHub] [kafka] tombentley commented on pull request #8839: KIP-585: Documentation

Posted by GitBox <gi...@apache.org>.
tombentley commented on pull request #8839:
URL: https://github.com/apache/kafka/pull/8839#issuecomment-644176096


   @kkonstantine @rhauch any chance this could be merged for 2.6, or is it too late now?





[GitHub] [kafka] kkonstantine merged pull request #8839: MINOR: Documentation for KIP-585

Posted by GitBox <gi...@apache.org>.
kkonstantine merged pull request #8839:
URL: https://github.com/apache/kafka/pull/8839


   





[GitHub] [kafka] kkonstantine commented on a change in pull request #8839: KIP-585: Documentation

Posted by GitBox <gi...@apache.org>.
kkonstantine commented on a change in pull request #8839:
URL: https://github.com/apache/kafka/pull/8839#discussion_r439227501



##########
File path: docs/connect.html
##########
@@ -180,13 +182,80 @@ <h4><a id="connect_transforms" href="#connect_transforms">Transformations</a></h
         <li>SetSchemaMetadata - modify the schema name or version</li>
         <li>TimestampRouter - Modify the topic of a record based on original topic and timestamp. Useful when using a sink that needs to write to different tables or indexes based on timestamps</li>
         <li>RegexRouter - modify the topic of a record based on original topic, replacement string and a regular expression</li>
+        <li>Filter - Removes messages from all further processing. This is used with a <a href="#connect_predicates">predicate</a> to selectively filter certain messages.</li>
     </ul>
 
     <p>Details on how to configure each transformation are listed below:</p>
 
 
     <!--#include virtual="generated/connect_transforms.html" -->
 
+
+    <h5><a id="connect_predicates" href="#connect_predicates">Predicates</a></h5>
+
+    <p>Transformations can be configured with prediates so that the transformation is applied only to messages which satisfy some condition. In particular, when combined with the <b>Filter</b> transformation predicates can be used to selectively filter out certain messages.</p>

Review comment:
       ```suggestion
       <p>Transformations can be configured with predicates so that the transformation is applied only to messages which satisfy some condition. In particular, when combined with the <b>Filter</b> transformation predicates can be used to selectively filter out certain messages.</p>
   ```

##########
File path: docs/connect.html
##########
@@ -180,13 +182,80 @@ <h4><a id="connect_transforms" href="#connect_transforms">Transformations</a></h
+
+    <p>Predicates are specified in the connector configuration.</p>
+
+    <ul>
+        <li><code>predicates</code> - Set of aliases for the predicates to be applied to some of the transformations.</li>
+        <li><code>predicates.$alias.type</code> - Fully qualified class name for the predicate.</li>
+        <li><code>predicates.$alias.$predicateSpecificConfig</code> - Configuration properties for the predicate.</li>
+    </ul>
+
+    <p>All transformations have the implicit config properties <code>predicate</code> and <code>negate</code>. A predicular predicate is associated with a transformation by setting the transformation's <code>predicate</code> config to the predicate's alias. The predicate's value can be reversed using the <code>negate</code> configuration property.</p>
+
+    <p>For example, suppose you have a source connector which produces messages to many different topics and you want to:</p>
+    <ul>
+        <li>filter out the messages in the 'foo' topic entirely</li>
+        <li>apply the ExtractField transformation with the field name 'other_field' to records in all topics <i>except</i> the topic 'bar'</li>
+    </ul>
+
+    <p>To do this we need to first to filter out the records destined for the topic 'foo'. The Filter transformation removes records from further processing, and can use the TopicNameMatches predicate to apply the transformation only to records in topics which match a certain regular expression. TopicNameMatches's only configuration property is <code>pattern</code> which is a Java regular expression for matching against the topic name. The configuration would look like this:</p>
+
+    <pre class="brush: text;">
+        transforms=Filter
+        transforms.Filter.type=org.apache.kafka.connect.transforms.Filter
+        transforms.Filter.predicate=IsFoo
+
+        predicates=IsFoo
+        predicates.IsFoo.type=org.apache.kafka.connect.predicates.TopicNameMatches
+        predicates.IsFoo.pattern=foo
+    </pre>
+        
+    <p>Next we need to apply ExtractField only when the topic name of the record is not 'bar'. We can't just use TopicNameMatches directly, because that would apply the transformation to matching topic names, not topic names which do <i>not</i> match. The transformation's implicit <code>negate</code> config properties allows us to invert the set of records which a predicate matches. Adding the configuration for this to the previous example we arrive at:</p>
+
+    <pre class="brush: text;">
+        transforms=Filter,Extract
+        transforms.Filter.type=org.apache.kafka.connect.transforms.Filter
+        transforms.Filter.predicate=IsFoo
+
+        transforms.Extract.type=org.apache.kafka.connect.transforms.ExtractField$Key
+        transforms.Extract.field=other_field
+        transforms.Extract.predicate=IsBar
+        transforms.Extract.negate=true
+
+        predicates=IsFoo,IsBar
+        predicates.IsFoo.type=org.apache.kafka.connect.predicates.TopicNameMatches
+        predicates.IsFoo.pattern=foo
+
+        predicates.IsBar.type=org.apache.kafka.connect.predicates.TopicNameMatches
+        predicates.IsBar.pattern=bar
+    </pre>
+
+    <p>Kafka Connect includes the following predicates:</p>
+
+    <ul>
+        <li><code>TopicNameMatches</code> - matches records in a topic with a name matching a particular Java regular expression.</li>
+        <li><code>HasHeaderKey</code> - matches records which have a header with the given key.</li>
+        <li><code>RecordIsTombstone</code> - matches tombstone records, that is, those will a null value.</li>

Review comment:
       ```suggestion
           <li><code>RecordIsTombstone</code> - matches tombstone records, that is records with a null value.</li>
   ```
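
The interaction between `predicate` and `negate` in the quoted documentation can be sketched in plain Java. This is a minimal illustration, not Kafka Connect's implementation: `PredicateSketch`, `topicNameMatches` and `shouldApply` are hypothetical names, records are reduced to their topic name, and the sketch assumes the pattern must match the whole topic name.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class PredicateSketch {
    // Stand-in for a TopicNameMatches-style predicate: true when the whole
    // topic name matches the regular expression (assumed full-match semantics).
    static Predicate<String> topicNameMatches(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return topic -> topic != null && pattern.matcher(topic).matches();
    }

    // A transformation is applied when the predicate result, optionally
    // inverted by negate, is true.
    static boolean shouldApply(Predicate<String> predicate, boolean negate, String topic) {
        return negate ^ predicate.test(topic);
    }

    public static void main(String[] args) {
        Predicate<String> isFoo = topicNameMatches("foo");
        Predicate<String> isBar = topicNameMatches("bar");
        for (String topic : new String[] {"foo", "bar", "baz"}) {
            // Filter uses IsFoo directly; Extract uses IsBar with negate=true
            // and only sees records that Filter did not drop.
            boolean dropped = shouldApply(isFoo, false, topic);
            boolean extracted = !dropped && shouldApply(isBar, true, topic);
            System.out.println(topic + ": dropped=" + dropped + ", extract=" + extracted);
        }
    }
}
```

Under these assumptions, records for `foo` are dropped, records for `bar` pass through untouched, and records for any other topic, such as `baz`, have the extraction applied, which mirrors the two-step example in the diff.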

##########
File path: docs/connect.html
##########
@@ -180,13 +182,80 @@ <h4><a id="connect_transforms" href="#connect_transforms">Transformations</a></h
+    <p>To do this we need to first to filter out the records destined for the topic 'foo'. The Filter transformation removes records from further processing, and can use the TopicNameMatches predicate to apply the transformation only to records in topics which match a certain regular expression. TopicNameMatches's only configuration property is <code>pattern</code> which is a Java regular expression for matching against the topic name. The configuration would look like this:</p>

Review comment:
       ```suggestion
       <p>To do this we need first to filter out the records destined for the topic 'foo'. The Filter transformation removes records from further processing, and can use the TopicNameMatches predicate to apply the transformation only to records in topics which match a certain regular expression. TopicNameMatches's only configuration property is <code>pattern</code> which is a Java regular expression for matching against the topic name. The configuration would look like this:</p>
   ```







[GitHub] [kafka] rhauch commented on pull request #8839: KIP-585: Documentation

Posted by GitBox <gi...@apache.org>.
rhauch commented on pull request #8839:
URL: https://github.com/apache/kafka/pull/8839#issuecomment-644382512


   @tombentley, yes we'll want to merge this and backport to the `2.6` branch. That branch is not yet frozen for documentation or tests.





[GitHub] [kafka] tombentley commented on pull request #8839: MINOR: Documentation for KIP-585

Posted by GitBox <gi...@apache.org>.
tombentley commented on pull request #8839:
URL: https://github.com/apache/kafka/pull/8839#issuecomment-644606253


   Thanks @kkonstantine!

