You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/02/25 03:11:33 UTC

[GitHub] [incubator-pinot] npawar opened a new pull request #6611: SegmentWriter interface

npawar opened a new pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611


   Introducing the SegmentWriter interface as proposed in https://github.com/apache/incubator-pinot/issues/6610


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] kishoreg commented on a change in pull request #6611: SegmentWriter interface

Posted by GitBox <gi...@apache.org>.
kishoreg commented on a change in pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#discussion_r589047280



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java
##########
@@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.writer;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.spi.data.readers.GenericRow;
+import org.apache.pinot.spi.env.PinotConfiguration;
+
+
+/**
+ * An interface to write records into Pinot.
+ * This interface helps abstract out details regarding segment generation and push from the caller.
+ */
+public interface SegmentWriter extends Closeable {
+
+  /**
+   * Initialize the {@link SegmentWriter} with details about the Pinot cluster and table
+   */
+  void init(PinotConfiguration conf)

Review comment:
       should we make it a concrete object?
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-io edited a comment on pull request #6611: SegmentWriter interface

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#issuecomment-785559563


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=h1) Report
   > Merging [#6611](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=desc) (3c3229b) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/1beaab59b73f26c4e35f3b9bc856b03806cddf5a?el=desc) (1beaab5) will **decrease** coverage by `22.40%`.
   > The diff coverage is `42.26%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6611/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master    #6611       +/-   ##
   ===========================================
   - Coverage   66.44%   44.04%   -22.41%     
   ===========================================
     Files        1075     1363      +288     
     Lines       54773    66548    +11775     
     Branches     8168     9702     +1534     
   ===========================================
   - Hits        36396    29311     -7085     
   - Misses      15700    34790    +19090     
   + Partials     2677     2447      -230     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration | `44.04% <42.26%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...ot/broker/broker/AllowAllAccessControlFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL0FsbG93QWxsQWNjZXNzQ29udHJvbEZhY3RvcnkuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [...t/broker/broker/BasicAuthAccessControlFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL0Jhc2ljQXV0aEFjY2Vzc0NvbnRyb2xGYWN0b3J5LmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [.../helix/BrokerUserDefinedMessageHandlerFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL2hlbGl4L0Jyb2tlclVzZXJEZWZpbmVkTWVzc2FnZUhhbmRsZXJGYWN0b3J5LmphdmE=) | `52.83% <0.00%> (-13.84%)` | :arrow_down: |
   | [...org/apache/pinot/broker/queryquota/HitCounter.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcXVlcnlxdW90YS9IaXRDb3VudGVyLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...che/pinot/broker/queryquota/MaxHitRateTracker.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcXVlcnlxdW90YS9NYXhIaXRSYXRlVHJhY2tlci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...ache/pinot/broker/queryquota/QueryQuotaEntity.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcXVlcnlxdW90YS9RdWVyeVF1b3RhRW50aXR5LmphdmE=) | `0.00% <0.00%> (-50.00%)` | :arrow_down: |
   | [...ker/routing/instanceselector/InstanceSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9pbnN0YW5jZXNlbGVjdG9yL0luc3RhbmNlU2VsZWN0b3IuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [...ceselector/StrictReplicaGroupInstanceSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9pbnN0YW5jZXNlbGVjdG9yL1N0cmljdFJlcGxpY2FHcm91cEluc3RhbmNlU2VsZWN0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...roker/routing/segmentpruner/TimeSegmentPruner.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9zZWdtZW50cHJ1bmVyL1RpbWVTZWdtZW50UHJ1bmVyLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...roker/routing/segmentpruner/interval/Interval.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9zZWdtZW50cHJ1bmVyL2ludGVydmFsL0ludGVydmFsLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | ... and [1392 more](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=footer). Last update [78152cd...3c3229b](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] npawar commented on a change in pull request #6611: SegmentWriter interface

Posted by GitBox <gi...@apache.org>.
npawar commented on a change in pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#discussion_r589844201



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java
##########
@@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.writer;
+
+import org.apache.pinot.spi.data.readers.GenericRow;
+import org.apache.pinot.spi.env.PinotConfiguration;
+
+
+/**
+ * An interface to write records into Pinot.
+ * This interface helps abstract out details regarding segment generation and push from the caller.
+ */
+public interface SegmentWriter {

Review comment:
       Not sure what you mean. That's an implementation specific detail right? For MutableSegmentImpl, we would have a SegmentWriterImpl, such that the `collect` call will call the `realtimeSegment#index` and the `flush` will build the segment exactly as it is being done today.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] yupeng9 commented on a change in pull request #6611: SegmentWriter interface

Posted by GitBox <gi...@apache.org>.
yupeng9 commented on a change in pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#discussion_r582539241



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java
##########
@@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.writer;
+
+import org.apache.pinot.spi.data.readers.GenericRow;
+import org.apache.pinot.spi.env.PinotConfiguration;
+
+
+/**
+ * An interface to write records into Pinot.
+ * This interface helps abstract out details regarding segment generation and push from the caller.
+ */
+public interface SegmentWriter {
+
+  /**
+   * Initialize the {@link SegmentWriter} with details about the Pinot cluster and table
+   */
+  void init(PinotConfiguration conf);
+
+  /**
+   * Collect a single {@link GenericRow} into a buffer.
+   * This row is not available in Pinot until a <code>flush()</code> is invoked.
+   */
+  void collect(GenericRow row);

Review comment:
       shall we follow the API as discussed on the design doc?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] yupeng9 commented on a change in pull request #6611: SegmentWriter and SegmentUploader interface

Posted by GitBox <gi...@apache.org>.
yupeng9 commented on a change in pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#discussion_r592674190



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java
##########
@@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.writer;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.spi.data.readers.GenericRow;
+
+
+/**
+ * An interface to collect records and create a Pinot segment.
+ * This interface helps abstract out details regarding segment generation from the caller.
+ */
+public interface SegmentWriter extends Closeable {
+
+  /**
+   * Initializes the {@link SegmentWriter} with provided tableConfig and Pinot schema.
+   */
+  void init(SegmentWriterConfig segmentWriterConfig)
+      throws Exception;
+
+  /**
+   * Collects a single {@link GenericRow} into a buffer.
+   * This row is not available in the segment until a <code>flush()</code> is invoked.
+   */
+  void collect(GenericRow row)
+      throws IOException;
+
+  /**
+   * Collects a batch of {@link GenericRow}s into a buffer.
+   * These rows are not available in the segment until a <code>flush()</code> is invoked.
+   */
+  void collect(GenericRow[] rowBatch)
+      throws IOException;
+
+  /**
+   * Creates one Pinot segment using the {@link GenericRow}s collected in the buffer,
+   * at the outputDirUri as specified in the batchConfigs.

Review comment:
       I think returning `segmentURI ` from `flush` will be good, so as you said the caller can decide on uploading some specific segments, instead of all under the folder.
   
   The async is just the wrapper implementation, it shall package some default fault-tolerance mechanisms to handle generation/push fails with retry etc. Async just means that the caller can decide to run it in a separate thread and not to block `segmentWriter.collect`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] snleee commented on pull request #6611: SegmentWriter interface

Posted by GitBox <gi...@apache.org>.
snleee commented on pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#issuecomment-790876594


   @amrishlal Why do you need to get metadata information from `SegmentWriter`? 
   
   I think that those stats will be used internal implementation and as long as the implementation internally keeps those stats, we can generate segments.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] npawar commented on a change in pull request #6611: SegmentWriter interface

Posted by GitBox <gi...@apache.org>.
npawar commented on a change in pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#discussion_r589841977



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java
##########
@@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.writer;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.spi.data.readers.GenericRow;
+import org.apache.pinot.spi.env.PinotConfiguration;
+
+
+/**
+ * An interface to write records into Pinot.
+ * This interface helps abstract out details regarding segment generation and push from the caller.
+ */
+public interface SegmentWriter extends Closeable {
+
+  /**
+   * Initialize the {@link SegmentWriter} with details about the Pinot cluster and table
+   */
+  void init(PinotConfiguration conf)
+    throws IOException;
+
+  /**
+   * Collect a single {@link GenericRow} into a buffer.
+   * This row is not available in Pinot until a <code>flush()</code> is invoked.

Review comment:
       That would depend on the impl. For say an OFFLINE table, we'd use something like a FileBasedBuffer, and we'll only get the rows into Pinot for querying after a flush is called.
   But for a realtime table, we'd use a MutableSegmentBuffer, in which case as soon as it's collected, we'll be able to query it




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] amrishlal commented on pull request #6611: SegmentWriter interface

Posted by GitBox <gi...@apache.org>.
amrishlal commented on pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#issuecomment-786835396


   Interface doesn't appear to have any get method, nor are the existing method returning any information. How will we get metadata information from interface: number of rows written so far, size of segment file, name of underlying implementation (like jdbc driver name). Can multiple threads use this interface to feed information into a segment?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #6611: SegmentWriter interface

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#discussion_r583012605



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java
##########
@@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.writer;
+
+import org.apache.pinot.spi.data.readers.GenericRow;
+import org.apache.pinot.spi.env.PinotConfiguration;
+
+
+/**
+ * An interface to write records into Pinot.
+ * This interface helps abstract out details regarding segment generation and push from the caller.
+ */
+public interface SegmentWriter {

Review comment:
       How does this tie in with columnar segment build?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] npawar commented on a change in pull request #6611: SegmentWriter and SegmentUploader interface

Posted by GitBox <gi...@apache.org>.
npawar commented on a change in pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#discussion_r593465252



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/uploader/SegmentUploader.java
##########
@@ -34,10 +36,17 @@
   void init(TableConfig tableConfig)
       throws Exception;
 
+  /**
+   * Uploads the segment tar file to the cluster
+   * @param segmentTarFile URI of segment tar file
+   */
+  void uploadSegment(URI segmentTarFile)
+      throws Exception;
+
   /**
    * Uploads the segments from the segmentDir to the cluster
-   * @param segmentDir URI of segment tar file or URI of directory containing segment tar files
+   * @param segmentDir URI of directory containing segment tar files
    */
-  void upload(URI segmentDir)
+  void uploadSegments(URI segmentDir)

Review comment:
       there's 2 methods. `uploadSegment` takes the segmentTarFile, `uploadSegments` takes the segmentDir.
   the wrapper class can make the call of which API to call. Potentially flush can also make multiple segments.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] npawar commented on a change in pull request #6611: SegmentWriter interface

Posted by GitBox <gi...@apache.org>.
npawar commented on a change in pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#discussion_r592031623



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/uploader/SegmentUploader.java
##########
@@ -0,0 +1,38 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.uploader;
+
+import java.net.URI;
+
+
+/**
+ * Interface for uploading segments to Pinot
+ */
+public interface SegmentUploader {

Review comment:
       We would probably have only in Pinot. We could maybe split them as TarImpl, UriImpl, MetadataImpl. @fx19880617  wdyt?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] yupeng9 commented on a change in pull request #6611: SegmentWriter and SegmentUploader interface

Posted by GitBox <gi...@apache.org>.
yupeng9 commented on a change in pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#discussion_r593472691



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/uploader/SegmentUploader.java
##########
@@ -34,10 +36,17 @@
   void init(TableConfig tableConfig)
       throws Exception;
 
+  /**
+   * Uploads the segment tar file to the cluster
+   * @param segmentTarFile URI of segment tar file
+   */
+  void uploadSegment(URI segmentTarFile)
+      throws Exception;
+
   /**
    * Uploads the segments from the segmentDir to the cluster
-   * @param segmentDir URI of segment tar file or URI of directory containing segment tar files
+   * @param segmentDir URI of directory containing segment tar files
    */
-  void upload(URI segmentDir)
+  void uploadSegments(URI segmentDir)

Review comment:
       ah I see. Sounds good.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] npawar commented on pull request #6611: SegmentWriter interface

Posted by GitBox <gi...@apache.org>.
npawar commented on pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#issuecomment-796398254


   @mcvsubbu would you please review again and unblock from merging?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-io edited a comment on pull request #6611: SegmentWriter interface

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#issuecomment-785559563


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=h1) Report
   > Merging [#6611](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=desc) (f16817f) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/1beaab59b73f26c4e35f3b9bc856b03806cddf5a?el=desc) (1beaab5) will **decrease** coverage by `0.63%`.
   > The diff coverage is `63.05%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6611/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #6611      +/-   ##
   ==========================================
   - Coverage   66.44%   65.81%   -0.64%     
   ==========================================
     Files        1075     1360     +285     
     Lines       54773    66535   +11762     
     Branches     8168     9698    +1530     
   ==========================================
   + Hits        36396    43790    +7394     
   - Misses      15700    19620    +3920     
   - Partials     2677     3125     +448     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | unittests | `65.81% <63.05%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...e/pinot/broker/api/resources/PinotBrokerDebug.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYXBpL3Jlc291cmNlcy9QaW5vdEJyb2tlckRlYnVnLmphdmE=) | `0.00% <0.00%> (-79.32%)` | :arrow_down: |
   | [...pinot/broker/api/resources/PinotClientRequest.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYXBpL3Jlc291cmNlcy9QaW5vdENsaWVudFJlcXVlc3QuamF2YQ==) | `0.00% <0.00%> (-27.28%)` | :arrow_down: |
   | [...ot/broker/broker/AllowAllAccessControlFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL0FsbG93QWxsQWNjZXNzQ29udHJvbEZhY3RvcnkuamF2YQ==) | `71.42% <ø> (-28.58%)` | :arrow_down: |
   | [.../helix/BrokerUserDefinedMessageHandlerFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL2hlbGl4L0Jyb2tlclVzZXJEZWZpbmVkTWVzc2FnZUhhbmRsZXJGYWN0b3J5LmphdmE=) | `33.96% <0.00%> (-32.71%)` | :arrow_down: |
   | [...ker/routing/instanceselector/InstanceSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9pbnN0YW5jZXNlbGVjdG9yL0luc3RhbmNlU2VsZWN0b3IuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [...ava/org/apache/pinot/client/AbstractResultSet.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Fic3RyYWN0UmVzdWx0U2V0LmphdmE=) | `66.66% <ø> (+9.52%)` | :arrow_up: |
   | [...n/java/org/apache/pinot/client/BrokerResponse.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Jyb2tlclJlc3BvbnNlLmphdmE=) | `100.00% <ø> (ø)` | |
   | [.../main/java/org/apache/pinot/client/Connection.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Nvbm5lY3Rpb24uamF2YQ==) | `35.55% <ø> (-13.29%)` | :arrow_down: |
   | [...org/apache/pinot/client/DynamicBrokerSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0R5bmFtaWNCcm9rZXJTZWxlY3Rvci5qYXZh) | `82.85% <ø> (+10.12%)` | :arrow_up: |
   | [...n/java/org/apache/pinot/client/ExecutionStats.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0V4ZWN1dGlvblN0YXRzLmphdmE=) | `68.88% <ø> (ø)` | |
   | ... and [1245 more](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=footer). Last update [78152cd...f16817f](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] fx19880617 commented on pull request #6611: SegmentWriter interface

Posted by GitBox <gi...@apache.org>.
fx19880617 commented on pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#issuecomment-785567745


   @yupeng9 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6611: SegmentWriter interface

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#discussion_r583102511



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java
##########
@@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.writer;
+
+import org.apache.pinot.spi.data.readers.GenericRow;
+import org.apache.pinot.spi.env.PinotConfiguration;
+
+
+/**
+ * An interface to write records into Pinot.
+ * This interface helps abstract out details regarding segment generation and push from the caller.
+ */
+public interface SegmentWriter {
+
+  /**
+   * Initialize the {@link SegmentWriter} with details about the Pinot cluster and table
+   */
+  void init(PinotConfiguration conf);
+
+  /**
+   * Collect a single {@link GenericRow} into a buffer.
+   * This row is not available in Pinot until a <code>flush()</code> is invoked.
+   */
+  void collect(GenericRow row);
+
+  /**
+   * Collect a batch of {@link GenericRow}s into a buffer.
+   * These rows are not available in Pinot until a <code>flush()</code> is invoked.
+   */
+  void collect(GenericRow[] rowBatch);
+
+  /**
+   * Creates one Pinot segment using the {@link GenericRow}s collected in the buffer, and uploads the segment to Pinot.
+   * Successful invocation of this method means that the {@link GenericRow}s collected so far,
+   * are now available in Pinot and not available in the buffer anymore.
+   */
+  void flush();
+
+  /**
+   * Close the {@link SegmentWriter}
+   */
+  void close();

Review comment:
       Make the interface extending `Closeable`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] yupeng9 commented on a change in pull request #6611: SegmentWriter interface

Posted by GitBox <gi...@apache.org>.
yupeng9 commented on a change in pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#discussion_r592094402



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java
##########
@@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.writer;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.spi.data.readers.GenericRow;
+
+
+/**
+ * An interface to collect records and create a Pinot segment.
+ * This interface helps abstract out details regarding segment generation from the caller.
+ */
+public interface SegmentWriter extends Closeable {
+
+  /**
+   * Initializes the {@link SegmentWriter} with provided tableConfig and Pinot schema.
+   */
+  void init(SegmentWriterConfig segmentWriterConfig)
+      throws Exception;
+
+  /**
+   * Collects a single {@link GenericRow} into a buffer.
+   * This row is not available in the segment until a <code>flush()</code> is invoked.
+   */
+  void collect(GenericRow row)
+      throws IOException;
+
+  /**
+   * Collects a batch of {@link GenericRow}s into a buffer.
+   * These rows are not available in the segment until a <code>flush()</code> is invoked.
+   */
+  void collect(GenericRow[] rowBatch)
+      throws IOException;
+
+  /**
+   * Creates one Pinot segment using the {@link GenericRow}s collected in the buffer,
+   * at the outputDirUri as specified in the batchConfigs.

Review comment:
       +1 to the wrapper class. Ideally we don't have to expose the segment files to the Spark/Flink connectors. 
   Also, can you describe the flow of how the connector invokes this? 
   Does it first call flush, and then call upload? I feel it's a bit unclean as `flush` does not return a handle, so how does the caller know what will be uploaded?
   
   Previously I thought upload will also take care of the upload (ideally in an async way done by a thread), so the caller does not need to worry about the segment creation/upload details




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-io edited a comment on pull request #6611: SegmentWriter interface

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#issuecomment-785559563


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=h1) Report
   > Merging [#6611](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=desc) (0c0711b) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/1beaab59b73f26c4e35f3b9bc856b03806cddf5a?el=desc) (1beaab5) will **decrease** coverage by `22.29%`.
   > The diff coverage is `42.03%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6611/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master    #6611       +/-   ##
   ===========================================
   - Coverage   66.44%   44.15%   -22.30%     
   ===========================================
     Files        1075     1358      +283     
     Lines       54773    66500    +11727     
     Branches     8168     9694     +1526     
   ===========================================
   - Hits        36396    29363     -7033     
   - Misses      15700    34684    +18984     
   + Partials     2677     2453      -224     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration | `44.15% <42.03%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...ot/broker/broker/AllowAllAccessControlFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL0FsbG93QWxsQWNjZXNzQ29udHJvbEZhY3RvcnkuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [...t/broker/broker/BasicAuthAccessControlFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL0Jhc2ljQXV0aEFjY2Vzc0NvbnRyb2xGYWN0b3J5LmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [.../helix/BrokerUserDefinedMessageHandlerFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL2hlbGl4L0Jyb2tlclVzZXJEZWZpbmVkTWVzc2FnZUhhbmRsZXJGYWN0b3J5LmphdmE=) | `52.83% <0.00%> (-13.84%)` | :arrow_down: |
   | [...org/apache/pinot/broker/queryquota/HitCounter.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcXVlcnlxdW90YS9IaXRDb3VudGVyLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...che/pinot/broker/queryquota/MaxHitRateTracker.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcXVlcnlxdW90YS9NYXhIaXRSYXRlVHJhY2tlci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...ache/pinot/broker/queryquota/QueryQuotaEntity.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcXVlcnlxdW90YS9RdWVyeVF1b3RhRW50aXR5LmphdmE=) | `0.00% <0.00%> (-50.00%)` | :arrow_down: |
   | [...ker/routing/instanceselector/InstanceSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9pbnN0YW5jZXNlbGVjdG9yL0luc3RhbmNlU2VsZWN0b3IuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [...ceselector/StrictReplicaGroupInstanceSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9pbnN0YW5jZXNlbGVjdG9yL1N0cmljdFJlcGxpY2FHcm91cEluc3RhbmNlU2VsZWN0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...roker/routing/segmentpruner/TimeSegmentPruner.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9zZWdtZW50cHJ1bmVyL1RpbWVTZWdtZW50UHJ1bmVyLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...roker/routing/segmentpruner/interval/Interval.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9zZWdtZW50cHJ1bmVyL2ludGVydmFsL0ludGVydmFsLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | ... and [1373 more](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=footer). Last update [78152cd...0c0711b](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #6611: SegmentWriter interface

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#discussion_r590588683



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java
##########
@@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.writer;
+
+import org.apache.pinot.spi.data.readers.GenericRow;
+import org.apache.pinot.spi.env.PinotConfiguration;
+
+
+/**
+ * An interface to write records into Pinot.
+ * This interface helps abstract out details regarding segment generation and push from the caller.
+ */
+public interface SegmentWriter {

Review comment:
       In that case, can you also change the MutableSegment to extend this interface, and implement it in MutableSegmetImpl? I just want to make sure that any columnar segment build is indeed hidden underneath the implementation here, and this interface is for pure row-by-row segment store (and possibly serve queries)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] npawar commented on a change in pull request #6611: SegmentWriter interface

Posted by GitBox <gi...@apache.org>.
npawar commented on a change in pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#discussion_r592573659



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java
##########
@@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.writer;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.spi.data.readers.GenericRow;
+
+
+/**
+ * An interface to collect records and create a Pinot segment.
+ * This interface helps abstract out details regarding segment generation from the caller.
+ */
+public interface SegmentWriter extends Closeable {
+
+  /**
+   * Initializes the {@link SegmentWriter} with provided tableConfig and Pinot schema.
+   */
+  void init(SegmentWriterConfig segmentWriterConfig)
+      throws Exception;
+
+  /**
+   * Collects a single {@link GenericRow} into a buffer.
+   * This row is not available in the segment until a <code>flush()</code> is invoked.
+   */
+  void collect(GenericRow row)
+      throws IOException;
+
+  /**
+   * Collects a batch of {@link GenericRow}s into a buffer.
+   * These rows are not available in the segment until a <code>flush()</code> is invoked.
+   */
+  void collect(GenericRow[] rowBatch)
+      throws IOException;
+
+  /**
+   * Creates one Pinot segment using the {@link GenericRow}s collected in the buffer,
+   * at the outputDirUri as specified in the batchConfigs.

Review comment:
       Yes, the caller will first call `SegmentWriter#flush()` and then `SegmentUploader#upload(segmentDir)`.
   The segment will be generated in the `outputDirURI` that is provided in the tableConfig or in the batchConfigOverride provided by the caller. We can make `flush` return the segmentURI if you think that's better?
   
   Yes previously with just one interface, flush would be doing both. But the aim with the new interface is to keep segment generation and segment push steps separate. There could be cases where user only wants to generate segment (e.g. preprocessing). Caller can use the wrapper class if they wish not to deal with 2 interfaces.
   
   
   I'm not sure about async. What happens if the segment generation/push fails, or the thread gets killed, but flink has already started writing to the next segment? Who will be responsible for going back and rewriting the rows for that failed segment?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] fx19880617 removed a comment on pull request #6611: SegmentWriter interface

Posted by GitBox <gi...@apache.org>.
fx19880617 removed a comment on pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#issuecomment-785567745


   @yupeng9 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] npawar commented on a change in pull request #6611: SegmentWriter interface

Posted by GitBox <gi...@apache.org>.
npawar commented on a change in pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#discussion_r586079850



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java
##########
@@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.writer;
+
+import org.apache.pinot.spi.data.readers.GenericRow;
+import org.apache.pinot.spi.env.PinotConfiguration;
+
+
+/**
+ * An interface to write records into Pinot.
+ * This interface helps abstract out details regarding segment generation and push from the caller.
+ */
+public interface SegmentWriter {
+
+  /**
+   * Initialize the {@link SegmentWriter} with details about the Pinot cluster and table
+   */
+  void init(PinotConfiguration conf);
+
+  /**
+   * Collect a single {@link GenericRow} into a buffer.
+   * This row is not available in Pinot until a <code>flush()</code> is invoked.
+   */
+  void collect(GenericRow row);

Review comment:
       As discussed offline, we want to give users the impression that collect merely collects the rows, and will not be written to the segment until flush. Hence, better to call this collect




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] yupeng9 commented on a change in pull request #6611: SegmentWriter and SegmentUploader interface

Posted by GitBox <gi...@apache.org>.
yupeng9 commented on a change in pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#discussion_r592824743



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java
##########
@@ -0,0 +1,70 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.writer;
+
+import java.io.Closeable;
+import java.net.URI;
+import org.apache.pinot.spi.config.table.TableConfig;
+import org.apache.pinot.spi.data.Schema;
+import org.apache.pinot.spi.data.readers.GenericRow;
+
+
+/**
+ * An interface to collect records and create a Pinot segment.
+ * This interface helps abstract out details regarding segment generation from the caller.

Review comment:
       Do you think we shall add some recommendations on the thread safety to the implementations?
   For example, is the implementation you plan to add thread-safe?

##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java
##########
@@ -0,0 +1,70 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.writer;
+
+import java.io.Closeable;
+import java.net.URI;
+import org.apache.pinot.spi.config.table.TableConfig;
+import org.apache.pinot.spi.data.Schema;
+import org.apache.pinot.spi.data.readers.GenericRow;
+
+
+/**
+ * An interface to collect records and create a Pinot segment.
+ * This interface helps abstract out details regarding segment generation from the caller.
+ */
+public interface SegmentWriter extends Closeable {
+
+  /**
+   * Initializes the {@link SegmentWriter} with provided tableConfig and Pinot schema.
+   * @param tableConfig The table config for the segment
+   * @param schema The Pinot schema for the table
+   */
+  void init(TableConfig tableConfig, Schema schema)
+      throws Exception;
+
+  /**
+   * Collects a single {@link GenericRow} into a buffer.
+   * This row is not available in the segment until a <code>flush()</code> is invoked.
+   */
+  void collect(GenericRow row)
+      throws Exception;
+
+  /**
+   * Collects a batch of {@link GenericRow}s into a buffer.
+   * These rows are not available in the segment until a <code>flush()</code> is invoked.
+   */
+  default void collect(GenericRow[] rowBatch)
+      throws Exception {
+    for (GenericRow row : rowBatch) {
+      collect(row);
+    }
+  }
+
+  /**
+   * Creates one Pinot segment using the {@link GenericRow}s collected in the buffer,
+   * at the outputDirUri as specified in the tableConfig->batchConfigs.
+   * Successful invocation of this method means that the {@link GenericRow}s collected so far,
+   * are now available in the Pinot segment and not available in the buffer anymore.
+   *
+   * @return URI of the generated segment
+   */
+  URI flush()

Review comment:
       Is it that right that URI is concrete only when the segment is created successfully, otherwise it'll be null?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] kishoreg commented on a change in pull request #6611: SegmentWriter and SegmentUploader interface

Posted by GitBox <gi...@apache.org>.
kishoreg commented on a change in pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#discussion_r592920377



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java
##########
@@ -0,0 +1,70 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.writer;
+
+import java.io.Closeable;
+import java.net.URI;
+import org.apache.pinot.spi.config.table.TableConfig;
+import org.apache.pinot.spi.data.Schema;
+import org.apache.pinot.spi.data.readers.GenericRow;
+
+
+/**
+ * An interface to collect records and create a Pinot segment.
+ * This interface helps abstract out details regarding segment generation from the caller.
+ */
+public interface SegmentWriter extends Closeable {
+
+  /**
+   * Initializes the {@link SegmentWriter} with provided tableConfig and Pinot schema.
+   * @param tableConfig The table config for the segment
+   * @param schema The Pinot schema for the table
+   */
+  void init(TableConfig tableConfig, Schema schema)
+      throws Exception;
+
+  /**
+   * Collects a single {@link GenericRow} into a buffer.
+   * This row is not available in the segment until a <code>flush()</code> is invoked.
+   */
+  void collect(GenericRow row)
+      throws Exception;
+
+  /**
+   * Collects a batch of {@link GenericRow}s into a buffer.
+   * These rows are not available in the segment until a <code>flush()</code> is invoked.
+   */
+  default void collect(GenericRow[] rowBatch)
+      throws Exception {
+    for (GenericRow row : rowBatch) {
+      collect(row);
+    }
+  }
+
+  /**
+   * Creates one Pinot segment using the {@link GenericRow}s collected in the buffer,
+   * at the outputDirUri as specified in the tableConfig->batchConfigs.

Review comment:
       almost feels like it might better to take the output directory as an explicit parameter. Typically the one in tableConfig might be for the minion or controller

##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/uploader/SegmentUploader.java
##########
@@ -0,0 +1,43 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.uploader;
+
+import java.net.URI;
+import org.apache.pinot.spi.config.table.TableConfig;
+
+
+/**
+ * Interface for uploading segments to Pinot
+ */
+public interface SegmentUploader {
+
+  /**
+   * Initializes the {@link SegmentUploader}
+   * @param tableConfig The table config for the segment upload
+   */
+  void init(TableConfig tableConfig)
+      throws Exception;
+
+  /**
+   * Uploads the segments from the segmentDir to the cluster
+   * @param segmentDir URI of segment tar file or URI of directory containing segment tar files

Review comment:
       should we have separate methods for a single tar file vs a directory?

##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java
##########
@@ -0,0 +1,70 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.writer;
+
+import java.io.Closeable;
+import java.net.URI;
+import org.apache.pinot.spi.config.table.TableConfig;
+import org.apache.pinot.spi.data.Schema;
+import org.apache.pinot.spi.data.readers.GenericRow;
+
+
+/**
+ * An interface to collect records and create a Pinot segment.
+ * This interface helps abstract out details regarding segment generation from the caller.
+ */
+public interface SegmentWriter extends Closeable {
+
+  /**
+   * Initializes the {@link SegmentWriter} with provided tableConfig and Pinot schema.
+   * @param tableConfig The table config for the segment
+   * @param schema The Pinot schema for the table
+   */
+  void init(TableConfig tableConfig, Schema schema)
+      throws Exception;
+
+  /**
+   * Collects a single {@link GenericRow} into a buffer.
+   * This row is not available in the segment until a <code>flush()</code> is invoked.
+   */
+  void collect(GenericRow row)
+      throws Exception;
+
+  /**
+   * Collects a batch of {@link GenericRow}s into a buffer.
+   * These rows are not available in the segment until a <code>flush()</code> is invoked.
+   */
+  default void collect(GenericRow[] rowBatch)
+      throws Exception {
+    for (GenericRow row : rowBatch) {
+      collect(row);
+    }
+  }
+
+  /**
+   * Creates one Pinot segment using the {@link GenericRow}s collected in the buffer,
+   * at the outputDirUri as specified in the tableConfig->batchConfigs.
+   * Successful invocation of this method means that the {@link GenericRow}s collected so far,
+   * are now available in the Pinot segment and not available in the buffer anymore.
+   *
+   * @return URI of the generated segment
+   */
+  URI flush()

Review comment:
       flush is a bit misleading? for e.g. FileWriter has flush and close. maybe finish/build/close




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] yupeng9 commented on a change in pull request #6611: SegmentWriter and SegmentUploader interface

Posted by GitBox <gi...@apache.org>.
yupeng9 commented on a change in pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#discussion_r593445283



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/uploader/SegmentUploader.java
##########
@@ -34,10 +36,17 @@
   void init(TableConfig tableConfig)
       throws Exception;
 
+  /**
+   * Uploads the segment tar file to the cluster
+   * @param segmentTarFile URI of segment tar file
+   */
+  void uploadSegment(URI segmentTarFile)
+      throws Exception;
+
   /**
    * Uploads the segments from the segmentDir to the cluster
-   * @param segmentDir URI of segment tar file or URI of directory containing segment tar files
+   * @param segmentDir URI of directory containing segment tar files
    */
-  void upload(URI segmentDir)
+  void uploadSegments(URI segmentDir)

Review comment:
       so this will upload all segments in dir, but not possible of a single segment tar? Seems mismatch with the output of the flush ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] yupeng9 commented on a change in pull request #6611: SegmentWriter interface

Posted by GitBox <gi...@apache.org>.
yupeng9 commented on a change in pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#discussion_r591961768



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/uploader/SegmentUploader.java
##########
@@ -0,0 +1,38 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.uploader;
+
+import java.net.URI;
+
+
+/**
+ * Interface for uploading segments to Pinot
+ */
+public interface SegmentUploader {

Review comment:
       do you expect more than one implementation of this?

##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java
##########
@@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.writer;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.spi.data.readers.GenericRow;
+
+
+/**
+ * An interface to collect records and create a Pinot segment.
+ * This interface helps abstract out details regarding segment generation from the caller.
+ */
+public interface SegmentWriter extends Closeable {
+
+  /**
+   * Initializes the {@link SegmentWriter} with provided tableConfig and Pinot schema.
+   */
+  void init(SegmentWriterConfig segmentWriterConfig)
+      throws Exception;
+
+  /**
+   * Collects a single {@link GenericRow} into a buffer.
+   * This row is not available in the segment until a <code>flush()</code> is invoked.
+   */
+  void collect(GenericRow row)
+      throws IOException;

Review comment:
       why would this throw IOException?

##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/uploader/SegmentUploaderConfig.java
##########
@@ -0,0 +1,73 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.uploader;
+
+import com.google.common.base.Preconditions;
+import javax.annotation.Nullable;
+import org.apache.pinot.spi.config.table.TableConfig;
+import org.apache.pinot.spi.ingestion.batch.BatchConfig;
+
+
+/**
+ * Config to initialize the {@link SegmentUploader}
+ */
+public class SegmentUploaderConfig {
+  private final TableConfig _tableConfig;
+  private final BatchConfig _batchConfigOverride;
+
+  private SegmentUploaderConfig(TableConfig tableConfig, @Nullable BatchConfig batchConfigOverride) {
+    this._tableConfig = tableConfig;
+    this._batchConfigOverride = batchConfigOverride;
+  }
+
+  public TableConfig getTableConfig() {
+    return _tableConfig;
+  }
+
+  @Nullable
+  public BatchConfig getBatchConfigOverride() {
+    return _batchConfigOverride;
+  }
+
+  public static class Builder {
+    private TableConfig _tableConfig;
+    private BatchConfig _batchConfigOverride;
+
+    public Builder setTableConfig(TableConfig tableConfig) {
+      _tableConfig = tableConfig;
+      return this;
+    }
+
+    public Builder setBatchConfigOverride(@Nullable BatchConfig batchConfigOverride) {
+      _batchConfigOverride = batchConfigOverride;
+      return this;
+    }
+
+    public SegmentUploaderConfig build() {
+      Preconditions.checkNotNull(_tableConfig);
+      return new SegmentUploaderConfig(_tableConfig, _batchConfigOverride);
+    }
+  }
+
+  @Override
+  public String toString() {
+    return "SegmentWriterConfig{" + "\n_tableConfig=" + _tableConfig + ", \nbatchConfigOverride=" + _batchConfigOverride

Review comment:
       SegmentUploaderConfig

##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java
##########
@@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.writer;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.spi.data.readers.GenericRow;
+
+
+/**
+ * An interface to collect records and create a Pinot segment.
+ * This interface helps abstract out details regarding segment generation from the caller.
+ */
+public interface SegmentWriter extends Closeable {
+
+  /**
+   * Initializes the {@link SegmentWriter} with provided tableConfig and Pinot schema.
+   */
+  void init(SegmentWriterConfig segmentWriterConfig)
+      throws Exception;
+
+  /**
+   * Collects a single {@link GenericRow} into a buffer.
+   * This row is not available in the segment until a <code>flush()</code> is invoked.
+   */
+  void collect(GenericRow row)
+      throws IOException;
+
+  /**
+   * Collects a batch of {@link GenericRow}s into a buffer.
+   * These rows are not available in the segment until a <code>flush()</code> is invoked.
+   */
+  void collect(GenericRow[] rowBatch)
+      throws IOException;
+
+  /**
+   * Creates one Pinot segment using the {@link GenericRow}s collected in the buffer,
+   * at the outputDirUri as specified in the batchConfigs.

Review comment:
       will the flush upload the segment?

##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java
##########
@@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.writer;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.spi.data.readers.GenericRow;
+
+
+/**
+ * An interface to collect records and create a Pinot segment.
+ * This interface helps abstract out details regarding segment generation from the caller.
+ */
+public interface SegmentWriter extends Closeable {
+
+  /**
+   * Initializes the {@link SegmentWriter} with provided tableConfig and Pinot schema.
+   */
+  void init(SegmentWriterConfig segmentWriterConfig)
+      throws Exception;
+
+  /**
+   * Collects a single {@link GenericRow} into a buffer.
+   * This row is not available in the segment until a <code>flush()</code> is invoked.
+   */
+  void collect(GenericRow row)
+      throws IOException;
+
+  /**
+   * Collects a batch of {@link GenericRow}s into a buffer.
+   * These rows are not available in the segment until a <code>flush()</code> is invoked.
+   */
+  void collect(GenericRow[] rowBatch)
+      throws IOException;
+
+  /**
+   * Creates one Pinot segment using the {@link GenericRow}s collected in the buffer,
+   * at the outputDirUri as specified in the batchConfigs.
+   * Successful invocation of this method means that the {@link GenericRow}s collected so far,
+   * are now available in the Pinot segment and not available in the buffer anymore.
+   */
+  void flush();

Review comment:
       this could throw an exception?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] npawar commented on a change in pull request #6611: SegmentWriter interface

Posted by GitBox <gi...@apache.org>.
npawar commented on a change in pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#discussion_r590654738



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java
##########
@@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.writer;
+
+import org.apache.pinot.spi.data.readers.GenericRow;
+import org.apache.pinot.spi.env.PinotConfiguration;
+
+
+/**
+ * An interface to write records into Pinot.
+ * This interface helps abstract out details regarding segment generation and push from the caller.
+ */
+public interface SegmentWriter {

Review comment:
       I would rather not do that as part of this PR. There will be several PRs following this, for implementations. We can work on MutableSegmentImpl in one of those




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-io commented on pull request #6611: SegmentWriter interface

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#issuecomment-785559563


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=h1) Report
   > Merging [#6611](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=desc) (9b446f6) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/1beaab59b73f26c4e35f3b9bc856b03806cddf5a?el=desc) (1beaab5) will **decrease** coverage by `0.88%`.
   > The diff coverage is `62.57%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6611/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #6611      +/-   ##
   ==========================================
   - Coverage   66.44%   65.56%   -0.89%     
   ==========================================
     Files        1075     1352     +277     
     Lines       54773    66536   +11763     
     Branches     8168     9694    +1526     
   ==========================================
   + Hits        36396    43624    +7228     
   - Misses      15700    19797    +4097     
   - Partials     2677     3115     +438     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | unittests | `65.56% <62.57%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...e/pinot/broker/api/resources/PinotBrokerDebug.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYXBpL3Jlc291cmNlcy9QaW5vdEJyb2tlckRlYnVnLmphdmE=) | `0.00% <0.00%> (-79.32%)` | :arrow_down: |
   | [...pinot/broker/api/resources/PinotClientRequest.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYXBpL3Jlc291cmNlcy9QaW5vdENsaWVudFJlcXVlc3QuamF2YQ==) | `0.00% <0.00%> (-27.28%)` | :arrow_down: |
   | [...ot/broker/broker/AllowAllAccessControlFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL0FsbG93QWxsQWNjZXNzQ29udHJvbEZhY3RvcnkuamF2YQ==) | `71.42% <ø> (-28.58%)` | :arrow_down: |
   | [.../helix/BrokerUserDefinedMessageHandlerFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL2hlbGl4L0Jyb2tlclVzZXJEZWZpbmVkTWVzc2FnZUhhbmRsZXJGYWN0b3J5LmphdmE=) | `33.96% <0.00%> (-32.71%)` | :arrow_down: |
   | [...ker/routing/instanceselector/InstanceSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9pbnN0YW5jZXNlbGVjdG9yL0luc3RhbmNlU2VsZWN0b3IuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [...ava/org/apache/pinot/client/AbstractResultSet.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Fic3RyYWN0UmVzdWx0U2V0LmphdmE=) | `66.66% <ø> (+9.52%)` | :arrow_up: |
   | [...n/java/org/apache/pinot/client/BrokerResponse.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Jyb2tlclJlc3BvbnNlLmphdmE=) | `100.00% <ø> (ø)` | |
   | [.../main/java/org/apache/pinot/client/Connection.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Nvbm5lY3Rpb24uamF2YQ==) | `35.55% <ø> (-13.29%)` | :arrow_down: |
   | [...org/apache/pinot/client/DynamicBrokerSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0R5bmFtaWNCcm9rZXJTZWxlY3Rvci5qYXZh) | `82.85% <ø> (+10.12%)` | :arrow_up: |
   | [...n/java/org/apache/pinot/client/ExecutionStats.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0V4ZWN1dGlvblN0YXRzLmphdmE=) | `68.88% <ø> (ø)` | |
   | ... and [1214 more](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=footer). Last update [78152cd...9b446f6](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] npawar commented on a change in pull request #6611: SegmentWriter and SegmentUploader interface

Posted by GitBox <gi...@apache.org>.
npawar commented on a change in pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#discussion_r592924119



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java
##########
@@ -0,0 +1,70 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.writer;
+
+import java.io.Closeable;
+import java.net.URI;
+import org.apache.pinot.spi.config.table.TableConfig;
+import org.apache.pinot.spi.data.Schema;
+import org.apache.pinot.spi.data.readers.GenericRow;
+
+
+/**
+ * An interface to collect records and create a Pinot segment.
+ * This interface helps abstract out details regarding segment generation from the caller.
+ */
+public interface SegmentWriter extends Closeable {
+
+  /**
+   * Initializes the {@link SegmentWriter} with provided tableConfig and Pinot schema.
+   * @param tableConfig The table config for the segment
+   * @param schema The Pinot schema for the table
+   */
+  void init(TableConfig tableConfig, Schema schema)
+      throws Exception;
+
+  /**
+   * Collects a single {@link GenericRow} into a buffer.
+   * This row is not available in the segment until a <code>flush()</code> is invoked.
+   */
+  void collect(GenericRow row)
+      throws Exception;
+
+  /**
+   * Collects a batch of {@link GenericRow}s into a buffer.
+   * These rows are not available in the segment until a <code>flush()</code> is invoked.
+   */
+  default void collect(GenericRow[] rowBatch)
+      throws Exception {
+    for (GenericRow row : rowBatch) {
+      collect(row);
+    }
+  }
+
+  /**
+   * Creates one Pinot segment using the {@link GenericRow}s collected in the buffer,
+   * at the outputDirUri as specified in the tableConfig->batchConfigs.

Review comment:
       The caller can always overwrite the outputDir in the tableConfig object before calling init ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] npawar merged pull request #6611: SegmentWriter and SegmentUploader interface

Posted by GitBox <gi...@apache.org>.
npawar merged pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] npawar commented on a change in pull request #6611: SegmentWriter and SegmentUploader interface

Posted by GitBox <gi...@apache.org>.
npawar commented on a change in pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#discussion_r593476414



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java
##########
@@ -0,0 +1,70 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.writer;
+
+import java.io.Closeable;
+import java.net.URI;
+import org.apache.pinot.spi.config.table.TableConfig;
+import org.apache.pinot.spi.data.Schema;
+import org.apache.pinot.spi.data.readers.GenericRow;
+
+
+/**
+ * An interface to collect records and create a Pinot segment.
+ * This interface helps abstract out details regarding segment generation from the caller.

Review comment:
       As discussed offline, lets leave that decision and freedom to the implementations.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-io edited a comment on pull request #6611: SegmentWriter and SegmentUploader interface

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#issuecomment-785559563


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=h1) Report
   > Merging [#6611](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=desc) (19f685a) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/1beaab59b73f26c4e35f3b9bc856b03806cddf5a?el=desc) (1beaab5) will **decrease** coverage by `0.62%`.
   > The diff coverage is `63.22%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6611/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #6611      +/-   ##
   ==========================================
   - Coverage   66.44%   65.82%   -0.63%     
   ==========================================
     Files        1075     1363     +288     
     Lines       54773    66556   +11783     
     Branches     8168     9702    +1534     
   ==========================================
   + Hits        36396    43813    +7417     
   - Misses      15700    19621    +3921     
   - Partials     2677     3122     +445     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | unittests | `65.82% <63.22%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...e/pinot/broker/api/resources/PinotBrokerDebug.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYXBpL3Jlc291cmNlcy9QaW5vdEJyb2tlckRlYnVnLmphdmE=) | `0.00% <0.00%> (-79.32%)` | :arrow_down: |
   | [...pinot/broker/api/resources/PinotClientRequest.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYXBpL3Jlc291cmNlcy9QaW5vdENsaWVudFJlcXVlc3QuamF2YQ==) | `0.00% <0.00%> (-27.28%)` | :arrow_down: |
   | [...ot/broker/broker/AllowAllAccessControlFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL0FsbG93QWxsQWNjZXNzQ29udHJvbEZhY3RvcnkuamF2YQ==) | `71.42% <ø> (-28.58%)` | :arrow_down: |
   | [.../helix/BrokerUserDefinedMessageHandlerFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL2hlbGl4L0Jyb2tlclVzZXJEZWZpbmVkTWVzc2FnZUhhbmRsZXJGYWN0b3J5LmphdmE=) | `33.96% <0.00%> (-32.71%)` | :arrow_down: |
   | [...ker/routing/instanceselector/InstanceSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9pbnN0YW5jZXNlbGVjdG9yL0luc3RhbmNlU2VsZWN0b3IuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [...ava/org/apache/pinot/client/AbstractResultSet.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Fic3RyYWN0UmVzdWx0U2V0LmphdmE=) | `66.66% <ø> (+9.52%)` | :arrow_up: |
   | [...n/java/org/apache/pinot/client/BrokerResponse.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Jyb2tlclJlc3BvbnNlLmphdmE=) | `100.00% <ø> (ø)` | |
   | [.../main/java/org/apache/pinot/client/Connection.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Nvbm5lY3Rpb24uamF2YQ==) | `35.55% <ø> (-13.29%)` | :arrow_down: |
   | [...org/apache/pinot/client/DynamicBrokerSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0R5bmFtaWNCcm9rZXJTZWxlY3Rvci5qYXZh) | `82.85% <ø> (+10.12%)` | :arrow_up: |
   | [...n/java/org/apache/pinot/client/ExecutionStats.java](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0V4ZWN1dGlvblN0YXRzLmphdmE=) | `68.88% <ø> (ø)` | |
   | ... and [1250 more](https://codecov.io/gh/apache/incubator-pinot/pull/6611/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=footer). Last update [78152cd...19f685a](https://codecov.io/gh/apache/incubator-pinot/pull/6611?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] npawar commented on a change in pull request #6611: SegmentWriter interface

Posted by GitBox <gi...@apache.org>.
npawar commented on a change in pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#discussion_r592030849



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java
##########
@@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.writer;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.spi.data.readers.GenericRow;
+
+
+/**
+ * An interface to collect records and create a Pinot segment.
+ * This interface helps abstract out details regarding segment generation from the caller.
+ */
+public interface SegmentWriter extends Closeable {
+
+  /**
+   * Initializes the {@link SegmentWriter} with provided tableConfig and Pinot schema.
+   */
+  void init(SegmentWriterConfig segmentWriterConfig)
+      throws Exception;
+
+  /**
+   * Collects a single {@link GenericRow} into a buffer.
+   * This row is not available in the segment until a <code>flush()</code> is invoked.
+   */
+  void collect(GenericRow row)
+      throws IOException;
+
+  /**
+   * Collects a batch of {@link GenericRow}s into a buffer.
+   * These rows are not available in the segment until a <code>flush()</code> is invoked.
+   */
+  void collect(GenericRow[] rowBatch)
+      throws IOException;
+
+  /**
+   * Creates one Pinot segment using the {@link GenericRow}s collected in the buffer,
+   * at the outputDirUri as specified in the batchConfigs.

Review comment:
       It will not. Upload will be done by calling SegmentUploader#upload.
   We could introduce a wrapper class, that does segment generation and push, by calling SegmentWriter and SegmentUploader internally




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] npawar commented on a change in pull request #6611: SegmentWriter interface

Posted by GitBox <gi...@apache.org>.
npawar commented on a change in pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#discussion_r592031060



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java
##########
@@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.writer;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.spi.data.readers.GenericRow;
+
+
+/**
+ * An interface to collect records and create a Pinot segment.
+ * This interface helps abstract out details regarding segment generation from the caller.
+ */
+public interface SegmentWriter extends Closeable {
+
+  /**
+   * Initializes the {@link SegmentWriter} with provided tableConfig and Pinot schema.
+   */
+  void init(SegmentWriterConfig segmentWriterConfig)
+      throws Exception;
+
+  /**
+   * Collects a single {@link GenericRow} into a buffer.
+   * This row is not available in the segment until a <code>flush()</code> is invoked.
+   */
+  void collect(GenericRow row)
+      throws IOException;

Review comment:
       Changed to Exception. I think IOException came in there from the impl class.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #6611: SegmentWriter interface

Posted by GitBox <gi...@apache.org>.
jackjlli commented on a change in pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#discussion_r587777738



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java
##########
@@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.writer;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.spi.data.readers.GenericRow;
+import org.apache.pinot.spi.env.PinotConfiguration;
+
+
+/**
+ * An interface to write records into Pinot.
+ * This interface helps abstract out details regarding segment generation and push from the caller.
+ */
+public interface SegmentWriter extends Closeable {
+
+  /**
+   * Initialize the {@link SegmentWriter} with details about the Pinot cluster and table

Review comment:
       sed `s/Initialize/Initializes/`. Same for other APIs.

##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java
##########
@@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.writer;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.spi.data.readers.GenericRow;
+import org.apache.pinot.spi.env.PinotConfiguration;
+
+
+/**
+ * An interface to write records into Pinot.
+ * This interface helps abstract out details regarding segment generation and push from the caller.
+ */
+public interface SegmentWriter extends Closeable {
+
+  /**
+   * Initialize the {@link SegmentWriter} with details about the Pinot cluster and table
+   */
+  void init(PinotConfiguration conf)
+    throws IOException;
+
+  /**
+   * Collect a single {@link GenericRow} into a buffer.
+   * This row is not available in Pinot until a <code>flush()</code> is invoked.

Review comment:
       > This row is not available in Pinot until a <code>flush()</code> is invoked.
   
   Does the availability of a row here mean it cannot be queried?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] npawar commented on a change in pull request #6611: SegmentWriter and SegmentUploader interface

Posted by GitBox <gi...@apache.org>.
npawar commented on a change in pull request #6611:
URL: https://github.com/apache/incubator-pinot/pull/6611#discussion_r593424484



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java
##########
@@ -0,0 +1,70 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.ingestion.segment.writer;
+
+import java.io.Closeable;
+import java.net.URI;
+import org.apache.pinot.spi.config.table.TableConfig;
+import org.apache.pinot.spi.data.Schema;
+import org.apache.pinot.spi.data.readers.GenericRow;
+
+
+/**
+ * An interface to collect records and create a Pinot segment.
+ * This interface helps abstract out details regarding segment generation from the caller.
+ */
+public interface SegmentWriter extends Closeable {
+
+  /**
+   * Initializes the {@link SegmentWriter} with provided tableConfig and Pinot schema.
+   * @param tableConfig The table config for the segment
+   * @param schema The Pinot schema for the table
+   */
+  void init(TableConfig tableConfig, Schema schema)
+      throws Exception;
+
+  /**
+   * Collects a single {@link GenericRow} into a buffer.
+   * This row is not available in the segment until a <code>flush()</code> is invoked.
+   */
+  void collect(GenericRow row)
+      throws Exception;
+
+  /**
+   * Collects a batch of {@link GenericRow}s into a buffer.
+   * These rows are not available in the segment until a <code>flush()</code> is invoked.
+   */
+  default void collect(GenericRow[] rowBatch)
+      throws Exception {
+    for (GenericRow row : rowBatch) {
+      collect(row);
+    }
+  }
+
+  /**
+   * Creates one Pinot segment using the {@link GenericRow}s collected in the buffer,
+   * at the outputDirUri as specified in the tableConfig->batchConfigs.
+   * Successful invocation of this method means that the {@link GenericRow}s collected so far,
+   * are now available in the Pinot segment and not available in the buffer anymore.
+   *
+   * @return URI of the generated segment
+   */
+  URI flush()

Review comment:
       This is pretty much like the FileWriter right? So flush and close make sense?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org