You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "shounakmk219 (via GitHub)" <gi...@apache.org> on 2023/12/08 06:33:56 UTC

[PR] Data generator reorganisation [pinot]

shounakmk219 opened a new pull request, #12122:
URL: https://github.com/apache/pinot/pull/12122

   ## Description
   This PR decouples the data writer part from the data generator so that we can easily add new writers by keeping data generator untouched.
   
   ## Labels
   `refactor`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Data generator reorganisation [pinot]

Posted by "shounakmk219 (via GitHub)" <gi...@apache.org>.
shounakmk219 commented on PR #12122:
URL: https://github.com/apache/pinot/pull/12122#issuecomment-1864410328

   Thanks for the review @xiangfu0 !
   I have addressed your comments along with a bugfix for removing hardcoded file extension and also added a new `cleanup()` method to `Writer` interface.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Data generator reorganisation [pinot]

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 merged PR #12122:
URL: https://github.com/apache/pinot/pull/12122


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Data generator reorganisation [pinot]

Posted by "shounakmk219 (via GitHub)" <gi...@apache.org>.
shounakmk219 commented on code in PR #12122:
URL: https://github.com/apache/pinot/pull/12122#discussion_r1432657109


##########
pinot-controller/src/main/java/org/apache/pinot/controller/recommender/data/writer/AvroRecordAppender.java:
##########
@@ -0,0 +1,56 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.controller.recommender.data.writer;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.util.Map;
+import org.apache.avro.file.DataFileWriter;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericDatumWriter;
+
+
+public class AvroRecordAppender implements Closeable {
+
+  private final DataFileWriter<GenericData.Record> _recordWriter;
+  private final org.apache.avro.Schema _avroSchema;
+
+  public AvroRecordAppender(File file, org.apache.avro.Schema avroSchema)

Review Comment:
   Atleast right now only AvroWriter is using it so makes sense to merge👍



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Data generator reorganisation [pinot]

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #12122:
URL: https://github.com/apache/pinot/pull/12122#discussion_r1432405769


##########
pinot-controller/src/main/java/org/apache/pinot/controller/recommender/data/writer/AvroRecordAppender.java:
##########
@@ -0,0 +1,56 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.controller.recommender.data.writer;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.util.Map;
+import org.apache.avro.file.DataFileWriter;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericDatumWriter;
+
+
+public class AvroRecordAppender implements Closeable {
+
+  private final DataFileWriter<GenericData.Record> _recordWriter;
+  private final org.apache.avro.Schema _avroSchema;
+
+  public AvroRecordAppender(File file, org.apache.avro.Schema avroSchema)

Review Comment:
   Is this class reusable? Or shall we merge it into AvroWriter?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Data generator reorganisation [pinot]

Posted by "shounakmk219 (via GitHub)" <gi...@apache.org>.
shounakmk219 commented on code in PR #12122:
URL: https://github.com/apache/pinot/pull/12122#discussion_r1432650824


##########
pinot-controller/src/main/java/org/apache/pinot/controller/recommender/data/writer/FileWriter.java:
##########
@@ -0,0 +1,53 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.controller.recommender.data.writer;
+
+import java.io.File;
+import org.apache.commons.lang.StringUtils;
+import org.apache.pinot.controller.recommender.data.generator.DataGenerator;
+
+
+public class FileWriter implements Writer {
+  private FileWriterSpec _spec;
+  @Override
+  public void init(WriterSpec spec) {
+    _spec = (FileWriterSpec) spec;
+  }
+
+  @Override
+  public void write()
+      throws Exception {
+    final int numPerFiles = (int) (_spec.getTotalDocs() / _spec.getNumFiles());
+    final String headers = StringUtils.join(_spec.getGenerator().nextRow().keySet(), ",");
+    for (int i = 0; i < _spec.getNumFiles(); i++) {
+      try (java.io.FileWriter writer =
+          new java.io.FileWriter(new File(_spec.getBaseDir(), String.format("output_%d.csv", i)))) {
+        writer.append(headers).append('\n');
+        for (int j = 0; j < numPerFiles; j++) {
+          String appendString = generateRow(_spec.getGenerator());
+          writer.append(appendString).append('\n');
+        }
+      }
+    }
+  }
+
+  protected String generateRow(DataGenerator generator) {

Review Comment:
   Agree, will make the change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Data generator reorganisation [pinot]

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #12122:
URL: https://github.com/apache/pinot/pull/12122#discussion_r1432407851


##########
pinot-controller/src/main/java/org/apache/pinot/controller/recommender/data/writer/FileWriter.java:
##########
@@ -0,0 +1,53 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.controller.recommender.data.writer;
+
+import java.io.File;
+import org.apache.commons.lang.StringUtils;
+import org.apache.pinot.controller.recommender.data.generator.DataGenerator;
+
+
+public class FileWriter implements Writer {
+  private FileWriterSpec _spec;
+  @Override
+  public void init(WriterSpec spec) {
+    _spec = (FileWriterSpec) spec;
+  }
+
+  @Override
+  public void write()
+      throws Exception {
+    final int numPerFiles = (int) (_spec.getTotalDocs() / _spec.getNumFiles());
+    final String headers = StringUtils.join(_spec.getGenerator().nextRow().keySet(), ",");
+    for (int i = 0; i < _spec.getNumFiles(); i++) {
+      try (java.io.FileWriter writer =
+          new java.io.FileWriter(new File(_spec.getBaseDir(), String.format("output_%d.csv", i)))) {
+        writer.append(headers).append('\n');
+        for (int j = 0; j < numPerFiles; j++) {
+          String appendString = generateRow(_spec.getGenerator());
+          writer.append(appendString).append('\n');
+        }
+      }
+    }
+  }
+
+  protected String generateRow(DataGenerator generator) {

Review Comment:
   Make this method/class abstract?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Data generator reorganisation [pinot]

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #12122:
URL: https://github.com/apache/pinot/pull/12122#issuecomment-1846664664

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/12122?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   All modified and coverable lines are covered by tests :white_check_mark:
   > Comparison is base [(`56ecafc`)](https://app.codecov.io/gh/apache/pinot/commit/56ecafc17e141b4cd9404255452b654231cc1247?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) 61.60% compared to head [(`8268e28`)](https://app.codecov.io/gh/apache/pinot/pull/12122?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) 46.51%.
   
   
   <details><summary>Additional details and impacted files</summary>
   
   
   ```diff
   @@              Coverage Diff              @@
   ##             master   #12122       +/-   ##
   =============================================
   - Coverage     61.60%   46.51%   -15.10%     
   + Complexity     1152      943      -209     
   =============================================
     Files          2406     1808      -598     
     Lines        130859    95132    -35727     
     Branches      20218    15344     -4874     
   =============================================
   - Hits          80611    44247    -36364     
   - Misses        44373    47730     +3357     
   + Partials       5875     3155     -2720     
   ```
   
   | [Flag](https://app.codecov.io/gh/apache/pinot/pull/12122/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [custom-integration1](https://app.codecov.io/gh/apache/pinot/pull/12122/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration](https://app.codecov.io/gh/apache/pinot/pull/12122/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration1](https://app.codecov.io/gh/apache/pinot/pull/12122/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration2](https://app.codecov.io/gh/apache/pinot/pull/12122/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [java-11](https://app.codecov.io/gh/apache/pinot/pull/12122/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [java-21](https://app.codecov.io/gh/apache/pinot/pull/12122/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.51% <ø> (-14.97%)` | :arrow_down: |
   | [skip-bytebuffers-false](https://app.codecov.io/gh/apache/pinot/pull/12122/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [skip-bytebuffers-true](https://app.codecov.io/gh/apache/pinot/pull/12122/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.51% <ø> (-14.95%)` | :arrow_down: |
   | [temurin](https://app.codecov.io/gh/apache/pinot/pull/12122/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.51% <ø> (-15.10%)` | :arrow_down: |
   | [unittests](https://app.codecov.io/gh/apache/pinot/pull/12122/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.51% <ø> (-15.09%)` | :arrow_down: |
   | [unittests1](https://app.codecov.io/gh/apache/pinot/pull/12122/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.51% <ø> (-0.20%)` | :arrow_down: |
   | [unittests2](https://app.codecov.io/gh/apache/pinot/pull/12122/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   
   </details>
   
   [:umbrella: View full report in Codecov by Sentry](https://app.codecov.io/gh/apache/pinot/pull/12122?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).   
   :loudspeaker: Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org