Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/03/05 19:01:11 UTC

[GitHub] [incubator-pinot] apucher opened a new pull request #5118: Synthetic Time Series Generator for pinot-tools

apucher opened a new pull request #5118: Synthetic Time Series Generator for pinot-tools
URL: https://github.com/apache/incubator-pinot/pull/5118
 
 
   Extends pinot-tools data generator with realistic time series templates considering seasonality,
   trends, noise, and major outliers. #5117 
   
   Specifically, this adds support for:
   * seasonal (diurnal) patterns to simulate page views and clicks metrics
   * spiky (long-tailed) patterns to simulate error metrics
   * sequential patterns to deterministically populate timestamp columns
   * string dictionaries to deterministically populate dimension columns
   * mixture models of the above
   
   Additionally provides two sample configurations "simpleWebsite" and "complexWebsite" which
   generate non-dimensional and dimensional examples of time series respectively. Also provides
   instructions for generating and loading mock data onto an existing Pinot cluster.
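To make the pattern types listed above more concrete, here is a minimal sketch of seasonal, sequential, and mixture patterns as plain functions of the step index. It is an illustration only, not the code from this pull request: the class name PatternSketch, the sine-based seasonal formula, and the choice to model a mixture as a sum of its components are all assumptions.

```java
import java.util.function.LongToDoubleFunction;

/** Hypothetical illustration; names and formulas are assumptions, not taken from the PR. */
final class PatternSketch {

    /** Seasonal (diurnal) pattern: a sine wave around a mean that repeats every `period` steps. */
    static LongToDoubleFunction seasonal(double mean, double amplitude, long period) {
        return step -> mean + amplitude * Math.sin(2 * Math.PI * (step % period) / period);
    }

    /** Sequential pattern: a deterministic counter, e.g. for a timestamp column. */
    static LongToDoubleFunction sequential(long start, long stride) {
        return step -> (double) (start + stride * step);
    }

    /** Mixture (assumed here to be a simple sum of the component patterns). */
    static LongToDoubleFunction mixture(LongToDoubleFunction... parts) {
        return step -> {
            double sum = 0;
            for (LongToDoubleFunction part : parts) {
                sum += part.applyAsDouble(step);
            }
            return sum;
        };
    }
}
```

For example, PatternSketch.seasonal(100, 50, 24) peaks a quarter of the way through each 24-step period, which is roughly how an hourly page-view series with a daily cycle might look.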

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] apucher edited a comment on issue #5118: Synthetic Time Series Generator for pinot-tools

Posted by GitBox <gi...@apache.org>.
apucher edited a comment on issue #5118: Synthetic Time Series Generator for pinot-tools
URL: https://github.com/apache/incubator-pinot/pull/5118#issuecomment-595573642
 
 
   Thank you @mayankshriv for the quick turnaround. I absolutely agree that "pattern" is a better name for this. Pushed update.



[GitHub] [incubator-pinot] apucher commented on a change in pull request #5118: Synthetic Time Series Generator for pinot-tools

Posted by GitBox <gi...@apache.org>.
apucher commented on a change in pull request #5118: Synthetic Time Series Generator for pinot-tools
URL: https://github.com/apache/incubator-pinot/pull/5118#discussion_r388688691
 
 

 ##########
 File path: pinot-tools/src/main/java/org/apache/pinot/tools/data/generator/DataGeneratorSpec.java
 ##########
 @@ -47,17 +48,19 @@
   private final boolean overrideOutDir;
 
   public DataGeneratorSpec() {
-    this(new ArrayList<String>(), new HashMap<String, Integer>(), new HashMap<String, IntRange>(),
-        new HashMap<String, DataType>(), new HashMap<String, FieldType>(), new HashMap<String, TimeUnit>(),
+    this(new ArrayList<String>(), new HashMap<>(), new HashMap<>(), new HashMap<>(),
+        new HashMap<>(), new HashMap<>(), new HashMap<>(),
 
 Review comment:
   fixed



[GitHub] [incubator-pinot] apucher commented on a change in pull request #5118: Synthetic Time Series Generator for pinot-tools

Posted by GitBox <gi...@apache.org>.
apucher commented on a change in pull request #5118: Synthetic Time Series Generator for pinot-tools
URL: https://github.com/apache/incubator-pinot/pull/5118#discussion_r388688665
 
 

 ##########
 File path: pinot-tools/src/main/java/org/apache/pinot/tools/data/generator/TemplateSpikeGenerator.java
 ##########
 @@ -0,0 +1,108 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.tools.data.generator;
+
+import org.apache.commons.math3.distribution.LogNormalDistribution;
+
+import java.util.Map;
+
+/**
+ * TemplateSpikeGenerator produces a series of log-normal spikes with log-normal arrival times, with optional smoothing.
+ * This pattern is typical for rare event spikes, such as error counts. The generated values are sampled non-deterministically.
+ *
+ * Generator example:
+ * <pre>
+ *     baseline = 0
+ *     arrivalMean = ?
+ *     magnitudeMean = ?
+ *
+ *     returns [ 0, 0, 0, 0, 0, 0, 47, 15, 2, 1, 0, 0, ... ]
+ * </pre>
+ *
+ * Configuration examples:
+ * <ul>
+ *     <li>./pinot-tools/src/main/resources/generator/simpleWebsite_generator.json</li>
+ *     <li>./pinot-tools/src/main/resources/generator/complexWebsite_generator.json</li>
+ * </ul>
+ */
+public class TemplateSpikeGenerator implements Generator {
+    private final double baseline;
+    private final double smoothing;
+
+    private final LogNormalDistribution arrivalGenerator;
+    private final LogNormalDistribution magnitudeGenerator;
+
+    private long step = -1;
+
+    private long nextArrival;
+    private double lastValue;
+
+    public TemplateSpikeGenerator(Map<String, Object> templateConfig) {
+        this(toDouble(templateConfig.get("baseline"), 0),
+                toDouble(templateConfig.get("arrivalMean"), 2),
+                toDouble(templateConfig.get("arrivalSigma"), 1),
+                toDouble(templateConfig.get("magnitudeMean"), 2),
+                toDouble(templateConfig.get("magnitudeSigma"), 1),
+                toDouble(templateConfig.get("smoothing"), 0));
+    }
+
+    public TemplateSpikeGenerator(double baseline, double arrivalMean, double arrivalSigma, double magnitudeMean, double magnitudeSigma, double smoothing) {
+        this.baseline = baseline;
+        this.smoothing = smoothing;
+
+        this.arrivalGenerator = new LogNormalDistribution(arrivalMean, arrivalSigma);
+        this.magnitudeGenerator = new LogNormalDistribution(magnitudeMean, magnitudeSigma);
+
+        this.nextArrival = (long) arrivalGenerator.sample();
+        this.lastValue = baseline;
+    }
+
+    @Override
+    public void init() {
+        // left blank
+    }
+
+    @Override
+    public Object next() {
+        step++;
+
+        if (step < nextArrival) {
+            lastValue = (1 - smoothing) * baseline + smoothing * lastValue;
+            return (long) lastValue;
+        }
+
+        nextArrival += (long) arrivalGenerator.sample();
+        lastValue = baseline + this.magnitudeGenerator.sample();
+        return (long) lastValue;
+    }
+
+    private static double toDouble(Object obj, double defaultValue) {
+        if (obj == null) {
+            return defaultValue;
+        }
+        return Double.valueOf(obj.toString());
+    }
+
+    public static void main(String[] args) {
 
 Review comment:
   cleaned up
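The smoothing step in next() above is an exponential decay back toward the baseline. The sketch below isolates just that update so the effect of the smoothing parameter is easier to see. It omits the random log-normal sampling entirely and starts from a fixed spike value; the class and method names are hypothetical.

```java
/** Hypothetical helper isolating the smoothing update used in next(); no random sampling. */
final class SpikeDecaySketch {

    /**
     * Emits the spike value first, then repeatedly applies
     * last = (1 - smoothing) * baseline + smoothing * last,
     * truncating each value to long as the generator does.
     */
    static long[] decay(double baseline, double smoothing, double spike, int steps) {
        long[] out = new long[steps];
        double last = spike;
        for (int i = 0; i < steps; i++) {
            out[i] = (long) last;
            last = (1 - smoothing) * baseline + smoothing * last;
        }
        return out;
    }
}
```

With baseline 0, smoothing 0.5, and a spike of 48, decay(0, 0.5, 48, 7) yields 48, 24, 12, 6, 3, 1, 0. With the default smoothing of 0 the series drops back to the baseline on the step right after the spike.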



[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #5118: Synthetic Time Series Generator for pinot-tools

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #5118: Synthetic Time Series Generator for pinot-tools
URL: https://github.com/apache/incubator-pinot/pull/5118#discussion_r388625495
 
 

 ##########
 File path: pinot-tools/src/main/java/org/apache/pinot/tools/data/generator/DataGenerator.java
 ##########
 @@ -100,6 +110,23 @@ public void generate(long totalDocs, int numFiles)
     }
   }
 
+  public void generateCsv(long totalDocs, int numFiles)
+      throws IOException {
+    final int numPerFiles = (int) (totalDocs / numFiles);
+    for (int i = 0; i < numFiles; i++) {
+      try (FileWriter writer = new FileWriter(outDir + "/output.csv")) {
 
 Review comment:
   Output file name should come from cli?



[GitHub] [incubator-pinot] apucher commented on a change in pull request #5118: Synthetic Time Series Generator for pinot-tools

Posted by GitBox <gi...@apache.org>.
apucher commented on a change in pull request #5118: Synthetic Time Series Generator for pinot-tools
URL: https://github.com/apache/incubator-pinot/pull/5118#discussion_r388690431
 
 

 ##########
 File path: pinot-tools/src/main/java/org/apache/pinot/tools/data/generator/DataGenerator.java
 ##########
 @@ -100,6 +110,23 @@ public void generate(long totalDocs, int numFiles)
     }
   }
 
+  public void generateCsv(long totalDocs, int numFiles)
+      throws IOException {
+    final int numPerFiles = (int) (totalDocs / numFiles);
+    for (int i = 0; i < numFiles; i++) {
+      try (FileWriter writer = new FileWriter(outDir + "/output.csv")) {
 
 Review comment:
   Output is directory-based due to the Avro support above. IMHO it doesn't add much value to support custom file names.



[GitHub] [incubator-pinot] apucher merged pull request #5118: Synthetic Time Series Generator for pinot-tools

Posted by GitBox <gi...@apache.org>.
apucher merged pull request #5118: Synthetic Time Series Generator for pinot-tools
URL: https://github.com/apache/incubator-pinot/pull/5118
 
 
   



[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #5118: Synthetic Time Series Generator for pinot-tools

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #5118: Synthetic Time Series Generator for pinot-tools
URL: https://github.com/apache/incubator-pinot/pull/5118#discussion_r388625912
 
 

 ##########
 File path: pinot-tools/src/main/java/org/apache/pinot/tools/data/generator/DataGeneratorSpec.java
 ##########
 @@ -47,17 +48,19 @@
   private final boolean overrideOutDir;
 
   public DataGeneratorSpec() {
-    this(new ArrayList<String>(), new HashMap<String, Integer>(), new HashMap<String, IntRange>(),
-        new HashMap<String, DataType>(), new HashMap<String, FieldType>(), new HashMap<String, TimeUnit>(),
+    this(new ArrayList<String>(), new HashMap<>(), new HashMap<>(), new HashMap<>(),
+        new HashMap<>(), new HashMap<>(), new HashMap<>(),
 
 Review comment:
   `System.getProperty("java.io.tmpdir")` instead of `/tmp`.
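As a rough sketch of this suggestion, a default output directory can be resolved from the JVM's temp-dir property rather than a hard-coded "/tmp". The class name and the "dataGen" subdirectory below are hypothetical and only illustrate the idea; they are not taken from DataGeneratorSpec.

```java
import java.io.File;

/** Hypothetical illustration of resolving a platform-independent default output directory. */
final class TmpDirSketch {

    /** Uses the JVM's java.io.tmpdir property; "dataGen" is an assumed subdirectory name. */
    static File defaultOutDir() {
        return new File(System.getProperty("java.io.tmpdir"), "dataGen");
    }
}
```

This keeps the default usable on systems where "/tmp" does not exist or is not the configured temp location.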



[GitHub] [incubator-pinot] apucher commented on a change in pull request #5118: Synthetic Time Series Generator for pinot-tools

Posted by GitBox <gi...@apache.org>.
apucher commented on a change in pull request #5118: Synthetic Time Series Generator for pinot-tools
URL: https://github.com/apache/incubator-pinot/pull/5118#discussion_r388688727
 
 

 ##########
 File path: pinot-tools/src/main/java/org/apache/pinot/tools/data/generator/TemplateStringGenerator.java
 ##########
 @@ -0,0 +1,80 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.tools.data.generator;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * TemplateStringGenerator produces series of strings by cycling through a predefined list of values, optionally with
+ * a number of repetitions per value.
+ *
+ * Generator example:
+ * <pre>
+ *     values = [ "hello", "world" ]
+ *     repetitions = 2
+ *
+ *     returns [ "hello", "hello", "world", "world", "hello", ... ]
+ * </pre>
+ *
+ * Configuration examples:
+ * <ul>
+ *     <li>./pinot-tools/src/main/resources/generator/simpleWebsite_generator.json</li>
+ *     <li>./pinot-tools/src/main/resources/generator/complexWebsite_generator.json</li>
+ * </ul>
+ */
+public class TemplateStringGenerator implements Generator {
+    private final String[] values;
+    private final long repetitions;
+
+    private long step;
+
+    public TemplateStringGenerator(Map<String, Object> templateConfig) {
+        this(((List<String>) templateConfig.get("values")).toArray(new String[0]), toLong(templateConfig.get("repetitions"), 1));
+    }
+
+    public TemplateStringGenerator(String[] values, long repetitions) {
+        this.values = values;
+        this.repetitions = repetitions;
+    }
+
+    @Override
+    public void init() {
+        // left blank
+    }
+
+    @Override
+    public Object next() {
+        return values[(int) (step++ / repetitions) % values.length];
+    }
+
+    private static long toLong(Object obj, long defaultValue) {
+        if (obj == null) {
+            return defaultValue;
+        }
+        return Long.valueOf(obj.toString());
+    }
+
+    public static void main(String[] args) {
 
 Review comment:
   cleaned up
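The cycling behavior described in the Javadoc above can be sketched as a pure function of the step index. Note one deliberate difference from the quoted diff: this sketch takes the modulo while still in long arithmetic and only then narrows to int, so very large step counts cannot produce a negative index. The class and method names are hypothetical.

```java
/** Hypothetical standalone version of the string-cycling index computation. */
final class CycleSketch {

    /** Returns the value for a given step, holding each value for `repetitions` consecutive steps. */
    static String valueAt(String[] values, long repetitions, long step) {
        // Reduce modulo the list length in long arithmetic, then narrow to int.
        return values[(int) ((step / repetitions) % values.length)];
    }
}
```

With values { "hello", "world" } and repetitions 2, steps 0 through 4 give "hello", "hello", "world", "world", "hello", matching the Javadoc example.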



[GitHub] [incubator-pinot] apucher commented on a change in pull request #5118: Synthetic Time Series Generator for pinot-tools

Posted by GitBox <gi...@apache.org>.
apucher commented on a change in pull request #5118: Synthetic Time Series Generator for pinot-tools
URL: https://github.com/apache/incubator-pinot/pull/5118#discussion_r388688622
 
 

 ##########
 File path: pinot-tools/src/main/java/org/apache/pinot/tools/data/generator/DataGenerator.java
 ##########
 @@ -100,6 +110,23 @@ public void generate(long totalDocs, int numFiles)
     }
   }
 
+  public void generateCsv(long totalDocs, int numFiles)
+      throws IOException {
+    final int numPerFiles = (int) (totalDocs / numFiles);
+    for (int i = 0; i < numFiles; i++) {
+      try (FileWriter writer = new FileWriter(outDir + "/output.csv")) {
 
 Review comment:
   cleaned up



[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #5118: Synthetic Time Series Generator for pinot-tools

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #5118: Synthetic Time Series Generator for pinot-tools
URL: https://github.com/apache/incubator-pinot/pull/5118#discussion_r388627944
 
 

 ##########
 File path: pinot-tools/src/main/java/org/apache/pinot/tools/data/generator/TemplateSpikeGenerator.java
 ##########
 @@ -0,0 +1,108 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.tools.data.generator;
+
+import org.apache.commons.math3.distribution.LogNormalDistribution;
+
+import java.util.Map;
+
+/**
+ * TemplateSpikeGenerator produces a series of log-normal spikes with log-normal arrival times, with optional smoothing.
+ * This pattern is typical for rare event spikes, such as error counts. The generated values are sampled non-deterministically.
+ *
+ * Generator example:
+ * <pre>
+ *     baseline = 0
+ *     arrivalMean = ?
+ *     magnitudeMean = ?
+ *
+ *     returns [ 0, 0, 0, 0, 0, 0, 47, 15, 2, 1, 0, 0, ... ]
+ * </pre>
+ *
+ * Configuration examples:
+ * <ul>
+ *     <li>./pinot-tools/src/main/resources/generator/simpleWebsite_generator.json</li>
+ *     <li>./pinot-tools/src/main/resources/generator/complexWebsite_generator.json</li>
+ * </ul>
+ */
+public class TemplateSpikeGenerator implements Generator {
+    private final double baseline;
+    private final double smoothing;
+
+    private final LogNormalDistribution arrivalGenerator;
+    private final LogNormalDistribution magnitudeGenerator;
+
+    private long step = -1;
+
+    private long nextArrival;
+    private double lastValue;
+
+    public TemplateSpikeGenerator(Map<String, Object> templateConfig) {
+        this(toDouble(templateConfig.get("baseline"), 0),
+                toDouble(templateConfig.get("arrivalMean"), 2),
+                toDouble(templateConfig.get("arrivalSigma"), 1),
+                toDouble(templateConfig.get("magnitudeMean"), 2),
+                toDouble(templateConfig.get("magnitudeSigma"), 1),
+                toDouble(templateConfig.get("smoothing"), 0));
+    }
+
+    public TemplateSpikeGenerator(double baseline, double arrivalMean, double arrivalSigma, double magnitudeMean, double magnitudeSigma, double smoothing) {
+        this.baseline = baseline;
+        this.smoothing = smoothing;
+
+        this.arrivalGenerator = new LogNormalDistribution(arrivalMean, arrivalSigma);
+        this.magnitudeGenerator = new LogNormalDistribution(magnitudeMean, magnitudeSigma);
+
+        this.nextArrival = (long) arrivalGenerator.sample();
+        this.lastValue = baseline;
+    }
+
+    @Override
+    public void init() {
+        // left blank
+    }
+
+    @Override
+    public Object next() {
+        step++;
+
+        if (step < nextArrival) {
+            lastValue = (1 - smoothing) * baseline + smoothing * lastValue;
+            return (long) lastValue;
+        }
+
+        nextArrival += (long) arrivalGenerator.sample();
+        lastValue = baseline + this.magnitudeGenerator.sample();
+        return (long) lastValue;
+    }
+
+    private static double toDouble(Object obj, double defaultValue) {
+        if (obj == null) {
+            return defaultValue;
+        }
+        return Double.valueOf(obj.toString());
+    }
+
+    public static void main(String[] args) {
 
 Review comment:
   Is this needed?



[GitHub] [incubator-pinot] apucher commented on issue #5118: Synthetic Time Series Generator for pinot-tools

Posted by GitBox <gi...@apache.org>.
apucher commented on issue #5118: Synthetic Time Series Generator for pinot-tools
URL: https://github.com/apache/incubator-pinot/pull/5118#issuecomment-595573642
 
 
   Thank you @mayankshriv for the quick turnaround. I absolutely agree that "pattern" is a better name for this.



[GitHub] [incubator-pinot] apucher commented on a change in pull request #5118: Synthetic Time Series Generator for pinot-tools

Posted by GitBox <gi...@apache.org>.
apucher commented on a change in pull request #5118: Synthetic Time Series Generator for pinot-tools
URL: https://github.com/apache/incubator-pinot/pull/5118#discussion_r388689123
 
 

 ##########
 File path: pinot-tools/src/main/java/org/apache/pinot/tools/data/generator/TemplateType.java
 ##########
 @@ -0,0 +1,27 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.tools.data.generator;
+
+public enum TemplateType {
 
 Review comment:
   Renamed to "Pattern". Definitely more appropriate since those aren't physical "templates". 
   
   Each type is already explained in the generator module. Added javadoc referencing that information.



[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #5118: Synthetic Time Series Generator for pinot-tools

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #5118: Synthetic Time Series Generator for pinot-tools
URL: https://github.com/apache/incubator-pinot/pull/5118#discussion_r388628535
 
 

 ##########
 File path: pinot-tools/src/main/java/org/apache/pinot/tools/data/generator/TemplateType.java
 ##########
 @@ -0,0 +1,27 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.tools.data.generator;
+
+public enum TemplateType {
 
 Review comment:
   Apologies for my ignorance, is `Template` a generally accepted term here? If not, it is not intuitive enough, and maybe PATTERN/DISTRIBUTION is a better word?
   
   Also, it would be good to add javadoc explaining each type.



[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #5118: Synthetic Time Series Generator for pinot-tools

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #5118: Synthetic Time Series Generator for pinot-tools
URL: https://github.com/apache/incubator-pinot/pull/5118#discussion_r388628016
 
 

 ##########
 File path: pinot-tools/src/main/java/org/apache/pinot/tools/data/generator/TemplateStringGenerator.java
 ##########
 @@ -0,0 +1,80 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.tools.data.generator;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * TemplateStringGenerator produces series of strings by cycling through a predefined list of values, optionally with
+ * a number of repetitions per value.
+ *
+ * Generator example:
+ * <pre>
+ *     values = [ "hello", "world" ]
+ *     repetitions = 2
+ *
+ *     returns [ "hello", "hello", "world", "world", "hello", ... ]
+ * </pre>
+ *
+ * Configuration examples:
+ * <ul>
+ *     <li>./pinot-tools/src/main/resources/generator/simpleWebsite_generator.json</li>
+ *     <li>./pinot-tools/src/main/resources/generator/complexWebsite_generator.json</li>
+ * </ul>
+ */
+public class TemplateStringGenerator implements Generator {
+    private final String[] values;
+    private final long repetitions;
+
+    private long step;
+
+    public TemplateStringGenerator(Map<String, Object> templateConfig) {
+        this(((List<String>) templateConfig.get("values")).toArray(new String[0]), toLong(templateConfig.get("repetitions"), 1));
+    }
+
+    public TemplateStringGenerator(String[] values, long repetitions) {
+        this.values = values;
+        this.repetitions = repetitions;
+    }
+
+    @Override
+    public void init() {
+        // left blank
+    }
+
+    @Override
+    public Object next() {
+        return values[(int) (step++ / repetitions) % values.length];
+    }
+
+    private static long toLong(Object obj, long defaultValue) {
+        if (obj == null) {
+            return defaultValue;
+        }
+        return Long.valueOf(obj.toString());
+    }
+
+    public static void main(String[] args) {
 
 Review comment:
   Same here.
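
   For reference, the cycling behaviour described in the Javadoc above can be sketched as follows. This is a minimal illustration, not the code from the pull request: the class `CyclingSketch` and the helper `take` are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not part of pinot-tools.
class CyclingSketch {
  // Returns the first `count` values produced by cycling through `values`,
  // repeating each value `repetitions` times before moving to the next one.
  static List<String> take(String[] values, long repetitions, int count) {
    List<String> out = new ArrayList<>();
    for (long step = 0; step < count; step++) {
      // Reduce modulo values.length while still a long, then narrow to int.
      int index = (int) ((step / repetitions) % values.length);
      out.add(values[index]);
    }
    return out;
  }

  public static void main(String[] args) {
    // Prints [hello, hello, world, world, hello]
    System.out.println(take(new String[] {"hello", "world"}, 2, 5));
  }
}
```

   The sketch applies the modulo before narrowing to `int`. In the diff, the cast binds more tightly than `%`, so the quotient is narrowed first; that should only matter for very large step counts.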


[GitHub] [incubator-pinot] JohnTortugo commented on a change in pull request #5118: Synthetic Time Series Generator for pinot-tools

Posted by GitBox <gi...@apache.org>.
JohnTortugo commented on a change in pull request #5118:
URL: https://github.com/apache/incubator-pinot/pull/5118#discussion_r514586224



##########
File path: pinot-tools/src/main/java/org/apache/pinot/tools/data/generator/DataGenerator.java
##########
@@ -100,6 +110,23 @@ public void generate(long totalDocs, int numFiles)
     }
   }
 
+  public void generateCsv(long totalDocs, int numFiles)
+      throws IOException {
+    final int numPerFiles = (int) (totalDocs / numFiles);
+    for (int i = 0; i < numFiles; i++) {
+      try (FileWriter writer = new FileWriter(outDir + "/output.csv")) {

Review comment:
       Won't this always overwrite the previous file? The path does not include the loop index `i`.
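
       A minimal sketch of one way to avoid that, assuming the loop index is simply appended to the file name. The class `CsvNamingSketch`, the helper `fileName`, and the `output_<i>.csv` pattern are hypothetical and not taken from the pull request:

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical stand-alone sketch; not part of pinot-tools.
class CsvNamingSketch {
  // Derives a distinct file name from the loop index (assumed naming scheme).
  static String fileName(int index) {
    return "output_" + index + ".csv";
  }

  // Writes one placeholder file per index so that no iteration
  // overwrites the file written by an earlier iteration.
  static void writeFiles(File outDir, int numFiles) throws IOException {
    for (int i = 0; i < numFiles; i++) {
      try (FileWriter writer = new FileWriter(new File(outDir, fileName(i)))) {
        writer.write("placeholder row for file " + i + "\n");
      }
    }
  }

  public static void main(String[] args) throws IOException {
    File outDir = Files.createTempDirectory("csv-sketch").toFile();
    writeFiles(outDir, 3);
    // Prints 3: one file per loop iteration.
    System.out.println(outDir.list().length);
  }
}
```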





[GitHub] [incubator-pinot] apucher commented on a change in pull request #5118: Synthetic Time Series Generator for pinot-tools

Posted by GitBox <gi...@apache.org>.
apucher commented on a change in pull request #5118:
URL: https://github.com/apache/incubator-pinot/pull/5118#discussion_r514588788



##########
File path: pinot-tools/src/main/java/org/apache/pinot/tools/data/generator/DataGenerator.java
##########
@@ -100,6 +110,23 @@ public void generate(long totalDocs, int numFiles)
     }
   }
 
+  public void generateCsv(long totalDocs, int numFiles)
+      throws IOException {
+    final int numPerFiles = (int) (totalDocs / numFiles);
+    for (int i = 0; i < numFiles; i++) {
+      try (FileWriter writer = new FileWriter(outDir + "/output.csv")) {

Review comment:
       By the way, here is the `generator.sh` script that uses the DataGenerator:
       https://github.com/apache/incubator-pinot/blob/master/docker/images/pinot/bin/generator.sh





[GitHub] [incubator-pinot] apucher commented on a change in pull request #5118: Synthetic Time Series Generator for pinot-tools

Posted by GitBox <gi...@apache.org>.
apucher commented on a change in pull request #5118:
URL: https://github.com/apache/incubator-pinot/pull/5118#discussion_r514588216



##########
File path: pinot-tools/src/main/java/org/apache/pinot/tools/data/generator/DataGenerator.java
##########
@@ -100,6 +110,23 @@ public void generate(long totalDocs, int numFiles)
     }
   }
 
+  public void generateCsv(long totalDocs, int numFiles)
+      throws IOException {
+    final int numPerFiles = (int) (totalDocs / numFiles);
+    for (int i = 0; i < numFiles; i++) {
+      try (FileWriter writer = new FileWriter(outDir + "/output.csv")) {

Review comment:
       The `outDir` is set as a CLI argument.



