Posted to user@drill.apache.org by "Geercken, Uwe" <Uw...@swissport.com> on 2015/08/25 09:50:54 UTC

data generator

Hello everybody,

Here is another tool for generating CSV data in bulk. It is a Java-based tool named datagenerator. It generates data from word lists, regular expressions, or random values. It can also generate date/time columns that correspond to each other (that "make sense").

https://github.com/uwegeercken/datagenerator
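
To illustrate the idea of word-list values combined with date/time columns that "make sense" together, here is a minimal sketch. It is not datagenerator's actual API or configuration format; the class name CsvSketch, the city list, and the column names are assumptions made up for this example.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Random;

// Hypothetical sketch only; this is not how datagenerator is implemented.
class CsvSketch {
    // Stand-in for a word list that would normally be read from a file.
    private static final List<String> CITIES = List.of("Zurich", "Geneva", "Basel");
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");

    // Builds one CSV row: a word-list value, a random number, and two
    // timestamps where the arrival is always later than the departure.
    static String row(Random rnd) {
        String city = CITIES.get(rnd.nextInt(CITIES.size()));
        int passengers = 1 + rnd.nextInt(300);
        LocalDateTime departure = LocalDateTime.of(2015, 1, 1, 0, 0)
                .plusMinutes(rnd.nextInt(365 * 24 * 60));
        // Derive the second column from the first so the two stay consistent.
        LocalDateTime arrival = departure.plusMinutes(30 + rnd.nextInt(600));
        return String.join(",", city, Integer.toString(passengers),
                departure.format(FMT), arrival.format(FMT));
    }

    public static void main(String[] args) {
        Random rnd = new Random(42); // fixed seed for repeatable output
        System.out.println("city,passengers,departure,arrival");
        for (int i = 0; i < 5; i++) {
            System.out.println(row(rnd));
        }
    }
}
```

The point of the sketch is that the second timestamp is derived from the first rather than drawn independently, which is what keeps the two columns consistent.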


Hope this helps.

Uwe

Re: data generator

Posted by Ted Dunning <te...@gmail.com>.
Uwe,

Thanks for the pointer.

Some differences (based on a quick glance):

- GPL versus Apache license
- log-synth generates data using the JSON data model. For playing with
Drill this is really handy.
- log-synth has file lookup samplers like datagenerator does, but it also
has a wide variety of skewed sampling against these files
- log-synth has realistic sampling for dates, time sequences, VINs,
SSNs, ZIP codes, random walks, names, addresses, browser versions, and
languages. The first three also support lots of additional details. For
instance, the VIN decodes country of origin and, for some manufacturers,
also model number, engine size, and more.
- log-synth has the ability to sample stateful sequences
- log-synth can fill in templates for crazy sampling
- log-synth can use random samplers as the input for other samplers or as
the parameters of other samplers
- log-synth is easily extensible and has a very simple Java API in case you
want to use it from a program
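
To make "skewed sampling against these files" concrete, here is a minimal sketch of the general technique: values near the top of a list are drawn more often than values near the bottom, using Zipf-like weights. This is not log-synth's API; the class name SkewedSampler and its constructor parameters are assumptions for illustration only.

```java
import java.util.List;
import java.util.Random;

// Hypothetical sketch of skewed sampling; not log-synth's actual API.
class SkewedSampler {
    private final List<String> values;
    // cumulative[i] holds the sum of the weights for ranks 0..i,
    // where the item at rank r has weight 1 / (r + 1)^skew.
    private final double[] cumulative;
    private final Random rnd;

    SkewedSampler(List<String> values, double skew, Random rnd) {
        this.values = values;
        this.rnd = rnd;
        this.cumulative = new double[values.size()];
        double total = 0;
        for (int i = 0; i < cumulative.length; i++) {
            total += 1.0 / Math.pow(i + 1, skew);
            cumulative[i] = total;
        }
    }

    // Draws one value; earlier entries in the list are more likely.
    String sample() {
        double u = rnd.nextDouble() * cumulative[cumulative.length - 1];
        for (int i = 0; i < cumulative.length; i++) {
            if (u < cumulative[i]) {
                return values.get(i);
            }
        }
        return values.get(values.size() - 1); // guard against rounding
    }
}
```

With a skew of 0 every value is equally likely; larger skew values concentrate the draws on the first few entries of the list.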

That said, I would love to work together on this sort of problem if we can
resolve the license issues.
