Posted to dev@streampipes.apache.org by Grainier Perera <gr...@gmail.com> on 2020/05/10 16:20:16 UTC

DataSink for Redis

Hi all,

I'm planning to implement a data sink that forwards events to Redis and
stores them there [1][2], but I'd like to get your feedback and opinions
before proceeding.

My question is this: since Redis is essentially a key-value store and we
have structured events to persist, what should the key and the value be?
The possible approaches are as follows [3]:

1. Store the entire object as a JSON-encoded string in a single key.

* SET event:{id} '{"sensorId":"001", "temp":28}'*


   - Pro: faster when accessing all the fields of the event at once.
   - Pro: works with nested objects (but I don't think we have any nested
   objects).
   - Pro: can set the TTL.
   - Con: slower when accessing a single field or a subset of the event's
   fields.
   - Con: JSON parsing is required to retrieve fields, although it's quite
   fast.


2. Store each Object's properties in a Redis hash.

* HMSET event:{id} sensorId "001"*

* HMSET event:{id} temp "28"*


   - Pro: can set the TTL.
   - Pro: no need to parse JSON strings.
   - Pro: faster when accessing a single field or a subset of the event's
   fields.
   - Con: slower when accessing all the fields of the event.


3. Store each Object as a JSON string in a Redis hash.

* HMSET events {id1} '{"sensorId":"001", "temp":28}'*

* HMSET events {id2} '{"sensorId":"002", "temp":32}'*


   - Pro: fewer keys to work with.
   - Con: can't set a TTL per event, because EXPIRE applies to the whole
   hash key.
   - Con: JSON parsing is required to retrieve fields.
   - Con: slower when accessing a single field or a subset of the event's
   fields.


4. Store each property of each Object in a dedicated key.

* SET event:{id}:sensorId "001"*

* SET event:{id}:temp 28*


   - Pro: can set the TTL per field (but it's not necessary for our
   scenario).
   - Pro: no need to parse JSON strings.
   - Pro: faster when accessing a single field or a subset of the event's
   fields.
   - Con: slower when accessing all the fields of the event.


5. Use the RedisJSON module [4][5] and store each event as a JSON document.

* JSON.SET event:{id} . '{"sensorId":"001", "temp":28}'*


   - Pro: faster manipulation of JSON documents.
   - Pro: faster when accessing single/multiple fields of the event.
   - Pro: can set the TTL.
   - Con: requires RedisJSON module.


IMO, options 1 and 2 are the best choices, since both allow a TTL to be set
for purging. Which one do you think is best? Your feedback is highly
appreciated.
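
To make the difference between options 1 and 2 concrete, here is a minimal
Python sketch. It only builds the Redis commands as token lists and does not
talk to a server. The function names, the ttl_seconds parameter and the use
of a numeric id are assumptions made for illustration, not part of the
planned sink.

```python
import json


def option1_commands(event_id, event, ttl_seconds=None):
    """Option 1: the whole event as one JSON string under a single key."""
    key = f"event:{event_id}"
    cmd = ["SET", key, json.dumps(event, sort_keys=True)]
    if ttl_seconds is not None:
        # SET accepts an EX argument, so value and TTL fit in one command.
        cmd += ["EX", str(ttl_seconds)]
    return [cmd]


def option2_commands(event_id, event, ttl_seconds=None):
    """Option 2: one hash per event, one hash field per event property."""
    key = f"event:{event_id}"
    cmd = ["HMSET", key]
    for field, value in event.items():
        # Hash values are flat strings, so nested objects would not fit here.
        cmd += [field, str(value)]
    cmds = [cmd]
    if ttl_seconds is not None:
        # The hash needs a separate EXPIRE on its key.
        cmds.append(["EXPIRE", key, str(ttl_seconds)])
    return cmds


if __name__ == "__main__":
    sample = {"sensorId": "001", "temp": 28}
    for command in option1_commands(7, sample, ttl_seconds=60):
        print(" ".join(command))
    for command in option2_commands(7, sample, ttl_seconds=60):
        print(" ".join(command))
```

Note that option 2 turns the numeric temp into the string "28", so a reader
of the hash has to know the field types, whereas option 1 keeps them in the
JSON.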

[1] https://redis.io/
[2] https://issues.apache.org/jira/browse/STREAMPIPES-121
[3] https://stackoverflow.com/questions/16375188/redis-strings-vs-redis-hashes-to-represent-json-efficiency
[4] https://redislabs.com/redis-enterprise/redis-json/
[5] https://oss.redislabs.com/redisjson/

Regards,
Grainier.

Re: DataSink for Redis

Posted by Grainier Perera <gr...@gmail.com>.

Hi Philipp,

I've created an issue [1] and added a docker-compose file for Redis in
PR[2]. Please review and merge.

[1] https://issues.apache.org/jira/browse/STREAMPIPES-124
[2] https://github.com/apache/incubator-streampipes-installer/pull/6

Thanks,
Grainier.

Re: DataSink for Redis

Posted by Philipp Zehnder <ze...@apache.org>.

Hi Grainier,

your PR looks very good.
Do you have a docker-compose file for Redis?
I would like to add it to our CLI [1] in the service directory. 

This makes it easy for StreamPipes users to set up an instance and use your new sink.
A user just has to add ‘redis’ to the system file, and the container is then started with the rest of the system.
We already provide docker-compose files for other DBs.

Philipp

[1] https://github.com/apache/incubator-streampipes-installer/tree/dev/cli

Re: DataSink for Redis

Posted by Grainier Perera <gr...@gmail.com>.

Hi Philipp,

I agree with your point about the key field, so I've modified the sink to
offer a choice between an auto-incremented key and an existing event field
as the key [1]. There is now a radio button to set auto-increment to True
or False. If it's True, the key field is ignored and a sequential numeric
key is used. Otherwise, the selected field is used as the key.
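
The key selection described above can be sketched roughly as follows. This
is an illustration in Python, not the code from the PR: the KeyResolver
class, its parameter names and the counter starting at 1 are all
assumptions.

```python
import itertools


class KeyResolver:
    """Hypothetical helper that picks the Redis key for each event."""

    def __init__(self, auto_increment, key_field=None):
        self.auto_increment = auto_increment
        self.key_field = key_field
        # Assumed starting value; the thread does not say where counting begins.
        self._counter = itertools.count(1)

    def key_for(self, event):
        if self.auto_increment:
            # The configured key field is ignored in this mode.
            return str(next(self._counter))
        # Otherwise the value of the selected event field becomes the key.
        return str(event[self.key_field])


if __name__ == "__main__":
    event = {"sensorId": "001", "temp": 28}
    print(KeyResolver(True, "sensorId").key_for(event))
    print(KeyResolver(False, "sensorId").key_for(event))
```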

When it comes to use cases, a user can:

   1. Store the last event per asset (asset id as the key field,
   auto-increment disabled, index -1).
   2. Collect all the events per asset for diagnostics, replaying, etc.
   (auto-increment enabled, a different index per asset). An index is like
   a separate DB with a distinct keyspace, independent from the others [2].
   3. Collect recent events with data purging (similar to 1 and 2, but
   with an expiration time).
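
The difference between use cases 1 and 2 can be illustrated with a plain
Python dict standing in for one Redis index. The store_events function and
the sample events are hypothetical, and expiration (use case 3) is not
modelled here.

```python
def store_events(events, key_field=None):
    """Return the key -> event mapping that would be left in the store.

    key_field=None stands for auto-increment. Otherwise the named event
    field is used as the key, so a later event with the same value
    overwrites the earlier one.
    """
    store = {}
    for counter, event in enumerate(events, start=1):
        key = str(counter) if key_field is None else str(event[key_field])
        store[key] = event
    return store


if __name__ == "__main__":
    events = [
        {"assetId": "m1", "temp": 20},
        {"assetId": "m2", "temp": 30},
        {"assetId": "m1", "temp": 25},
    ]
    # Use case 1: only the last event per asset survives.
    print(store_events(events, key_field="assetId"))
    # Use case 2: every event is kept under its own sequential key.
    print(store_events(events))
```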

So this new approach allows all of the above scenarios. What do you think?

[1] https://github.com/apache/incubator-streampipes-extensions/pull/13
[2] https://www.mikeperham.com/2015/09/24/storing-data-with-redis/

Regards,
Grainier.

Re: DataSink for Redis

Posted by Philipp Zehnder <ze...@apache.org>.

Hi Grainier,

the sink looks very cool and I merged your PR.

I have a question regarding the key field. 

Currently users can either select ‘-’ or a ‘runtimeName’ as a requiredTextParameter.
When ‘-’ is selected, a unique counter is used for the key, right?
The problem is that when a user selects a ‘runtimeName’, we cannot provide any input validation.
If the primaryKey is not within the event, the user only sees an error when the pipeline is started and has to go back and edit the pipeline.

Alternatively, we could use a mapping property for the key field; the user would then see a drop-down menu of all event properties and could select one.
This way we can ensure that the key is within the event, but then we no longer have the option to select ‘-’.
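
As a rough illustration of the check that is missing with a free-text key field, the Python sketch below validates the key against the event's property names at configuration time instead of at pipeline start. The validate_key_field function and the sample schema are hypothetical and do not use the StreamPipes SDK.

```python
def validate_key_field(schema_fields, key_field):
    """Reject a key field that is not one of the event's properties."""
    if key_field not in schema_fields:
        raise ValueError(
            f"key field {key_field!r} is not in the event schema: "
            f"{sorted(schema_fields)}"
        )
    return key_field


if __name__ == "__main__":
    schema = ["sensorId", "temp"]
    print(validate_key_field(schema, "sensorId"))
    try:
        validate_key_field(schema, "machineId")
    except ValueError as error:
        print(error)
```

A drop-down built from the same list of property names makes this check unnecessary, because an invalid name can no longer be entered.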

What do you think is a common use case for the Redis sink?
Could one use case for Redis be to store the last event per asset (e.g. sensor or machine)?
If so, we could use the mapping property solution and further extend it with a dimension property requirement.
Users could then select a property representing an identifier (e.g. a machine id), and an entry would be created in Redis for each machine.


What do you think?

Philipp



Re: DataSink for Redis

Posted by Grainier Perera <gr...@gmail.com>.

Hi all,

I've sent PR [1] with the initial implementation. Please review and merge.

[1] https://github.com/apache/incubator-streampipes-extensions/pull/12

Thanks,
Grainier.

On Mon, 11 May 2020 at 01:20, Dominik Riemer <ri...@apache.org> wrote:

> Hi Grainier,
>
> very cool! A Redis sink would be awesome.
> Since I haven't worked a lot with Redis in the past, I don't have a strong
> opinion, just some thoughts:
> I guess the answer depends on how users will use the events stored in
> Redis: whether they need to access single fields or the whole event. I'd
> guess that most users will access whole events, which would lead to
> option 1.
> Maybe we could start with option 1 and later add a setting in the pipeline
> element configuration that lets users switch between options 1 and 2?
>
> I'll be happy to help you with the SDK in case you have any questions - I
> know that our documentation has some potential for improvement, so feel
> free to ask 😉
>
> Dominik

RE: DataSink for Redis

Posted by Dominik Riemer <ri...@apache.org>.

Hi Grainier,

very cool! A Redis sink would be awesome.
Since I haven't worked a lot with Redis in the past, I don't have a strong opinion, just some thoughts:
I guess the answer depends on how users will use the events stored in Redis: whether they need to access single fields or the whole event. I'd guess that most users will access whole events, which would lead to option 1.
Maybe we could start with option 1 and later add a setting in the pipeline element configuration that lets users switch between options 1 and 2?
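
A rough sketch of what such a switch could look like, assuming "both options" means options 1 and 2 from the original message. It is written in Python and only builds the Redis commands as argument lists, so it runs without a server. `StorageMode`, `build_commands`, and the `event:{id}` key layout are hypothetical names for illustration, not part of the StreamPipes SDK.

```python
import json
from enum import Enum


class StorageMode(Enum):
    """Hypothetical configuration values for the sink."""
    JSON_STRING = "json-string"  # option 1: whole event in one string key
    HASH = "hash"                # option 2: one hash field per property


def build_commands(mode, event_id, event):
    """Return the Redis commands (as argument lists) for the chosen mode."""
    key = f"event:{event_id}"
    if mode is StorageMode.JSON_STRING:
        return [["SET", key, json.dumps(event)]]
    if mode is StorageMode.HASH:
        command = ["HSET", key]
        for field, value in event.items():
            # Hash values are plain strings, so every value is stringified.
            command += [field, str(value)]
        return [command]
    raise ValueError(f"unsupported storage mode: {mode}")


event = {"sensorId": "001", "temp": 28}
print(build_commands(StorageMode.JSON_STRING, 1, event))
print(build_commands(StorageMode.HASH, 1, event))
```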

I'll be happy to help you with the SDK in case you have any questions - I know that our documentation has some potential for improvement, so feel free to ask 😉

Dominik


-----Original Message-----
From: Grainier Perera <gr...@gmail.com> 
Sent: Sunday, May 10, 2020 6:20 PM
To: dev@streampipes.apache.org
Subject: DataSink for Redis

Hi all,

I'm planning to implement a data sink that forwards and stores events in Redis[1][2]. But I'd like to get some feedback and opinions from you before proceeding.

The question I have is: since Redis is merely a key-value store, and we have a structured event to be persisted, what should the key and the value be?
The possible approaches are as follows[3]:

1. Store the entire object as a JSON-encoded string in a single key.

* SET event:{id} '{"sensorId":"001", "temp":28}'*


   - Pro: faster when accessing all the fields of the event at once.
   - Pro: works with nested objects (but I don't think we have any nested
   objects).
   - Pro: can set the TTL.
   - Con: slower when accessing a single or subset of fields of the event.
   - Con: JSON parsing is required to retrieve fields. However, it's quite
   fast.
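
A minimal sketch of option 1 in Python. It builds the commands as argument lists instead of sending them, so it runs without a Redis server. The function names, the `event:{id}` key layout, and the one-hour TTL are assumptions made for illustration only.

```python
import json


def set_event_command(event_id, event, ttl_seconds=None):
    """Build a SET command that stores the whole event as one JSON string."""
    # Compact separators keep the stored value small.
    payload = json.dumps(event, separators=(",", ":"))
    command = ["SET", f"event:{event_id}", payload]
    if ttl_seconds is not None:
        # SET accepts an EX option, so value and expiry are written together.
        command += ["EX", str(ttl_seconds)]
    return command


def read_field(stored_value, field):
    """Reading a single field still means parsing the whole document."""
    return json.loads(stored_value)[field]


command = set_event_command(42, {"sensorId": "001", "temp": 28}, ttl_seconds=3600)
print(command)
print(read_field(command[2], "temp"))
```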


2. Store each Object's properties in a Redis hash.

* HMSET event:{id} sensorId "001"*

* HMSET event:{id} temp "28"*


   - Pro: can set the TTL.
   - Pro: no need to parse JSON strings.
   - Pro: faster when accessing a single or subset of fields of the event.
   - Con: slower when accessing all the fields of the event.
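
Option 2 could be sketched as follows, again only building the commands in Python rather than executing them. The sketch uses one HSET with several field/value pairs, which Redis accepts since version 4.0, in place of one HMSET call per field. `hash_event_commands` and the key layout are illustrative assumptions. Note that every value is stored as a string, so the reader has to know that `temp` was a number.

```python
def hash_event_commands(event_id, event, ttl_seconds=None):
    """Build the commands that store each property as a field of one hash."""
    key = f"event:{event_id}"
    command = ["HSET", key]
    for field, value in event.items():
        # Hash values are flat strings; nested values would need their own
        # encoding, which this sketch does not attempt.
        command += [field, str(value)]
    commands = [command]
    if ttl_seconds is not None:
        # The TTL is set on the hash key, so it covers all fields at once.
        commands.append(["EXPIRE", key, str(ttl_seconds)])
    return commands


for command in hash_event_commands(42, {"sensorId": "001", "temp": 28}, 3600):
    print(command)
```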


3. Store each Object as a JSON string in a Redis hash.

* HMSET events {id1} '{"sensorId":"001", "temp":28}'*

* HMSET events {id2} '{"sensorId":"002", "temp":32}'*


   - Pro: fewer keys to work with.
   - Con: can't set a TTL per event (EXPIRE applies to the whole hash key).
   - Con: JSON parsing is required to retrieve fields.
   - Con: slower when accessing a single or subset of fields of the event.
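
To illustrate the TTL limitation of option 3, here is a toy in-memory model in Python. The `store` dict and the `hset`/`expire` helpers are stand-ins written for this sketch, not a Redis client. Because all events live under the single key `events`, expiring that key drops every event together.

```python
import json

# In-memory stand-in for Redis: key -> hash (field -> value).
store = {}


def hset(key, field, value):
    """Mimic HSET: set one field inside the hash stored at key."""
    store.setdefault(key, {})[field] = value


def expire(key):
    """Mimic an elapsed TTL: the whole key disappears, not a single field."""
    store.pop(key, None)


hset("events", "id1", json.dumps({"sensorId": "001", "temp": 28}))
hset("events", "id2", json.dumps({"sensorId": "002", "temp": 32}))
events_before = len(store["events"])

expire("events")
events_after = len(store.get("events", {}))
print(events_before, events_after)
```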


4. Store each property of each Object in a dedicated key.

* SET event:{id}:sensorId "001"*

* SET event:{id}:temp 28*


   - Pro: can set the TTL per field (but it's not necessary for our
   scenario).
   - Pro: no need to parse JSON strings.
   - Pro: faster when accessing a single or subset of fields of the event.
   - Con: slower when accessing all the fields of the event.
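
A small sketch of option 4 in Python, building commands only. It shows why reading a whole event is more work here: the reader has to know every field name to assemble the MGET. The helper names are hypothetical.

```python
def field_key(event_id, field):
    """One dedicated key per property, e.g. event:42:temp."""
    return f"event:{event_id}:{field}"


def set_commands(event_id, event):
    """One SET per property of the event."""
    return [["SET", field_key(event_id, field), str(value)]
            for field, value in event.items()]


def mget_command(event_id, fields):
    """Fetch the whole event in one round trip; field names must be known."""
    return ["MGET"] + [field_key(event_id, field) for field in fields]


print(set_commands(42, {"sensorId": "001", "temp": 28}))
print(mget_command(42, ["sensorId", "temp"]))
```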


5. Use the RedisJSON[4][5] module and store each event as a JSON document.

* JSON.SET event . '{"sensorId":"001", "temp":28}'*


   - Pro: faster manipulation of JSON documents.
   - Pro: faster when accessing single/multiple fields of the event.
   - Pro: can set the TTL.
   - Con: requires RedisJSON module.
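
A sketch of the commands option 5 would issue, built in Python without contacting a server. The per-event key `event:{id}` is an assumption (the example above uses the bare key `event`), and the paths follow the `.` root syntax used in that example. The helper names are hypothetical.

```python
import json


def json_set_command(event_id, event):
    """Store the whole event as a JSON document at the root path."""
    return ["JSON.SET", f"event:{event_id}", ".", json.dumps(event)]


def json_get_command(event_id, *paths):
    """Fetch one or more fields server-side, without client-side parsing
    of the full document. With no paths, the whole document is returned."""
    return ["JSON.GET", f"event:{event_id}", *paths]


print(json_set_command(42, {"sensorId": "001", "temp": 28}))
print(json_get_command(42, ".temp"))
```

A TTL would still be set with a separate EXPIRE on the same key.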


IMO, options 1 and 2 would be the best choices, given that both allow a TTL for purging. Which do you think is best? Your feedback is highly appreciated.

[1] https://redis.io/
[2] https://issues.apache.org/jira/browse/STREAMPIPES-121
[3] https://stackoverflow.com/questions/16375188/redis-strings-vs-redis-hashes-to-represent-json-efficiency
[4] https://redislabs.com/redis-enterprise/redis-json/
[5] https://oss.redislabs.com/redisjson/

Regards,
Grainier.