Posted to user@predictionio.apache.org by Jose Rivera-Rubio <jo...@internavenue.com> on 2016/08/09 02:09:33 UTC

Duplicated events in HBase

Hi all,

*Problem*:

I'll be generating dumps of my event data using pio export and then running
pio import using these dumps without doing pio app data-delete.

*Question*:
Does pio import run any duplicate checks, or will the data be imported as
is, generating duplicated eventIds?

Many thanks!

Re: Duplicated events in HBase

Posted by Tom Chan <yu...@gmail.com>.
Looking at the source code on the develop branch,

https://github.com/apache/incubator-predictionio/blob/develop/data/src/main/scala/org/apache/predictionio/data/storage/hbase/HBEventsUtil.scala#L270

when events are exported the eventId is there, so at import time that
eventId will be used as rowKey:

https://github.com/apache/incubator-predictionio/blob/develop/data/src/main/scala/org/apache/predictionio/data/storage/hbase/HBEventsUtil.scala#L147-L150

So the import should replace the existing row, because both rows have the
same row key. The same cannot be said if you manually create a replacement
event with the same name, entityId, etc.
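
As a rough illustration of the overwrite behaviour described above, here is a
toy in-memory model in Python. It is not HBase or PredictionIO code: the class
ToyEventTable and its insert method are hypothetical, and the model assumes
that an event carrying an eventId is stored under that id, while an event
without one gets a freshly generated id.

```python
import uuid


class ToyEventTable:
    """Hypothetical in-memory stand-in for a table keyed by row key."""

    def __init__(self):
        self.rows = {}

    def insert(self, event):
        # Assumption: reuse the eventId when the event carries one (as in an
        # exported dump); otherwise generate a new id for the event.
        event_id = event.get("eventId") or uuid.uuid4().hex
        # Writing to an existing key overwrites the row instead of adding one.
        self.rows[event_id] = {**event, "eventId": event_id}
        return event_id


table = ToyEventTable()
exported = {
    "eventId": "abc123",
    "event": "view",
    "entityType": "user",
    "entityId": "u1",
}

# Importing the same exported event twice: same key, so still one row.
table.insert(exported)
table.insert(exported)
rows_after_reimport = len(table.rows)

# A manually re-created event with no eventId gets a new key, so it is
# stored as a second row even though its other fields are identical.
manual = {"event": "view", "entityType": "user", "entityId": "u1"}
table.insert(manual)
rows_after_manual = len(table.rows)

print(rows_after_reimport, rows_after_manual)
```

In this model the re-import leaves the row count unchanged, while the manually
re-created event adds a row, which is the distinction made above.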

To be sure, you can export your events to a file, import it into a test
appId/channelId twice, and see whether there are any duplicated events (for
example, you can check using the event server).
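
One possible way to do that check offline is to export the test app again
after the second import and count repeated eventIds in the dump. The sketch
below assumes the export is JSON with one event per line and an "eventId"
field on each event; the function name find_duplicate_event_ids and the sample
lines are made up for illustration.

```python
import json
from collections import Counter


def find_duplicate_event_ids(lines):
    """Return {eventId: count} for every eventId seen more than once.

    Assumes each non-empty line is one JSON-encoded event with an
    "eventId" field (an assumption about the dump format).
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        counts[json.loads(line)["eventId"]] += 1
    return {event_id: n for event_id, n in counts.items() if n > 1}


# Made-up sample standing in for the lines of an exported dump.
sample_lines = [
    '{"eventId": "abc123", "event": "view", "entityType": "user", "entityId": "u1"}',
    '{"eventId": "def456", "event": "buy", "entityType": "user", "entityId": "u2"}',
    '{"eventId": "abc123", "event": "view", "entityType": "user", "entityId": "u1"}',
]

duplicates = find_duplicate_event_ids(sample_lines)
print(duplicates)
```

Against a real dump, the same function can be fed the lines of the exported
files. An empty result would mean no eventId appears more than once.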

Tom


