Posted to user@predictionio.apache.org by Jose Rivera-Rubio <jo...@internavenue.com> on 2016/08/09 02:09:33 UTC
Duplicated events in HBase
Hi all,
*Problem*:
I'll be generating dumps of my event data with pio export and then running
pio import on those dumps without first running pio app data-delete.
*Question*:
Does pio import run any duplicate checks, or will the data be imported as
is, producing duplicated eventIds?
Many thanks!
Re: Duplicated events in HBase
Posted by Tom Chan <yu...@gmail.com>.
Looking at the source code on the develop branch,
https://github.com/apache/incubator-predictionio/blob/develop/data/src/main/scala/org/apache/predictionio/data/storage/hbase/HBEventsUtil.scala#L270
the eventId is included when events are exported, so at import time that
eventId is used as the row key:
https://github.com/apache/incubator-predictionio/blob/develop/data/src/main/scala/org/apache/predictionio/data/storage/hbase/HBEventsUtil.scala#L147-L150
So an imported event should replace the existing row, because both have the
same row key. The same cannot be said if you manually create a replacement
event with the same name, entityId, etc., because it would not share the
original event's row key.
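
To make the row-key argument concrete, here is a toy model in Python. It is
not PredictionIO code: ToyEventStore, put and import_events are hypothetical
names, and a plain dict stands in for the HBase table. It also assumes that
an event without an eventId gets a freshly generated key.

```python
import uuid


class ToyEventStore:
    """Toy stand-in for an HBase table: one row per row key, writes are upserts."""

    def __init__(self):
        self.rows = {}

    def put(self, event):
        # Reuse the event's own eventId as the row key when it has one;
        # otherwise generate a fresh key (assumption, see the text above).
        row_key = event.get("eventId") or uuid.uuid4().hex
        self.rows[row_key] = {**event, "eventId": row_key}
        return row_key


def import_events(store, events):
    """Write every event to the store, as an import would."""
    for event in events:
        store.put(event)


# Two exported events that already carry their eventId.
exported = [
    {"eventId": "a1", "event": "view", "entityId": "u1"},
    {"eventId": "b2", "event": "buy", "entityId": "u2"},
]

store = ToyEventStore()
import_events(store, exported)
import_events(store, exported)  # second import overwrites the same two rows
count_after_reimport = len(store.rows)

# A hand-made copy without an eventId lands in a new row.
store.put({"event": "view", "entityId": "u1"})
count_after_manual_copy = len(store.rows)
```

In this model the second import leaves the row count unchanged, while the
hand-made copy adds a row.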
To be sure, you can export your events to a file, import that file into a
test appId/channelId twice, and check whether there are any duplicated
events (for example, by querying the event server).
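
One way to run that check offline is sketched below in Python. It assumes
the test app has been exported again as JSON with one event per line, each
carrying an "eventId" field; duplicated_event_ids and the sample lines are
made up for illustration.

```python
import json
from collections import Counter


def duplicated_event_ids(lines):
    """Return {eventId: count} for every eventId seen more than once."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        counts[json.loads(line)["eventId"]] += 1
    return {event_id: n for event_id, n in counts.items() if n > 1}


# Made-up sample standing in for the lines of an export file.
sample_lines = [
    '{"eventId": "a1", "event": "view", "entityId": "u1"}',
    '{"eventId": "b2", "event": "buy", "entityId": "u2"}',
    '{"eventId": "a1", "event": "view", "entityId": "u1"}',
]

duplicates = duplicated_event_ids(sample_lines)
```

For a real export, pass an open file object instead of sample_lines. An
empty result means no eventId appears twice.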
Tom