Posted to user@cassandra.apache.org by Ignacio Martin <na...@gmail.com> on 2014/05/05 17:27:49 UTC

Avoiding email duplicates when registering users

Hi,

I know this is a pretty common topic, but I haven't found any solution that
really satisfies me. The problem is well known: you have a table of user
information with a UUID as the primary key, but you want to avoid email
duplicates when new users register.

The closest solution I've found is this:

https://www.mail-archive.com/user@cassandra.apache.org/msg19766.html

It achieves the goal of avoiding confirmed user accounts with the same
email, but I think it does not prevent malicious users from filling your
user table with pending registrations (if they never confirm any). I know
that this is a minor thing, but anyway.

I have thought of a solution and it would be great if you could confirm that
it is a good one. Assume you have a table with the user information with a
UUID as primary key, and an "index" table email_to_UUID with email as
primary key.

When a user registers, the server generates a UUID and performs an INSERT
... IF NOT EXISTS into the email_to_UUID table. Immediately after, it performs
a SELECT from the same table and checks whether the UUID read back is the same
as the one we just generated. If it is, we are allowed to INSERT the data into
the user table, knowing that no one else will be doing it.

Does this sound right? Is this the right way to get something like UNIQUE
columns?

Thanks in advance

Re: Avoiding email duplicates when registering users

Posted by Nikolay Mihaylov <nm...@nmmm.nu>.
The real question is: if you want the email to be unique, why use a
"surrogate" primary key such as a UUID?

I wonder what the UUID gives you at all.

If you want a non-email primary key, why not use md5(email)?
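
For illustration, a minimal Python sketch of deriving such a key. The
function name email_key is hypothetical, and trimming and lower-casing the
address before hashing is an assumption added here (not something stated in
this thread) so that trivially different spellings map to the same key.

```python
import hashlib


def email_key(email):
    """Return a hex MD5 digest of a normalized email address.

    Normalization (strip + lower-case) is an assumption of this sketch;
    adjust it to whatever rule your application treats as "same email".
    """
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


key_a = email_key("Alice@Example.com ")
key_b = email_key("alice@example.com")
print(key_a == key_b)  # True
print(len(key_a))      # 32
```

Because the key is derived from the email, two registrations with the same
address target the same row, so uniqueness comes from the primary key
itself.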




On Wed, May 7, 2014 at 2:19 AM, Tyler Hobbs <ty...@datastax.com> wrote:

>
> On Mon, May 5, 2014 at 10:27 AM, Ignacio Martin <na...@gmail.com> wrote:
>
>>
>> When a user registers, the server generates a UUID and performs an INSERT
>> ... IF NOT EXISTS into the email_to_UUID table. Immediately after, perform
>> a SELECT from the same table and see if the read UUID is the same that the
>> one we just generated. If it is, we are allowed to INSERT the data in the
>> user table, knowing that no other will be doing it.
>>
>
> INSERT ... IF NOT EXISTS is the correct thing to do here, but you don't
> need to SELECT afterwards.  If the row does exist, the query results will
> show that the insert was not applied and the existing row will be returned.
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>

Re: Avoiding email duplicates when registering users

Posted by Charlie Mason <ch...@gmail.com>.
If you are worried about the overhead of malicious bulk registration, you
could add some rate limiting to restrict sign-ups to X per hour from the
same IP. You could also use a CAPTCHA system to make large numbers of
requests hard to create.
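
As a rough sketch of the rate-limiting idea, here is an in-memory sliding
window limiter in Python. The class name SignupRateLimiter and its
parameters are hypothetical. It keeps state in a single process only, so a
real deployment with several servers would need shared storage instead.

```python
import time
from collections import defaultdict, deque


class SignupRateLimiter:
    """Allow at most max_signups per window_seconds for each IP address."""

    def __init__(self, max_signups, window_seconds=3600.0, clock=time.monotonic):
        self.max_signups = max_signups
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the example can fake time
        self._attempts = defaultdict(deque)

    def allow(self, ip):
        now = self.clock()
        attempts = self._attempts[ip]
        # Drop attempts that have fallen out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_signups:
            return False
        attempts.append(now)
        return True


# Usage with a fake clock: 2 sign-ups per hour per IP.
fake_now = [0.0]
limiter = SignupRateLimiter(2, 3600.0, clock=lambda: fake_now[0])
results = [limiter.allow("203.0.113.7") for _ in range(3)]
print(results)                       # [True, True, False]
fake_now[0] = 3600.0                 # one hour later
print(limiter.allow("203.0.113.7"))  # True
```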

The other thing that works well is automating the cleanup. We do an
initial insert into the Users and UsersByEmail tables with a TTL of 12
hours. Only when the user completes the signup do we redo the insert without
a TTL, making the data permanent.
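
A small Python sketch of the two statements involved. The helper insert_cql
and the column names are hypothetical; only the table name and the 12 hour
TTL come from the description above. It just builds CQL strings with %s
placeholders, which would then be passed to whatever driver you use.

```python
PENDING_TTL_SECONDS = 12 * 60 * 60  # 12 hours, as described above


def insert_cql(table, columns, ttl_seconds=None):
    """Build an INSERT statement, optionally with USING TTL."""
    cols = ", ".join(columns)
    marks = ", ".join(["%s"] * len(columns))
    cql = f"INSERT INTO {table} ({cols}) VALUES ({marks})"
    if ttl_seconds is not None:
        cql += f" USING TTL {int(ttl_seconds)}"
    return cql


columns = ["user_id", "email", "name"]  # assumed columns

# Step 1: pending registration, expires on its own if never confirmed.
pending = insert_cql("users", columns, PENDING_TTL_SECONDS)
# Step 2: on confirmation, write the same columns again without a TTL.
confirmed = insert_cql("users", columns)

print(pending)
print(confirmed)
```

Since a TTL applies to the cells written by a statement, the confirming
insert should rewrite every column that was first written with the TTL;
any column left out would still expire.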

That works as a security measure and cleans up failed account creations at
the next compaction. We also have some counters so we can keep track of the
ratio of failed to successful registrations.

Hope that helps,

Charlie M



Re: Avoiding email duplicates when registering users

Posted by Tyler Hobbs <ty...@datastax.com>.
On Mon, May 5, 2014 at 10:27 AM, Ignacio Martin <na...@gmail.com> wrote:

>
> When a user registers, the server generates a UUID and performs an INSERT
> ... IF NOT EXISTS into the email_to_UUID table. Immediately after, perform
> a SELECT from the same table and see if the read UUID is the same that the
> one we just generated. If it is, we are allowed to INSERT the data in the
> user table, knowing that no other will be doing it.
>

INSERT ... IF NOT EXISTS is the correct thing to do here, but you don't
need to SELECT afterwards.  If the row does exist, the query results will
show that the insert was not applied and the existing row will be returned.
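
A minimal Python sketch of checking that result from client code. It
assumes a table email_to_uuid (email text PRIMARY KEY, user_id uuid), where
the column names are made up for this example, and a session whose
execute(...).one() returns a row with an "applied" flag plus the existing
columns when the insert is rejected. That mirrors how the DataStax Python
driver is understood to expose the [applied] column, but treat it as an
assumption. FakeSession is a hypothetical stand-in so the example runs
without a cluster.

```python
import uuid
from collections import namedtuple

# Assumed schema:
#   CREATE TABLE email_to_uuid (email text PRIMARY KEY, user_id uuid);
INSERT_EMAIL = (
    "INSERT INTO email_to_uuid (email, user_id) "
    "VALUES (%s, %s) IF NOT EXISTS"
)


def claim_email(session, email):
    """Try to reserve `email`. Returns (claimed, user_id).

    If the insert was applied, user_id is the freshly generated UUID.
    Otherwise it is the UUID already stored for that email, taken from
    the row returned by the conditional insert (no extra SELECT).
    """
    new_id = uuid.uuid4()
    row = session.execute(INSERT_EMAIL, (email, new_id)).one()
    if row.applied:
        return True, new_id
    return False, row.user_id


# --- Stand-in for a real session; mimics only IF NOT EXISTS behaviour ---
Applied = namedtuple("Applied", ["applied"])
NotApplied = namedtuple("NotApplied", ["applied", "email", "user_id"])


class _Result:
    def __init__(self, row):
        self._row = row

    def one(self):
        return self._row


class FakeSession:
    def __init__(self):
        self.rows = {}

    def execute(self, query, params):
        email, user_id = params
        if email in self.rows:
            return _Result(NotApplied(False, email, self.rows[email]))
        self.rows[email] = user_id
        return _Result(Applied(True))


session = FakeSession()
first = claim_email(session, "alice@example.com")
second = claim_email(session, "alice@example.com")
print(first[0], second[0])    # True False
print(first[1] == second[1])  # True
```

Only the caller that sees claimed == True should go on to INSERT into the
user table; the other caller gets the UUID that already owns the email.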


-- 
Tyler Hobbs
DataStax <http://datastax.com/>