Posted to dev@whimsical.apache.org by sebb <se...@gmail.com> on 2021/07/31 11:11:42 UTC

Need for family-first (was: Testing secretary workbench)

> On Jul 30, 2021, at 3:20 PM, sebb <se...@gmail.com> wrote:
>
> On Fri, 30 Jul 2021 at 23:15, Craig Russell <ap...@gmail.com> wrote:
>>
>>
>>
>>> On Jul 30, 2021, at 3:11 PM, sebb <se...@gmail.com> wrote:
>>>
>>> Also, I thought we had abandoned the idea of requiring the
>>> family-first checkbox.
>>
>> Nope. Just recently we received a document from a person who filled the form with their family name first and the document was filed with the assumption that the family name was last.
>>
>> The Family First checkbox will prevent such mistakes in future.
>
> Why is that a mistake?

Srsly? You don't think that the secretary should care to register
"Craig Russell" instead of "Russell Craig"? He is a completely
different person.

That is a separate issue.

File names have to be unique within a directory, so ensuring a
consistent order for the file name parts *may* help to detect some
duplicates. However, it is by no means infallible, especially since the
practice of changing the order of names is a relatively recent
introduction.

Also, people may provide additional or fewer given names with replacement ICLAs.
People change names.

In any case, there will be different people with the same name.
The current practice is to ask such people to provide an extra name --
I find that unnecessary and intrusive.

I think we should file under a different key, such as email (or just
a UUID?), and provide alternative means of checking for possible
duplicates. For example, are there any existing ICLAs with the same
names, regardless of order? This would be trivial to check from an
index file.
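
A minimal sketch of such an order-independent check, assuming a
hypothetical index that maps a filing key to the name as submitted
(the real index format may differ). Each name is reduced to a sorted
tuple of case-folded words, so "Craig Russell" and "Russell Craig"
produce the same key:

```python
from collections import defaultdict


def name_key(name):
    """Return an order-independent key: the sorted, case-folded words of the name."""
    return tuple(sorted(name.casefold().split()))


def find_possible_duplicates(index):
    """Group filing keys whose names contain the same words in any order.

    `index` is a hypothetical mapping of filing key -> name as submitted.
    """
    groups = defaultdict(list)
    for file_key, name in index.items():
        groups[name_key(name)].append(file_key)
    # Only groups with more than one entry are possible duplicates.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Illustrative data only, not real ICLA records.
index = {
    "id-001": "Craig Russell",
    "id-002": "Russell Craig",
    "id-003": "Jane Doe",
}
possible_duplicates = find_possible_duplicates(index)
```

This only flags candidates for review: different people can share a
name, so a match is not proof of a duplicate.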

Note that it's not just the file name that has to be unique: we
currently use folders where there is a replacement ICLA. The folder
name stem has to be different from the file name stem. So there
already has to be processing to check for possible duplicate stems.
That could be extended as necessary.
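
As a rough illustration of a stem check, assuming entries are plain
names such as "jane-doe.pdf" for a file and "jane-doe" for a folder
(the entry names and layout here are made up for the example):

```python
from collections import Counter
from pathlib import PurePosixPath


def duplicate_stems(entries):
    """Return the stems that occur more than once among the directory entries.

    A folder such as "jane-doe" and a file such as "jane-doe.pdf" share
    the stem "jane-doe".
    """
    counts = Counter(PurePosixPath(entry).stem for entry in entries)
    return sorted(stem for stem, n in counts.items() if n > 1)


# Illustrative entry names only; the real directory layout may differ.
entries = ["jane-doe.pdf", "jane-doe", "john-smith.pdf"]
clashes = duplicate_stems(entries)
```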

Sebb

Re: Need for family-first (was: Testing secretary workbench)

Posted by sebb <se...@gmail.com>.

I have made a start on code to check for duplicates, see:

http://whimsy.apache.org/secretary/icla-dupes
(needs Secretary karma to run on the private data, but the code is here [1])

The code that extracts the duplicates can be adapted for use when a
new ICLA is received. Rather than showing all potential duplicates, it
would show only those related to the current ICLA, for review.
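
One way this per-ICLA lookup might look, as a sketch only. This is not
the logic in icla-dupes.cgi; the function names and the mapping of
filing key to name are assumptions made for illustration. An existing
entry is reported when one name's words are a subset of the other's,
which ignores word order and tolerates extra or missing given names:

```python
def name_words(name):
    """Return the set of case-folded words in a name, ignoring their order."""
    return frozenset(name.casefold().split())


def potential_matches(new_name, existing):
    """Return the existing entries that might be the same person as `new_name`.

    `existing` is a hypothetical mapping of filing key -> name on file.
    An entry matches when one name's words are a subset of the other's,
    which also covers replacement ICLAs with extra or fewer given names.
    """
    new_words = name_words(new_name)
    if not new_words:
        # An empty name would otherwise match every entry.
        return []
    matches = []
    for file_key, name in existing.items():
        words = name_words(name)
        if words and (new_words <= words or words <= new_words):
            matches.append((file_key, name))
    return matches


# Illustrative data only, not real ICLA records.
existing = {
    "id-001": "Jane Doe",
    "id-002": "John Smith",
    "id-003": "Jane Q Doe",
}
matches = potential_matches("Doe Jane", existing)
```

The subset rule is deliberately loose, so it will also report
unrelated people who share names; the result is a list for a human to
review, not a decision.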

I'm working on some tweaks to the code, and it has shown some more duplicates.

Note that the code does not rely on the ordering of names, so the
family-first flag is not needed.

Sebb

[1] https://github.com/apache/whimsy/blob/master/www/secretary/icla-dupes.cgi