Posted to general@incubator.apache.org by Robert Burrell Donkin <ro...@gmail.com> on 2009/03/03 13:46:29 UTC

Re: An anti-plagiarism tool for GSoC Was: [pulse] A lab or an incubator?

On Tue, Mar 3, 2009 at 11:46 AM, Alexei Fedotov
<al...@gmail.com> wrote:
> Hello folks,
>
> I would like your opinion on the following matter. Recently I asked
> whether anyone knows of a free anti-plagiarism tool for scanning my
> project before incubation. Nobody replied that they knew of one.
>
> I am thinking of suggesting this as a GSoC task. One could use Google
> code search to detect suspicious comments and code constructs in new
> contributions. Code search supports regular expressions, which allow
> whitespace and variable name differences to be ignored during
> comparison. What do you think?

this is - i think - nearly in scope for RAT

opinions?

- robert

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org
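
A rough sketch of the regular-expression idea from the quoted proposal:
turn a code fragment into a pattern that ignores whitespace and
identifier names, then match it against candidate code. The helper name
fragment_to_regex and the keyword subset are hypothetical. The pattern
uses Python's re syntax; whether a particular code search engine accepts
the same syntax is an assumption that would need checking.

```python
import re

# Illustrative subset of language keywords that must match literally.
KEYWORDS = {"for", "if", "else", "while", "return", "int", "new"}

# Split source text into identifiers/keywords, numbers, and single symbols.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")
IDENT = re.compile(r"[A-Za-z_]\w*")


def fragment_to_regex(fragment):
    """Build a regex matching the fragment up to whitespace and renaming."""
    parts = []
    prev_wordlike = False
    for tok in TOKEN.findall(fragment):
        wordlike = tok[0].isalnum() or tok[0] == "_"
        if parts:
            # Two adjacent word-like tokens need at least one space between
            # them ("int x"); everywhere else whitespace is optional.
            parts.append(r"\s+" if (prev_wordlike and wordlike) else r"\s*")
        if IDENT.fullmatch(tok) and tok not in KEYWORDS:
            parts.append(r"[A-Za-z_]\w*")  # any identifier is accepted
        else:
            parts.append(re.escape(tok))   # keywords, numbers, symbols
        prev_wordlike = wordlike
    return "".join(parts)


original = "for (int i = 0; i < n; i++) total += data[i];"
renamed = "for(int k=0;k<len;k++)\n    sum += values[k];"
pattern = fragment_to_regex(original)
print(bool(re.search(pattern, renamed)))
```

This sketch does not require a renamed variable to be renamed
consistently, so it can over-match. A real tool would have to weigh that
against the number of queries it sends.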


Re: An anti-plagiarism tool for GSoC Was: [pulse] A lab or an incubator?

Posted by Alexei Fedotov <al...@gmail.com>.
Ross,
Thank you for your encouraging comment and for sharing your thoughts.
I have updated the proposal [1]: its first objective is now to define
the search algorithm, and I have added more code search engines.

[1] http://wiki.apache.org/general/SummerOfCode2009#rat-1-cutnpaste

On Thu, Mar 5, 2009 at 7:39 PM, Ross Gardler <rg...@apache.org> wrote:
> 2009/3/5 Alexei Fedotov <al...@gmail.com>:
>> Hello folks,
>> Could you please take a look at [1]? Does it look ok?
>>
>> With best regards, Alexei
>>
>> [1] http://wiki.apache.org/general/SummerOfCode2009#rat-1-cutnpaste
>
> The problem I see with this approach is the load it would put on
> Google code search. We can't search for just any old match. I'd
> suggest that the project needs an objective to define a search
> algorithm.
>
> For example, we may have a list of "common" variable and method names,
> then search through the code looking for uncommon ones, then use those
> as an initial search.
>
> Ross
>



-- 
With best regards,
Alexei Fedotov,
http://people.apache.org/~aaf/

Re: An anti-plagiarism tool for GSoC Was: [pulse] A lab or an incubator?

Posted by Ross Gardler <rg...@apache.org>.
2009/3/5 Alexei Fedotov <al...@gmail.com>:
> Hello folks,
> Could you please take a look at [1]? Does it look ok?
>
> With best regards, Alexei
>
> [1] http://wiki.apache.org/general/SummerOfCode2009#rat-1-cutnpaste

The problem I see with this approach is the load it would put on
Google code search. We can't search for just any old match. I'd
suggest that the project needs an objective to define a search
algorithm.

For example, we may have a list of "common" variable and method names,
then search through the code looking for uncommon ones, then use those
as an initial search.

Ross
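
Ross's suggestion above can be sketched as follows: collect the
identifiers in a contribution, drop the ones on a "common" list, and keep
a few of the rest as initial search terms. The stop list, the keyword
set, the minimum length, and the function name uncommon_identifiers are
all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop list of "common" names; a real list would be far longer.
COMMON_NAMES = {"i", "j", "k", "n", "x", "tmp", "result", "value",
                "index", "get", "set", "main", "args", "length"}

# Illustrative subset of language keywords.
KEYWORDS = {"public", "private", "static", "void", "int", "return",
            "class", "for", "while", "if", "else", "new", "import"}

IDENT = re.compile(r"[A-Za-z_]\w*")


def uncommon_identifiers(source, limit=3, min_length=6):
    """Return up to `limit` identifiers worth using as search terms."""
    counts = Counter(IDENT.findall(source))
    candidates = [
        name for name in counts
        if name not in COMMON_NAMES
        and name not in KEYWORDS
        and len(name) >= min_length
    ]
    # Longer names first, on the assumption that they are less likely to
    # occur by chance; alphabetical order breaks ties deterministically.
    candidates.sort(key=lambda name: (-len(name), name))
    return candidates[:limit]


source = ("public int computeFrobnicationIndex(int n) "
          "{ int tmp = n; return tmp * frobScale; }")
print(uncommon_identifiers(source))
```

Capping the number of terms per file is one way to limit the load on
the search engine that Ross is worried about.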

Re: An anti-plagiarism tool for GSoC Was: [pulse] A lab or an incubator?

Posted by Alexei Fedotov <al...@gmail.com>.
Hello folks,
Could you please take a look at [1]? Does it look ok?

With best regards, Alexei

[1] http://wiki.apache.org/general/SummerOfCode2009#rat-1-cutnpaste


On Tue, Mar 3, 2009 at 6:12 PM, Alexei Fedotov <al...@gmail.com> wrote:
> Robert,
> Thanks for the quick answer. I have read RAT's proposal, and I agree
> that it is the right place for this. Do I understand correctly that
> you and your project colleagues are open to starting this GSoC task?
> I'm open to any form of collaboration, e.g. co-mentorship.
>
> With best regards, Alexei
>
> P.S. Technically, it is fine to build the anti-plagiarism tool in
> Java: Google provides a Java API. Another point in common with RAT is
> the heuristic nature of the comparison. Could you please point me to
> where RAT's heuristics are kept in the code?
>
> On Tue, Mar 3, 2009 at 3:46 PM, Robert Burrell Donkin
> <ro...@gmail.com> wrote:
>> On Tue, Mar 3, 2009 at 11:46 AM, Alexei Fedotov
>> <al...@gmail.com> wrote:
>>> Hello folks,
>>>
>>> I would like your opinion on the following matter. Recently I asked
>>> whether anyone knows of a free anti-plagiarism tool for scanning my
>>> project before incubation. Nobody replied that they knew of one.
>>>
>>> I am thinking of suggesting this as a GSoC task. One could use Google
>>> code search to detect suspicious comments and code constructs in new
>>> contributions. Code search supports regular expressions, which allow
>>> whitespace and variable name differences to be ignored during
>>> comparison. What do you think?
>>
>> this is - i think - nearly in scope for RAT
>>
>> opinions?
>>
>> - robert
>>
>>
>>
>
>
>
> --
> With best regards,
> Alexei Fedotov,
> http://people.apache.org/~aaf/
>



-- 
With best regards,
Alexei Fedotov,
http://people.apache.org/~aaf/

Re: An anti-plagiarism tool for GSoC Was: [pulse] A lab or an incubator?

Posted by Alexei Fedotov <al...@gmail.com>.
Robert,
Thanks for the quick answer. I have read RAT's proposal, and I agree
that it is the right place for this. Do I understand correctly that
you and your project colleagues are open to starting this GSoC task?
I'm open to any form of collaboration, e.g. co-mentorship.

With best regards, Alexei

P.S. Technically, it is fine to build the anti-plagiarism tool in
Java: Google provides a Java API. Another point in common with RAT is
the heuristic nature of the comparison. Could you please point me to
where RAT's heuristics are kept in the code?

On Tue, Mar 3, 2009 at 3:46 PM, Robert Burrell Donkin
<ro...@gmail.com> wrote:
> On Tue, Mar 3, 2009 at 11:46 AM, Alexei Fedotov
> <al...@gmail.com> wrote:
>> Hello folks,
>>
>> I would like your opinion on the following matter. Recently I asked
>> whether anyone knows of a free anti-plagiarism tool for scanning my
>> project before incubation. Nobody replied that they knew of one.
>>
>> I am thinking of suggesting this as a GSoC task. One could use Google
>> code search to detect suspicious comments and code constructs in new
>> contributions. Code search supports regular expressions, which allow
>> whitespace and variable name differences to be ignored during
>> comparison. What do you think?
>
> this is - i think - nearly in scope for RAT
>
> opinions?
>
> - robert
>
>
>



-- 
With best regards,
Alexei Fedotov,
http://people.apache.org/~aaf/

Re: An anti-plagiarism tool for GSoC Was: [pulse] A lab or an incubator?

Posted by Ross Gardler <rg...@apache.org>.
2009/3/3 Robert Burrell Donkin <ro...@gmail.com>:
> On Tue, Mar 3, 2009 at 11:46 AM, Alexei Fedotov
> <al...@gmail.com> wrote:
>> Hello folks,
>>
>> I would like your opinion on the following matter. Recently I asked
>> whether anyone knows of a free anti-plagiarism tool for scanning my
>> project before incubation. Nobody replied that they knew of one.
>>
>> I am thinking of suggesting this as a GSoC task. One could use Google
>> code search to detect suspicious comments and code constructs in new
>> contributions. Code search supports regular expressions, which allow
>> whitespace and variable name differences to be ignored during
>> comparison. What do you think?
>
> this is - i think - nearly in scope for RAT
>
> opinions?

I think this is a great idea. There are some tools out there that look
for cut and paste code in order to help in refactoring. It would be
interesting to do an initial search using Google Code, Koders and
other such repositories. When a possible match is found, we could then
check out the full project and use an existing tool to do a more
thorough analysis. I'm not suggesting the latter should be part of a
GSoC proposal, just indicating that this could be further developed.

+1 for a proposal.

Ross
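
To illustrate the kind of "more thorough analysis" Ross describes for a
checked-out project, here is a minimal sketch. It is not one of the
existing cut-and-paste tools he refers to; it swaps in a simple Jaccard
similarity over token 5-grams (shingles). The function names and the
shingle size are hypothetical choices.

```python
import re

# Split source text into identifiers, numbers, and single symbols,
# discarding all whitespace.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def shingles(source, k=5):
    """Return the set of all runs of k consecutive tokens."""
    tokens = TOKEN.findall(source)
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def similarity(a, b, k=5):
    """Jaccard similarity of the two shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0  # too short to compare meaningfully
    return len(sa & sb) / len(sa | sb)


contributed = "int total = 0; for (int i = 0; i < n; i++) total += data[i];"
found = "int total=0;\nfor (int i=0; i<n; i++)\n    total += data[i];"
print(similarity(contributed, found))
```

Because the comparison runs locally on a checked-out copy, it adds no
load to the search engines used for the initial query.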


