Posted to dev@creadur.apache.org by Tharindu Mathew <mc...@gmail.com> on 2009/03/10 05:45:38 UTC

[gsoc] RAT 1 Cut&Paste Detector

Hi,

I'm a student interested in the Cut and Paste detector.

I was wondering about the scope of this project. Does it involve just
matching a few regular expressions to search the code? Or are we looking
at a sophisticated mechanism where even a neural network may be trained
to identify certain code patterns?


Regards,

Tharindu

Re: [gsoc] RAT 1 Cut&Paste Detector

Posted by Alexei Fedotov <al...@gmail.com>.
Hello Tharindu,
Thanks for your question.

Simple methods work well for real-life tasks, while neural networks
may not. For example, I used regular expressions to find multi-line
comments, and yes, most of them were copied. Also, since legal matters
are involved, the simpler our algorithm is, the easier it is to
understand the implications (and complications) of a scan.
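
As a rough illustration of the regular-expression approach, here is a
minimal sketch in Python. It is not RAT code: the function names, the
C/Java-style comment syntax, and the whitespace normalization are all
assumptions made for the example.

```python
import re
from collections import defaultdict

# Matches C/Java-style block comments; DOTALL lets "." span newlines.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def multiline_comments(source):
    """Return the block comments in `source` that span more than one line."""
    return [m.group(0) for m in BLOCK_COMMENT.finditer(source)
            if "\n" in m.group(0)]


def normalize(comment):
    """Strip surrounding '/' and '*' characters and whitespace from each
    line, so that layout differences do not hide a copy."""
    parts = [line.strip().lstrip("/*").rstrip("*/").strip()
             for line in comment.splitlines()]
    return " ".join(part for part in parts if part)


def copied_comments(files):
    """Hypothetical helper: `files` maps a file name to its source text.
    Returns each normalized comment seen in more than one file, together
    with the sorted names of the files that contain it."""
    seen = defaultdict(set)
    for name, text in files.items():
        for comment in multiline_comments(text):
            seen[normalize(comment)].add(name)
    return {c: sorted(names) for c, names in seen.items() if len(names) > 1}
```

Calling copied_comments() on a dict of file contents gives the comments
shared between files, which is a reasonable first list of candidates to
review by hand.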

From an architectural point of view, it would be nice to separate the
detection logic into its own module, so that a sliding window could
easily be replaced with more sophisticated algorithms, even those you
mentioned.
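
To make the separation concrete, here is a minimal sketch in Python,
assuming a hypothetical detector interface with a single
find_duplicates() method. The class name, the method name, and the
default window size are invented for the example and are not part of
RAT.

```python
from collections import defaultdict


class SlidingWindowDetector:
    """Reports every run of `window` consecutive non-blank lines that
    occurs in more than one place."""

    def __init__(self, window=3):
        self.window = window

    def find_duplicates(self, files):
        # `files` maps a file name to its source text.
        seen = defaultdict(list)
        for name, text in files.items():
            # Blank lines are dropped and indentation is ignored.
            lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
            for i in range(len(lines) - self.window + 1):
                key = "\n".join(lines[i:i + self.window])
                # `i` is the index among the non-blank lines of the file.
                seen[key].append((name, i))
        return {k: v for k, v in seen.items() if len(v) > 1}


def scan(files, detector):
    """The caller depends only on find_duplicates(), so any other
    detector offering that method can be passed in instead."""
    return detector.find_duplicates(files)
```

Since scan() knows nothing about how duplicates are found, a more
sophisticated algorithm only has to provide the same method to replace
the sliding window.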

I wish you good luck trying GSoC this year.



On Tue, Mar 10, 2009 at 7:45 AM, Tharindu Mathew <mc...@gmail.com> wrote:
> Hi,
>
> I'm a student interested in the Cut and Paste detector.
>
> I was wondering about the scope of this project. Does it involve just
> matching a few regular expressions to search the code? Or are we looking
> at a sophisticated mechanism where even a neural network may be trained
> to identify certain code patterns?
>
>
> Regards,
>
> Tharindu
>



-- 
With best regards,
Alexei Fedotov,
http://people.apache.org/~aaf/