Posted to dev@creadur.apache.org by Tharindu Mathew <mc...@gmail.com> on 2009/03/10 05:45:38 UTC
[gsoc] RAT 1 Cut&Paste Detector
Hi,
I'm a student interested in the Cut and Paste detector.
I was wondering about the scope of this project. Does it include just
matching a few regex strings for code search? Or are we looking at a
sophisticated mechanism where even a neural network may be trained to
identify certain code patterns?
Regards,
Tharindu
Re: [gsoc] RAT 1 Cut&Paste Detector
Posted by Alexei Fedotov <al...@gmail.com>.
Hello Tharindu,
Thanks for your question.
Simple methods work well for real-life tasks, while neural networks
may not. For example, I used regular expressions to find multi-line
comments, and yes, most of them were copied. Also, since legal matters
are involved, the simpler our algorithm is, the easier it is to
understand the implications (and complications) of a scan.
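
As a rough illustration of the regular-expression approach mentioned above, here is a minimal sketch that collects multi-line comments and reports those appearing in more than one file. It assumes C-style /* ... */ comments; the names MULTILINE_COMMENT, normalize and find_copied_comments are hypothetical and do not come from RAT.

```python
import re
from collections import defaultdict

# Assumption: comments use the C-style /* ... */ form.
MULTILINE_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def normalize(comment):
    """Drop the /* */ delimiters, leading '*' decorations and extra whitespace."""
    body = comment[2:-2]
    lines = [line.strip().lstrip("*").strip() for line in body.splitlines()]
    return " ".join(part for part in lines if part)


def find_copied_comments(sources):
    """Return {normalized comment: sorted file names} for comments found in 2+ files.

    `sources` is a dict mapping a file name to that file's text.
    """
    seen = defaultdict(set)
    for name, text in sources.items():
        for match in MULTILINE_COMMENT.finditer(text):
            comment = match.group(0)
            if "\n" in comment:  # keep only comments spanning several lines
                seen[normalize(comment)].add(name)
    return {c: sorted(names) for c, names in seen.items() if len(names) > 1}
```

Normalizing before comparing means that differences in indentation or in the leading asterisks do not hide a copy. Whether that is the right level of tolerance is a judgment call for the project.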
From an architectural point of view, it would be nice to separate the
detection logic into its own module, so that a sliding window could
easily be replaced with more sophisticated algorithms, even those you
mentioned.
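
One possible shape for that separation, sketched under assumptions: the scanner accepts any callable that takes two lists of lines and returns (start_a, start_b, length) matches, and a sliding window is just one such callable. The names Detector, sliding_window_detector, identical_file_detector and scan are hypothetical, not an existing RAT API.

```python
from typing import Callable, Dict, List, Tuple

Match = Tuple[int, int, int]  # (start line in a, start line in b, length)
Detector = Callable[[List[str], List[str]], List[Match]]


def sliding_window_detector(window: int = 3) -> Detector:
    """Build a detector that reports every `window`-line run shared by a and b."""
    def detect(a: List[str], b: List[str]) -> List[Match]:
        # Index every window of `a` by its whitespace-stripped lines.
        index: Dict[tuple, List[int]] = {}
        for i in range(len(a) - window + 1):
            key = tuple(line.strip() for line in a[i:i + window])
            index.setdefault(key, []).append(i)
        # Slide the same window over `b` and look each one up.
        matches: List[Match] = []
        for j in range(len(b) - window + 1):
            key = tuple(line.strip() for line in b[j:j + window])
            for i in index.get(key, []):
                matches.append((i, j, window))
        return matches
    return detect


def identical_file_detector(a: List[str], b: List[str]) -> List[Match]:
    """A trivial alternative detector: match only when the files are identical."""
    return [(0, 0, len(a))] if a and a == b else []


def scan(a: List[str], b: List[str], detector: Detector) -> List[Match]:
    """The scanner depends only on the Detector signature, not on any algorithm."""
    return detector(a, b)
```

With this arrangement, replacing the algorithm means passing a different callable to scan; the scanning code itself does not change.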
I wish you good luck trying GSoC this year.
On Tue, Mar 10, 2009 at 7:45 AM, Tharindu Mathew <mc...@gmail.com> wrote:
> Hi,
>
> I'm a student interested in the Cut and Paste detector.
>
> I was wondering about the scope of this project. Does it include just
> matching a few regex strings for code search? Or are we looking at a
> sophisticated mechanism where even a neural network may be trained to
> identify certain code patterns?
>
>
> Regards,
>
> Tharindu
>
--
Best regards,
Alexei Fedotov,
http://people.apache.org/~aaf/