Posted to general@incubator.apache.org by Alexei Fedotov <al...@gmail.com> on 2009/03/03 12:46:55 UTC

An anti-plagiarism tool for GSoC Was: [pulse] A lab or an incubator?

Hello folks,

I would like your opinion on the following matter. Recently I asked
whether anyone knows of a free anti-plagiarism tool for scanning my
project before incubation. Nobody replied that they knew of one.

I am thinking of suggesting this task for GSoC. One could use Google
Code Search to detect suspicious comments and code constructs in new
contributions. Code Search supports regular expressions, which allows
whitespace and variable name differences to be ignored during
comparison. What do you think?

Thank you!
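
A minimal sketch of the regular-expression idea above, in Python. It
turns a code snippet into a pattern in which any run of whitespace is
optional and any identifier may be renamed. The function name
snippet_to_regex and the KEYWORDS list are hypothetical, and whether a
given code search service accepts this exact regex syntax is an
assumption; the pattern is only exercised locally with Python's re
module.

```python
import re

# Words that must match literally. Illustrative list only, not a full
# keyword table for any language.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "void", "char"}

# An identifier, a number, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def snippet_to_regex(snippet: str) -> str:
    """Build a regex that ignores whitespace and identifier names."""
    parts = []
    for tok in TOKEN.findall(snippet):
        if tok in KEYWORDS:
            # Keywords are kept as they are.
            parts.append(r"\b" + re.escape(tok) + r"\b")
        elif re.match(r"[A-Za-z_]", tok):
            # Any other identifier may have been renamed.
            parts.append(r"\b[A-Za-z_]\w*\b")
        else:
            # Numbers and punctuation must match exactly.
            parts.append(re.escape(tok))
    # Allow arbitrary (or no) whitespace between tokens.
    return r"\s*".join(parts)


original = "for (int i = 0; i < n; i++) { sum += a[i]; }"
copied = "for(int idx=0;idx<len;idx++){\n  total+=arr[idx];\n}"
pattern = re.compile(snippet_to_regex(original))
print(bool(pattern.search(copied)))
```

The sketch does not check that a renamed variable is renamed
consistently, so it can over-match; it is meant as a first filter only.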


On Wed, Dec 24, 2008 at 10:33 AM, Alexei Fedotov
<al...@gmail.com> wrote:
> Hello, Bernd,
> Thanks for a quick answer.
>
> Well, we do have a working code base, that's correct. I might perceive
> things as too complex, but I am concerned about the gap between
> working code and code that can be committed to Apache Subversion. Our
> code might be (and likely was) tainted by cut-and-pasted samples, open
> source fragments and third-party inclusions with unknown authors and
> uncertain licensing models. Even if I understood how to identify
> questionable code and removed it, the result would be a broken code
> base. There are also minor issues, such as comments in Russian.
>
> My approach to avoiding the licensing mess is to write from scratch. I
> plan to move ongoing development to Apache and envision building new
> modules as pluggable APL-licensed libraries, which can then be reused
> by the multiple-licensed project. Another goal is to import code
> under different licenses using a build system rather than mixing all
> the code in the phone code base. Such a build system has to be
> developed from scratch as well. That is what I plan to use the labs
> for.
>
> I'm CCing this letter to general.AT.incubator.apache.org, hoping to
> draw the attention of Incubator participants to the discussion and to
> get your opinion on whether it is worth starting an incubation or a
> lab for the Pulse project; see the project description below. I went
> through the Apache Harmony incubation led by Geir, and have to admit
> openly that we are currently less prepared for it than Intel and IBM
> were. As the first step I have to find an anti-plagiarism tool
> reliable enough to detect GPL contamination and samples copied from
> code guru web sites. Could anyone suggest a tool, preferably one that
> is free for Apache committers?
>
> As for JIRA usage, I mean the following problem. As I've already said,
> I cannot see how to open the code for the whole project at once. Now
> imagine that bugs for opened components use Apache JIRA, bugs for
> closed components use our internal Bugzilla, and some bugs travel
> between these two tracking systems during resolution. I personally
> believe it would be nice to have all new bugs stored in one place. I
> don't think that allowing people to keep all bugs in JIRA would be an
> abuse of Apache resources, because I don't expect heavy bug traffic.
> We have 45 issues so far.
>
> As for the binary, I got the point about careful naming. We may reuse
> the Harmony term "snapshot build" for it.
> Thanks!
>
> With best regards, Alexei
>
> On Tue, Dec 23, 2008 at 9:55 PM, Bernd Fondermann <bf...@brainlounge.de> wrote:
>> Hi Alexei,
>>
>> Alexei Fedotov wrote:
>>>
>>> Hello folks,
>>>
>>> Recently I became involved in the development of an H.323-compatible
>>> software videophone [1] based on the MPL-licensed OpenH323. It is
>>> mostly the same as Ekiga [2], though we can hardly join a GPL-ed
>>> project due to the complex licensing of the existing code. I have a
>>> strategic goal to expose our project code under the Apache license
>>> and move related development to Apache. I believe that working at
>>> Apache would help us achieve legally clean code. As nice side
>>> benefits I see openness to the English-speaking world, which
>>> believes that software can be bought and not necessarily pirated,
>>> and providing an open alternative to closed-source solutions ranging
>>> from Skype to Tandberg and Polycom, thus making the world more
>>> competitive and customer-friendly. I would like to take advantage of
>>> APR, though staying with PTLib seems more realistic.
>>
>> If you have an existing code base and don't start from scratch - and that's
>> what I read from your mail - the Incubator seems more appropriate to me than
>> Labs.
>>
>>> I believe it would be feasible to provide several portable and
>>> reusable communication libraries with clean interfaces under APL as
>>> the first step. Well, this requires much more understanding from the
>>> fellow stakeholders, including developers who are not familiar with
>>> the open source model. I plan to resolve the patent issues around
>>> codecs that Ekiga is facing by making the project design modular
>>> enough and keeping questionable modules out of the Apache source
>>> base.
>>
>> Ok, fine.
>>
>>> I wonder what you think about using Apache Labs as a launching pad
>>> for the project. It is hard to guarantee now that it would be
>>> possible to expose enough code to build a self-sufficient product.
>>
>> See my first comment. If you indeed start a lab (or at the Incubator), you
>> don't need to say where your project will end up, finally. Just start and
>> see where it takes you.
>>
>>> Some parts of our product are contaminated while others are
>>> third-party legacy of uncertain origin (i.e. may be contaminated as
>>> well). We can only hope to concentrate enough resources to rewrite
>>> all of them.
>>
>> see first comment...
>>
>>>
>>> Another question is about bug tracking facility usage. Would it be
>>> an abuse to use it for all bugs related to the project even if parts
>>> of it were closed?
>>
>> You mean, transferring the whole history of issues to a JIRA? What does it
>> help?
>>
>>> The last question is about proper binary placement. Is it OK to
>>> place the binary, for those who would like to try the product, on
>>> our company's web site, with references from Apache to our site?
>>> Placing the binary at Apache Labs contradicts the no-releases labs
>>> policy.
>>
>> You can always release ASF code on your own (commercially or not). But the
>> level of endorsement of the ASF is a delicate question. For example,
>> releases from the Incubator need to be referred to as "in incubation".
>>
>>> Finally, how does the project name "pulse" sound to your ear? Could
>>> anyone suggest anything better?
>>
>> Sounds good to my ears. (But I have a lab with a really stupid name, so you
>> probably want to hear more opinions on this.)
>>
>>  Bernd
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: labs-unsubscribe@labs.apache.org
>> For additional commands, e-mail: labs-help@labs.apache.org
>>
>>
>



-- 
With best regards,
Alexei Fedotov,
http://people.apache.org/~aaf/

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org


Re: An anti-plagiarism tool for GSoC Was: [pulse] A lab or an incubator?

Posted by Alexei Fedotov <al...@gmail.com>.
Ross,
Thank you for your encouraging comment and for sharing your thoughts.
I have updated the proposal: the first objective is now to define the
search algorithm, and I have added more code search engines.

[1] http://wiki.apache.org/general/SummerOfCode2009#rat-1-cutnpaste

On Thu, Mar 5, 2009 at 7:39 PM, Ross Gardler <rg...@apache.org> wrote:
> 2009/3/5 Alexei Fedotov <al...@gmail.com>:
>> Hello folks,
>> Could you please take a look at [1]? Does it look ok?
>>
>> With best regards, Alexei
>>
>> [1] http://wiki.apache.org/general/SummerOfCode2009#rat-1-cutnpaste
>
> The problem I see with this approach is the load it would put on
> Google code search. We can't search for just any old match. I'd
> suggest that the project needs an objective to define a search
> algorithm.
>
> For example, we may have a list of "common" variable and method names,
> then search through the code looking for uncommon ones, then use those
> as an initial search.
>
> Ross
>



-- 
With best regards,
Alexei Fedotov,
http://people.apache.org/~aaf/

Re: An anti-plagiarism tool for GSoC Was: [pulse] A lab or an incubator?

Posted by Ross Gardler <rg...@apache.org>.
2009/3/5 Alexei Fedotov <al...@gmail.com>:
> Hello folks,
> Could you please take a look at [1]? Does it look ok?
>
> With best regards, Alexei
>
> [1] http://wiki.apache.org/general/SummerOfCode2009#rat-1-cutnpaste

The problem I see with this approach is the load it would put on
Google code search. We can't search for just any old match. I'd
suggest that the project needs an objective to define a search
algorithm.

For example, we may have a list of "common" variable and method names,
then search through the code looking for uncommon ones, then use those
as an initial search.

Ross
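
A minimal sketch of the selection step suggested above, in Python:
extract the identifiers from a contribution, drop the "common" ones, and
keep the rest as initial search terms. The COMMON_NAMES list, the
function name uncommon_identifiers and the "longer means more
distinctive" ranking are all assumptions made for illustration; a real
list of common names would more likely be derived from a large corpus.

```python
import re
from typing import List

# Hypothetical hand-written list of "common" names and keywords.
COMMON_NAMES = {
    "i", "j", "n", "x", "tmp", "len", "count", "index", "result",
    "value", "data", "get", "set", "main", "init", "size", "buf",
    "int", "char", "void", "for", "while", "if", "else", "return",
}

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def uncommon_identifiers(source: str, limit: int = 5) -> List[str]:
    """Return up to `limit` identifiers that are not in COMMON_NAMES."""
    names = set(IDENTIFIER.findall(source))
    candidates = [name for name in names if name.lower() not in COMMON_NAMES]
    # Assumption: longer names are more distinctive. Ties are broken
    # alphabetically so that the result is deterministic.
    candidates.sort(key=lambda name: (-len(name), name))
    return candidates[:limit]


source = (
    "int computeFrobnicationIndex(int n) {"
    " int tmp = n; return tmp * kMagicPulseSeed; }"
)
print(uncommon_identifiers(source, limit=2))
```

Limiting the number of terms per file is one way to keep the load on
the search service bounded. Note that the sketch also picks up words
from comments and string literals, which may or may not be desirable.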

Re: An anti-plagiarism tool for GSoC Was: [pulse] A lab or an incubator?

Posted by Alexei Fedotov <al...@gmail.com>.
Hello folks,
Could you please take a look at [1]? Does it look ok?

With best regards, Alexei

[1] http://wiki.apache.org/general/SummerOfCode2009#rat-1-cutnpaste


On Tue, Mar 3, 2009 at 6:12 PM, Alexei Fedotov <al...@gmail.com> wrote:
> Robert,
> Thanks for a quick answer. I have read RAT's proposal, and I agree
> that it is the right place to go. Do I understand correctly that you
> and your project fellows are open to starting this GSoC task? I'm open
> to any form of collaboration, e.g. co-mentorship.
>
> With best regards, Alexei
>
> P.S. Technically, it is OK to create the anti-plagiarism tool using
> Java: Google provides a Java API. Another similarity is the heuristic
> nature of the comparison. Could you please point me to where RAT's
> heuristics are kept in the code?
>
> On Tue, Mar 3, 2009 at 3:46 PM, Robert Burrell Donkin
> <ro...@gmail.com> wrote:
>> On Tue, Mar 3, 2009 at 11:46 AM, Alexei Fedotov
>> <al...@gmail.com> wrote:
>>> Hello folks,
>>>
>>> I would like your opinion on the following matter. Recently I asked
>>> whether anyone knows of a free anti-plagiarism tool for scanning my
>>> project before incubation. Nobody replied that they knew of one.
>>>
>>> I am thinking of suggesting this task for GSoC. One could use Google
>>> Code Search to detect suspicious comments and code constructs in new
>>> contributions. Code Search supports regular expressions, which allows
>>> whitespace and variable name differences to be ignored during
>>> comparison. What do you think?
>>
>> this is - i think - nearly in scope for RAT
>>
>> opinions?
>>
>> - robert
>>
>>
>>
>
>
>
> --
> With best regards,
> Alexei Fedotov,
> http://people.apache.org/~aaf/
>



-- 
With best regards,
Alexei Fedotov,
http://people.apache.org/~aaf/

Re: An anti-plagiarism tool for GSoC Was: [pulse] A lab or an incubator?

Posted by Alexei Fedotov <al...@gmail.com>.
Robert,
Thanks for a quick answer. I have read RAT's proposal, and I agree
that it is the right place to go. Do I understand correctly that you
and your project fellows are open to starting this GSoC task? I'm open
to any form of collaboration, e.g. co-mentorship.

With best regards, Alexei

P.S. Technically, it is OK to create the anti-plagiarism tool using
Java: Google provides a Java API. Another similarity is the heuristic
nature of the comparison. Could you please point me to where RAT's
heuristics are kept in the code?

On Tue, Mar 3, 2009 at 3:46 PM, Robert Burrell Donkin
<ro...@gmail.com> wrote:
> On Tue, Mar 3, 2009 at 11:46 AM, Alexei Fedotov
> <al...@gmail.com> wrote:
>> Hello folks,
>>
>> I would like your opinion on the following matter. Recently I asked
>> whether anyone knows of a free anti-plagiarism tool for scanning my
>> project before incubation. Nobody replied that they knew of one.
>>
>> I am thinking of suggesting this task for GSoC. One could use Google
>> Code Search to detect suspicious comments and code constructs in new
>> contributions. Code Search supports regular expressions, which allows
>> whitespace and variable name differences to be ignored during
>> comparison. What do you think?
>
> this is - i think - nearly in scope for RAT
>
> opinions?
>
> - robert
>
>
>



-- 
With best regards,
Alexei Fedotov,
http://people.apache.org/~aaf/

Re: An anti-plagiarism tool for GSoC Was: [pulse] A lab or an incubator?

Posted by Ross Gardler <rg...@apache.org>.
2009/3/3 Robert Burrell Donkin <ro...@gmail.com>:
> On Tue, Mar 3, 2009 at 11:46 AM, Alexei Fedotov
> <al...@gmail.com> wrote:
>> Hello folks,
>>
>> I would like your opinion on the following matter. Recently I asked
>> whether anyone knows of a free anti-plagiarism tool for scanning my
>> project before incubation. Nobody replied that they knew of one.
>>
>> I am thinking of suggesting this task for GSoC. One could use Google
>> Code Search to detect suspicious comments and code constructs in new
>> contributions. Code Search supports regular expressions, which allows
>> whitespace and variable name differences to be ignored during
>> comparison. What do you think?
>
> this is - i think - nearly in scope for RAT
>
> opinions?

I think this is a great idea. There are some tools out there that look
for cut and paste code in order to help in refactoring. It would be
interesting to do an initial search using Google Code, Koders and
other such repositories. When a possible match is found, we could then
check out the full project and use an existing tool to do a more
thorough analysis. I'm not suggesting the latter should be part of a
GSoC proposal, just indicating that this could be further developed.

+1 for a proposal.

Ross
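
To illustrate what the "more thorough analysis" stage might compute
once a candidate project has been checked out, here is a minimal sketch
in Python. It is not any particular existing tool: it simply compares
two sources by the overlap (Jaccard similarity) of their k-token
windows after replacing every identifier with a placeholder. The
function names shingles and similarity and the window size k=4 are
assumptions chosen for the example.

```python
import re
from typing import Set, Tuple

# An identifier, a number, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def shingles(source: str, k: int = 4) -> Set[Tuple[str, ...]]:
    """Return the set of k-token windows with identifiers normalised."""
    tokens = [
        "ID" if re.match(r"[A-Za-z_]", tok) else tok
        for tok in TOKEN.findall(source)
    ]
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def similarity(a: str, b: str, k: int = 4) -> float:
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


ours = "for (int i = 0; i < n; i++) { sum += a[i]; }"
theirs = "for(int idx=0;idx<len;idx++){total+=arr[idx];}"
other = "while (x > 0) { x--; }"
print(similarity(ours, theirs), similarity(ours, other))
```

Because keywords are normalised together with other identifiers, the
measure is deliberately coarse; a score near 1.0 would only flag a
fragment for manual review, not prove copying.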



Re: An anti-plagiarism tool for GSoC Was: [pulse] A lab or an incubator?

Posted by Robert Burrell Donkin <ro...@gmail.com>.
On Tue, Mar 3, 2009 at 11:46 AM, Alexei Fedotov
<al...@gmail.com> wrote:
> Hello folks,
>
> I would like your opinion on the following matter. Recently I asked
> whether anyone knows of a free anti-plagiarism tool for scanning my
> project before incubation. Nobody replied that they knew of one.
>
> I am thinking of suggesting this task for GSoC. One could use Google
> Code Search to detect suspicious comments and code constructs in new
> contributions. Code Search supports regular expressions, which allows
> whitespace and variable name differences to be ignored during
> comparison. What do you think?

this is - i think - nearly in scope for RAT

opinions?

- robert
