Posted to dev@lenya.apache.org by Daniel Gong <da...@gmail.com> on 2009/03/30 11:12:47 UTC

My Proposal

Hi all,
   I'm a postgraduate student from Fudan University, Shanghai, China.
   This is my first time taking part in GSoC, and I did not realize that I
should discuss my ideas with potential mentors. I submitted my proposal
today; luckily, I can still modify it.
   Here is my proposal; any criticism and suggestions are welcome~

================================================

*Abstract: *

The main idea is to treat the XML DOM structure as a tree and reduce the
problem to computing diffs between tree structures. Algorithms for computing
tree diffs already exist, such as Tree Edit Distance; some small
modifications will be needed to adapt them to this context.

*Detailed Description: *

The implementation of the module can be divided into 5 parts:

   1. Parse the XML text to get the DOM structure;
   2. Translate the DOM structure into a tree structure;
   3. Employ a tree diff algorithm to compute the diffs;
   4. Translate the tree diffs into XML diffs;
   5. Display the diffs and, optionally, mail them.
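As a rough illustration of steps 1 and 2, the sketch below uses the JDK's built-in DOM parser (javax.xml.parsers) and copies the DOM into a minimal labeled tree. The LabeledNode and XmlToTree classes, modelling attributes as "@name=value" leaves, and dropping whitespace-only text are all assumptions made for this example, not part of Lenya or of the proposal.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical labeled tree node: the input format for a tree diff algorithm. */
class LabeledNode {
    final String label;
    final List<LabeledNode> children = new ArrayList<>();

    LabeledNode(String label) {
        this.label = label;
    }

    /** Number of nodes in this subtree, including this node. */
    int size() {
        int n = 1;
        for (LabeledNode child : children) {
            n += child.size();
        }
        return n;
    }
}

class XmlToTree {
    /** Step 1: parse XML text into a DOM tree and return its root element. */
    static Node parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    /** Step 2: copy an element into a labeled tree (elements, attributes, non-blank text). */
    static LabeledNode toTree(Node element) {
        LabeledNode node = new LabeledNode(element.getNodeName());
        // Attributes become leaf children labeled "@name=value".
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            node.children.add(new LabeledNode("@" + a.getNodeName() + "=" + a.getNodeValue()));
        }
        NodeList kids = element.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            if (kid.getNodeType() == Node.ELEMENT_NODE) {
                node.children.add(toTree(kid));
            } else if (kid.getNodeType() == Node.TEXT_NODE) {
                // Skip whitespace-only text nodes such as indentation.
                String text = kid.getNodeValue().trim();
                if (!text.isEmpty()) {
                    node.children.add(new LabeledNode("#text:" + text));
                }
            }
        }
        return node;
    }
}
```

Keeping attributes as separate leaves means an attribute change shows up as a change to one small leaf rather than to the whole element; whether that granularity suits Lenya's documents would still need to be decided.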

*Initial Algorithm Design*

According to my past research experience, Tree Edit Distance is a class of
algorithms that use edit distance to measure tree similarity. The
algorithms define 3 types of edit operations on labeled trees: insertion,
deletion and relabeling. To measure the distance, the algorithms assign
weights to the operations and define the edit distance as the minimum total
weight over all possible edit sequences between the two trees. There is a
corresponding best edit sequence with the minimum weight, and this sequence
can be translated to describe the diffs between the XML texts.
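To make the definition above concrete, here is a minimal sketch of tree edit distance with unit weights (insert = delete = 1, relabel = 1 when labels differ and 0 otherwise). It is the plain memoized forest recurrence that algorithms such as Zhang and Shasha's evaluate more efficiently, not an optimized implementation; the Tree and TreeEditDistance names are hypothetical, and it returns only the distance, not the edit sequence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical ordered, labeled tree. */
class Tree {
    final String label;
    final List<Tree> children;

    Tree(String label, Tree... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }
}

class TreeEditDistance {
    // Unit weights; a real diff module would probably tune these.
    static final int INSERT = 1;
    static final int DELETE = 1;
    static final int RELABEL = 1;

    // Memo keyed by (forest F, forest G); the lists compare their elements by identity.
    private final Map<List<Tree>, Map<List<Tree>, Integer>> memo = new HashMap<>();

    static int distance(Tree a, Tree b) {
        return new TreeEditDistance().forest(
                Collections.singletonList(a), Collections.singletonList(b));
    }

    private static int size(List<Tree> forest) {
        int n = 0;
        for (Tree t : forest) {
            n += 1 + size(t.children);
        }
        return n;
    }

    /** Forest minus its rightmost root; that root's children take its place. */
    private static List<Tree> removeRoot(List<Tree> forest) {
        List<Tree> rest = new ArrayList<>(forest.subList(0, forest.size() - 1));
        rest.addAll(forest.get(forest.size() - 1).children);
        return rest;
    }

    /** Forest minus its whole rightmost tree. */
    private static List<Tree> removeTree(List<Tree> forest) {
        return new ArrayList<>(forest.subList(0, forest.size() - 1));
    }

    private int forest(List<Tree> f, List<Tree> g) {
        if (f.isEmpty()) return size(g) * INSERT; // insert everything in g
        if (g.isEmpty()) return size(f) * DELETE; // delete everything in f
        Map<List<Tree>, Integer> row = memo.computeIfAbsent(f, k -> new HashMap<>());
        Integer cached = row.get(g);
        if (cached != null) return cached;

        Tree v = f.get(f.size() - 1);
        Tree w = g.get(g.size() - 1);
        int delete = forest(removeRoot(f), g) + DELETE;
        int insert = forest(f, removeRoot(g)) + INSERT;
        int match = forest(v.children, w.children)
                + forest(removeTree(f), removeTree(g))
                + (v.label.equals(w.label) ? 0 : RELABEL);
        int best = Math.min(delete, Math.min(insert, match));
        row.put(g, best);
        return best;
    }
}
```

For example, replacing a `u` element by a `strong` element around the same text costs a single relabel. Because this naive version memoizes every pair of subforests it visits, it is meant for small examples; larger documents would call for a proper Zhang-Shasha style implementation.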

*Draft Timeline*

   - Week 1 Complete a survey in the related area to decide the
   algorithm to employ;
   - Week 2-3 Implement the XML parser and translator modules;
   - Week 4-6 Implement the chosen algorithm to compute tree diffs;
   - Week 7-8 Implement the module which translates the tree diffs into XML
   diffs and displays them;
   - Week 9 Implement the module which can mail the diffs to a given mail
   address;
   - Week 10 Debug the whole module and make necessary modifications
   to successfully complete the subject.

*Additional Information:*

I've been learning and using Java for 3 years. Although my experience in
processing XML with Java is not extensive, my knowledge of programming,
software architecture and algorithms will help me learn fast and handle
the problem.

I'm 23 years old, living in Shanghai, China, attending Fudan University.

================================================

Re: My Proposal

Posted by Richard Frovarp <rf...@apache.org>.
Daniel Gong wrote:
> Hi Richard,
>  
> A few more questions came to mind. Will the XML text be large? By large I 
> mean, say, more than 100KB or 1MB?
> Besides, I've modified my proposal on http://socghop.appspot.com 
> <http://socghop.appspot.com/>. Thanks~
>  
> Yours Daniel
Well, in theory we are talking about the content section of a web page. 
That should only be a few KB at the top end. I have seen the editor go 
nuts and end up with a 1 MB title tag, but under normal circumstances 
that shouldn't happen. So, I would guess 10 KB at the top end. It's been 
a while since I've looked at page sizes. Maybe someone else on the list 
could give an estimate of their page sizes?

Richard

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lenya.apache.org
For additional commands, e-mail: dev-help@lenya.apache.org


Re: My Proposal

Posted by Daniel Gong <da...@gmail.com>.
Thanks, Richard. I will do my best to make it as clear as possible.

Yours Daniel
On Thu, Apr 2, 2009 at 12:27 PM, Richard Frovarp
<Ri...@ndsu.edu> wrote:

> Daniel Gong wrote:
>
>> Hi Richard,
>>  A few more questions came to mind. Will the XML text be large? By large I
>> mean, say, more than 100KB or 1MB?
>> Besides, I've modified my proposal on http://socghop.appspot.com <
>> http://socghop.appspot.com/>. Thanks~
>>
>>
> It's looking good as updated. You may want to clarify what modification to
> the algorithm is needed. Anything else you can add to make it as clear as
> possible always helps. The user display section is going to be important as
> it needs to be done in a fashion that someone who knows very little about
> web pages can understand at least part of what they are being shown.
>
>
> Richard
>

Re: My Proposal

Posted by Richard Frovarp <Ri...@ndsu.edu>.
Daniel Gong wrote:
> Hi Richard,
>  
> A few more questions came to mind. Will the XML text be large? By large I 
> mean, say, more than 100KB or 1MB?
> Besides, I've modified my proposal on http://socghop.appspot.com 
> <http://socghop.appspot.com/>. Thanks~
>  
It's looking good as updated. You may want to clarify what modification 
to the algorithm is needed. Anything else you can add to make it as 
clear as possible always helps. The user display section is going to be 
important as it needs to be done in a fashion that someone who knows 
very little about web pages can understand at least part of what they 
are being shown.
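One possible way to meet this display requirement, offered only as an assumption, is to show changes inline in the text itself rather than as tags: a word-level longest-common-subsequence (LCS) diff rendered with HTML `del` and `ins` elements, which browsers by default show as struck-through and underlined text. The WordDiff class and its toHtml method are hypothetical names for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

class WordDiff {
    /** Marks up the word-level differences between two texts using an LCS table. */
    static String toHtml(String oldText, String newText) {
        String[] a = split(oldText);
        String[] b = split(newText);
        int n = a.length;
        int m = b.length;
        // lcs[i][j] = length of the LCS of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a[i].equals(b[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n || j < m) {
            if (i < n && j < m && a[i].equals(b[j])) {
                out.add(escape(a[i])); // unchanged word
                i++;
                j++;
            } else if (i < n && (j == m || lcs[i + 1][j] >= lcs[i][j + 1])) {
                out.add("<del>" + escape(a[i]) + "</del>"); // removed word
                i++;
            } else {
                out.add("<ins>" + escape(b[j]) + "</ins>"); // added word
                j++;
            }
        }
        return String.join(" ", out);
    }

    /** Splits on runs of whitespace, so whitespace-only changes are ignored. */
    private static String[] split(String s) {
        String t = s.trim();
        return t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    private static String escape(String w) {
        return w.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

For example, `WordDiff.toHtml("the quick fox", "the slow fox")` yields `the <del>quick</del> <ins>slow</ins> fox`, which a non-technical reader can follow without knowing any markup.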

Richard



Re: My Proposal

Posted by Daniel Gong <da...@gmail.com>.
Hi Richard,

A few more questions came to mind. Will the XML text be large? By large I mean,
say, more than 100KB or 1MB?
Besides, I've modified my proposal on http://socghop.appspot.com. Thanks~

Yours Daniel

On Thu, Apr 2, 2009 at 9:21 AM, Richard Frovarp <rf...@apache.org> wrote:

> Daniel Gong wrote:
>
>> Hi Richard,
>>  One more question: what is meant by "The module could also
>> generate diffs based on rendered text" in the requirement? Does it mean the
>> module should be designed for general purpose and it can display diffs
>> between any XML texts, not only the revision history?
>>  Yours Daniel
>>
> The diffs for rendered text would be based on the text that the end users
> would see on the page, instead of just using the pure XML. White space of
> course would be ignored. Showing the user that they changed a chunk of text
> from underlined to strong visually instead of showing the differences in
> tags would be one example of the difference in rendered text.
>
>
> Richard
>

Re: My Proposal

Posted by Andreas Hartmann <an...@apache.org>.
Hi Richard,

Richard Frovarp wrote:
>> One more question: what is meant by "The module could also 
>> generate diffs based on rendered text" in the requirement? Does it 
>> mean the module should be designed for general purpose and it can 
>> display diffs between any XML texts, not only the revision history?
>>  
>> Yours Daniel
> The diffs for rendered text would be based on the text that the end 
> users would see on the page, instead of just using the pure XML. White 
> space of course would be ignored. Showing the user that they changed a 
> chunk of text from underlined to strong visually instead of showing the 
> differences in tags would be one example of the difference in rendered 
> text.

Do you consider the diff of the resulting text an additional
requirement, or an alternative to the DOM-based diff? I think a
DOM-based diff is more generic and needed in any case; there might be
scenarios where the XML is not transformed into a human-readable output
format at all.

-- Andreas


-- 
Andreas Hartmann, CTO
BeCompany GmbH
http://www.becompany.ch
Tel.: +41 (0) 43 818 57 01




Re: My Proposal

Posted by Richard Frovarp <rf...@apache.org>.
Daniel Gong wrote:
> Hi Richard,
>  
> One more question: what is meant by "The module could also 
> generate diffs based on rendered text" in the requirement? Does it 
> mean the module should be designed for general purpose and it can 
> display diffs between any XML texts, not only the revision history?
>  
> Yours Daniel
The diffs for rendered text would be based on the text that the end 
users would see on the page, instead of just using the pure XML. White 
space of course would be ignored. Showing the user that they changed a 
chunk of text from underlined to strong visually instead of showing the 
differences in tags would be one example of the difference in rendered 
text.
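A hypothetical first step toward such a rendered-text diff: flatten the XML into the words a reader sees, ignoring whitespace, while recording which enclosing elements (for example `u` or `strong`) wrap each word. The RenderedText class and the "word{u}" notation are assumptions for illustration; a real implementation would presumably track only inline formatting elements and handle words split across elements.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class RenderedText {
    /** A visible word plus the elements wrapping it below the root, outermost first. */
    static final class StyledWord {
        final String text;
        final String style;

        StyledWord(String text, String style) {
            this.text = text;
            this.style = style;
        }

        @Override
        public String toString() {
            return style.isEmpty() ? text : text + "{" + style + "}";
        }
    }

    static List<StyledWord> words(String xml) {
        Node root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
        List<StyledWord> out = new ArrayList<>();
        collect(root, new ArrayDeque<String>(), out);
        return out;
    }

    private static void collect(Node node, Deque<String> styles, List<StyledWord> out) {
        NodeList kids = node.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            if (kid.getNodeType() == Node.TEXT_NODE || kid.getNodeType() == Node.CDATA_SECTION_NODE) {
                // Split on whitespace so that indentation and line breaks are ignored.
                for (String w : kid.getNodeValue().trim().split("\\s+")) {
                    if (!w.isEmpty()) {
                        out.add(new StyledWord(w, String.join("/", styles)));
                    }
                }
            } else if (kid.getNodeType() == Node.ELEMENT_NODE) {
                styles.addLast(kid.getNodeName());
                collect(kid, styles, out);
                styles.removeLast();
            }
        }
    }

    /** True when both documents show the same words, ignoring markup and whitespace. */
    static boolean sameText(String xmlA, String xmlB) {
        return plain(words(xmlA)).equals(plain(words(xmlB)));
    }

    private static List<String> plain(List<StyledWord> words) {
        List<String> result = new ArrayList<>();
        for (StyledWord w : words) {
            result.add(w.text);
        }
        return result;
    }
}
```

With this representation, the underline-to-strong example yields the same plain words but different styles (`text{u}` versus `text{strong}`), so the display could say that "text" changed from underlined to strong instead of listing tag edits.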

Richard



Re: My Proposal

Posted by Daniel Gong <da...@gmail.com>.
Hi Richard,

One more question: what is meant by "The module could also
generate diffs based on rendered text" in the requirement? Does it mean the
module should be designed for general purpose and it can display diffs
between any XML texts, not only the revision history?

Yours Daniel

On Thu, Apr 2, 2009 at 1:07 AM, Richard Frovarp <rf...@apache.org> wrote:

> Daniel Gong wrote:
>
>> Hi Richard,
>>  I'm glad to hear from you and hope everything will be better with you.
>> Thanks for your advice. To answer a simple question first: my undergraduate
>> degree is also in Computer Science. I'm going to sleep now and I will expand
>> my proposal according to your advice several hours later. I hope it is not
>> too late for me to improve my proposal~
>> Best wishes to you and your family!
>>  Yours Daniel
>>
> Daniel,
>
> Sounds good. I see that I'm 13 hours behind you (I'm -5 hours at the
> moment). Send a notice to the list when you've updated it and I can take a
> look at it. If you get something up by 11am your time, I will still be awake
> and be able to respond before I go to bed.
>
>
> Richard
>

Re: My Proposal

Posted by Richard Frovarp <rf...@apache.org>.
Daniel Gong wrote:
> Hi Richard,
>  
> I'm glad to hear from you and hope everything will be better with you.
> Thanks for your advice. To answer a simple question first: my 
> undergraduate degree is also in Computer Science. I'm going to sleep 
> now and I will expand my proposal according to your advice several 
> hours later. I hope it is not too late for me to improve my proposal~
> Best wishes to you and your family!
>  
> Yours Daniel
Daniel,

Sounds good. I see that I'm 13 hours behind you (I'm -5 hours at the 
moment). Send a notice to the list when you've updated it and I can take 
a look at it. If you get something up by 11am your time, I will still be 
awake and be able to respond before I go to bed.

Richard



Re: My Proposal

Posted by Daniel Gong <da...@gmail.com>.
Hi Richard,

I'm glad to hear from you and hope everything will be better with you.
Thanks for your advice. To answer a simple question first: my undergraduate
degree is also in Computer Science. I'm going to sleep now and I will expand
my proposal according to your advice several hours later. I hope it is not
too late for me to improve my proposal~
Best wishes to you and your family!

Yours Daniel

On Thu, Apr 2, 2009 at 12:42 AM, Richard Frovarp <rf...@apache.org> wrote:

> Daniel Gong wrote:
>
>> Hi all,
>>   I'm a postgraduate student from Fudan University, Shanghai, China.
>>   This is my first time taking part in GSoC, and I did not realize that I
>> should discuss my ideas with potential mentors. I submitted my proposal
>> today; luckily, I can still modify it.
>>   Here is my proposal; any criticism and suggestions are welcome~
>>
> Daniel,
>
> Sorry for taking so long to reply. Yes, working with the mentors, and in
> fact with the community, is part of the process, which is why it should go
> through the dev list.
>
>> ================================================
>>
>> *Abstract: *
>>
>> The main idea is to treat the XML DOM structure as a tree and reduce the
>> problem to computing diffs between tree structures. Algorithms for
>> computing tree diffs already exist, such as Tree Edit Distance; some small
>> modifications will be needed to adapt them to this context.
>>
>> *Detailed Description: *
>>
>> The implementation of the module can be divided into 5 parts:
>>
>>   1. Parse the XML text to get the DOM structure;
>>   2. Translate the DOM structure into a tree structure;
>>   3. Employ a tree diff algorithm to compute the diffs;
>>   4. Translate the tree diffs into XML diffs;
>>   5. Display the diffs and, optionally, mail them.
>>
> We would like to see some more detail in the detailed description. This
> should be at least a couple of paragraphs long. It should show that you
> understand the scope and goals of the project and are aware of issues you
> may run into while doing it.
>
>>
>> *Initial Algorithm Design*
>>
>> According to my past research experience, Tree Edit Distance is a class of
>> algorithms that use edit distance to measure tree similarity. The
>> algorithms define 3 types of edit operations on labeled trees: insertion,
>> deletion and relabeling. To measure the distance, the algorithms assign
>> weights to the operations and define the edit distance as the minimum total
>> weight over all possible edit sequences between the two trees. There is a
>> corresponding best edit sequence with the minimum weight, and this sequence
>> can be translated to describe the diffs between the XML texts.
>>
>> *Draft Timeline*
>>
>>    * Week 1 Complete a survey in the related area to decide the
>>      algorithm to employ;
>>    * Week 2-3 Implement the XML parser and translator modules;
>>    * Week 4-6 Implement the chosen algorithm to compute tree diffs;
>>    * Week 7-8 Implement the module which translates the tree diffs into
>>      XML diffs and displays them;
>>    * Week 9 Implement the module which can mail the diffs to a given
>>      mail address;
>>    * Week 10 Debug the whole module and make necessary modifications
>>      to successfully complete the subject.
>>
> Good detail in the timeline. It is important to note in your proposal how
> much time you have to dedicate to the project and any other interfering
> factors (tests, job, etc).
>
>>
>> *Additional Information:*
>>
>> I've been learning and using Java for 3 years. Although my
>> experience in processing XML with Java is not extensive, my knowledge
>> of programming, software architecture and algorithms will help me learn
>> fast and handle the problem.
>>
>> I'm 23 years old, living in Shanghai, China, attending Fudan University.
>>
>> ================================================
>>
>>
> You should further expand on your qualifications to do this job. Describe
> the projects you have done in the past that are similar and that show you
> have the skills to complete the task. For example, what is your
> undergraduate degree in?
>
> Richard
>

Re: My Proposal

Posted by Richard Frovarp <rf...@apache.org>.
Daniel Gong wrote:
> Hi all,
>    I'm a postgraduate student from Fudan University, Shanghai, China.
>    This is my first time taking part in GSoC, and I did not realize that I 
> should discuss my ideas with potential mentors. I submitted my 
> proposal today; luckily, I can still modify it.
>    Here is my proposal; any criticism and suggestions are welcome~
Daniel,

Sorry for taking so long to reply. Yes, working with the mentors, and in 
fact with the community, is part of the process, which is why it should go 
through the dev list.

>  
> ================================================
>
> *Abstract: *
>
> The main idea is to treat the XML DOM structure as a tree and reduce 
> the problem to computing diffs between tree structures. Algorithms for 
> computing tree diffs already exist, such as Tree Edit Distance; some 
> small modifications will be needed to adapt them to this context.
>
> *Detailed Description: *
>
> The implementation of the module can be divided into 5 parts:
>
>    1. Parse the XML text to get the DOM structure;
>    2. Translate the DOM structure into a tree structure;
>    3. Employ a tree diff algorithm to compute the diffs;
>    4. Translate the tree diffs into XML diffs;
>    5. Display the diffs and, optionally, mail them.
>
We would like to see some more detail in the detailed description. This 
should be at least a couple of paragraphs long. It should show that you 
understand the scope and goals of the project and are aware of issues you 
may run into while doing it.

> *Initial Algorithm Design*
>
> According to my past research experience, Tree Edit Distance is a 
> class of algorithms that use edit distance to measure tree 
> similarity. The algorithms define 3 types of edit operations on labeled 
> trees: insertion, deletion and relabeling. To measure the distance, the 
> algorithms assign weights to the operations and define the edit distance 
> as the minimum total weight over all possible edit sequences between 
> the two trees. There is a corresponding best edit sequence with the 
> minimum weight, and this sequence can be translated to describe the 
> diffs between the XML texts.
>
> *Draft Timeline*
>
>     * Week 1 Complete a survey in the related area to decide the
>       algorithm to employ; 
>     * Week 2-3 Implement the XML parser and translator modules;
>     * Week 4-6 Implement the chosen algorithm to compute tree diffs;
>     * Week 7-8 Implement the module which translates the tree diffs into
>       XML diffs and displays them;
>     * Week 9 Implement the module which can mail the diffs to a given
>       mail address;
>     * Week 10 Debug the whole module and make necessary modifications
>       to successfully complete the subject.
>
Good detail in the timeline. It is important to note in your proposal 
how much time you have to dedicate to the project and any other 
interfering factors (tests, job, etc).

>
> *Additional Information:*
>
> I've been learning and using Java for 3 years. Although my 
> experience in processing XML with Java is not extensive, my 
> knowledge of programming, software architecture and algorithms will help 
> me learn fast and handle the problem.
>
> I'm 23 years old, living in Shanghai, China, attending Fudan University.
>
> ================================================
>

You should further expand on your qualifications to do this job. Describe 
the projects you have done in the past that are similar and that show you 
have the skills to complete the task. For example, what is your 
undergraduate degree in?

Richard
