Posted to notifications@ctakes.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2013/07/18 03:35:46 UTC

[jira] [Commented] (CTAKES-217) create a tool for "diff"-ing two CASes

    [ https://issues.apache.org/jira/browse/CTAKES-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711911#comment-13711911 ] 

ASF subversion and git services commented on CTAKES-217:
--------------------------------------------------------

Commit 1504339 from [~steven.bethard] in branch 'ctakes/trunk'
[ https://svn.apache.org/r1504339 ]

CTAKES-217: Revises CompareFeatureStructures to use java-diff-utils. The search for FeatureStructure equality is the same, but now nested uses of DiffUtils produce what is hopefully better output. In particular, there should now be more useful output for the case where annotations have been inserted or deleted (not just changed).
                
> create a tool for "diff"-ing two CASes
> --------------------------------------
>
>                 Key: CTAKES-217
>                 URL: https://issues.apache.org/jira/browse/CTAKES-217
>             Project: cTAKES
>          Issue Type: New Feature
>            Reporter: Steven Bethard
>
> It would be handy to be able to easily get a "diff" of two CASes. Some possibilities:
> (1) Just diff the XMIs. This doesn't work very well because the IDs are typically different in different XMIs generated from the same annotations.
> (2) Output all annotations, using their .toString(), and diff that file using a standard diff algorithm. This might mostly work if we could guarantee a consistent ordering of the annotations in the CAS. (That's easy to do for Annotations, but not always possible for TOPs.) But some things aren't displayed in the .toString(), e.g. the values inside FSArrays and FSLists.
> In r1504269, I added CompareFeatureStructures which isn't either of these, but is a bit closer to (2). It sorts annotations by offset (and for TOPs, looks through their features to find offsets), and then compares each pair of FeatureStructures by walking the tree of their features. I'm mostly happy with how it handles the comparison of two FeatureStructures (though .toString() is a bit hacky).
> The main issue is that it doesn't really do anything useful if you have different numbers of annotations in the two CASes. It just prints a message saying that the numbers are different. Instead, it should be able to identify insertions and deletions of annotations. Probably there's a way to do this with java-diff-utils, though I wasn't able to figure one out on my first attempt.
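The insertion/deletion idea described above can be illustrated with a minimal sketch. This is not the CompareFeatureStructures code and it does not call java-diff-utils or UIMA; it assumes a hypothetical Span class standing in for an annotation (begin, end, type), sorts both sides by offset, and uses a plain longest-common-subsequence table as a stand-in for what a diff library would compute. All class and method names here are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for an annotation: span offsets plus a type label. */
final class Span {
    final int begin;
    final int end;
    final String type;

    Span(int begin, int end, String type) {
        this.begin = begin;
        this.end = end;
        this.type = type;
    }

    /** Equality used by the diff: same offsets and same type. */
    boolean sameAs(Span other) {
        return begin == other.begin && end == other.end && type.equals(other.type);
    }

    @Override
    public String toString() {
        return type + "[" + begin + "," + end + "]";
    }
}

final class AnnotationDiffSketch {

    /** Sort by begin, then end, then type, so both sides share one ordering. */
    static List<Span> sorted(List<Span> spans) {
        List<Span> copy = new ArrayList<>(spans);
        copy.sort(Comparator.comparingInt((Span s) -> s.begin)
                .thenComparingInt(s -> s.end)
                .thenComparing(s -> s.type));
        return copy;
    }

    /** Returns "- x" for spans only on the left and "+ x" for spans only on the right. */
    static List<String> diff(List<Span> left, List<Span> right) {
        List<Span> a = sorted(left);
        List<Span> b = sorted(right);

        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[a.size() + 1][b.size() + 1];
        for (int i = a.size() - 1; i >= 0; i--) {
            for (int j = b.size() - 1; j >= 0; j--) {
                lcs[i][j] = a.get(i).sameAs(b.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk the table: matching spans are skipped, the rest are reported.
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            if (a.get(i).sameAs(b.get(j))) {
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("- " + a.get(i++)); // present only on the left: deletion
            } else {
                out.add("+ " + b.get(j++)); // present only on the right: insertion
            }
        }
        while (i < a.size()) {
            out.add("- " + a.get(i++));
        }
        while (j < b.size()) {
            out.add("+ " + b.get(j++));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Span> left = Arrays.asList(
                new Span(0, 5, "Sentence"), new Span(0, 2, "Token"), new Span(3, 5, "Token"));
        List<Span> right = Arrays.asList(
                new Span(0, 5, "Sentence"), new Span(3, 5, "Token"), new Span(6, 9, "Token"));
        diff(left, right).forEach(System.out::println);
    }
}
```

In a real tool, sameAs would presumably be replaced by the recursive feature-by-feature comparison mentioned in the description, and a second, nested diff over the features of near-matching annotations could report changes rather than a delete/insert pair.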

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

RE: [jira] [Commented] (CTAKES-217) create a tool for "diff"-ing two CASes

Posted by "Masanz, James J." <Ma...@mayo.edu>.
Have you taken a look at
org.apache.ctakes.utils.xcas_comparison.Compare?

I haven't looked at the source lately, and I've forgotten most of the little I once knew about it.
But we suggested it in cTAKES 1.0 to help people compare at least some parts.
Maybe you will find some part of it helpful?

-- James
________________________________________
From: notifications-return-715-Masanz.James=mayo.edu@ctakes.apache.org [notifications-return-715-Masanz.James=mayo.edu@ctakes.apache.org] on behalf of ASF subversion and git services (JIRA) [jira@apache.org]
Sent: Wednesday, July 17, 2013 8:35 PM
To: notifications@ctakes.apache.org
Subject: [jira] [Commented] (CTAKES-217) create a tool for "diff"-ing two CASes
