Posted to issues@opennlp.apache.org by "Jason Baldridge (JIRA)" <ji...@apache.org> on 2011/06/08 18:17:58 UTC

[jira] [Created] (OPENNLP-200) Addition of prepositional phrase attachment dataset and unit test for it

Addition of prepositional phrase attachment dataset and unit test for it
------------------------------------------------------------------------

                 Key: OPENNLP-200
                 URL: https://issues.apache.org/jira/browse/OPENNLP-200
             Project: OpenNLP
          Issue Type: New Feature
          Components: Maxent
            Reporter: Jason Baldridge
            Priority: Minor


I have obtained permission from Adwait Ratnaparkhi to include his prepositional phrase attachment dataset in the distribution as a test case. Jorn correctly points out that we need to see whether this is ASF compliant. Here is the original dataset:

http://sites.google.com/site/adwaitratnaparkhi/publications/ppa.tar.gz?attredirects=0

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (OPENNLP-200) Addition of prepositional phrase attachment dataset and unit test for it

Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059811#comment-13059811 ] 

Jörn Kottmann commented on OPENNLP-200:
---------------------------------------

OK, then let's remove it from our SVN repository and attach it as a patch to this issue; when the IP clearance is done, we can commit the patch.

> Addition of prepositional phrase attachment dataset and unit test for it
> ------------------------------------------------------------------------
>
>                 Key: OPENNLP-200
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-200
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Maxent
>            Reporter: Jason Baldridge
>            Priority: Minor
>              Labels: data, testing
>
> I have obtained permission from Adwait Ratnaparkhi to include his prepositional phrase attachment dataset in the distribution as a test case. Jorn correctly points out that we need to see whether this is ASF compliant. Here is the original dataset:
> http://sites.google.com/site/adwaitratnaparkhi/publications/ppa.tar.gz?attredirects=0


[jira] [Commented] (OPENNLP-200) Addition of prepositional phrase attachment dataset and unit test for it

Posted by "Joern Kottmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092381#comment-13092381 ] 

Joern Kottmann commented on OPENNLP-200:
----------------------------------------

The test uses the platform default encoding to read the data set. Since the default encoding is platform- and locale-dependent, this test may fail on other machines or produce different results.

To fix this, always specify the encoding when opening the data, and load the data set from the class path rather than from the file system.
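A minimal Java sketch of that fix, assuming the data set is packaged as a test resource at a hypothetical class path location (/data/ppa/training) and that each line holds six whitespace-separated fields: id, verb, noun1, preposition, noun2, label. UTF-8 is an assumed charset; the point is only that an explicit charset is passed instead of relying on the platform default. The class and method names are illustrative and are not OpenNLP's actual test code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of the OpenNLP code base.
final class PpaDataLoader {

    // Hypothetical resource path; adjust to wherever the data set sits on the test class path.
    static final String RESOURCE = "/data/ppa/training";

    /** Opens the data set via the class path instead of a file system path. */
    static List<String[]> loadFromClassPath(String resource) throws IOException {
        InputStream in = PpaDataLoader.class.getResourceAsStream(resource);
        if (in == null) {
            throw new IOException("Resource not found on class path: " + resource);
        }
        return read(in);
    }

    /** Reads the events with an explicit charset, never the platform default. */
    static List<String[]> read(InputStream in) throws IOException {
        List<String[]> events = new ArrayList<>();
        // try-with-resources closes the reader and the underlying stream.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // Assumed format: id verb noun1 prep noun2 label
                String[] fields = line.split("\\s+");
                if (fields.length != 6) {
                    throw new IOException("Unexpected line: " + line);
                }
                events.add(fields);
            }
        }
        return events;
    }
}
```

Keeping read() separate from the class path lookup lets a unit test feed it an in-memory stream, so the parsing logic can be checked independently of where the build places resources.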

> Addition of prepositional phrase attachment dataset and unit test for it
> ------------------------------------------------------------------------
>
>                 Key: OPENNLP-200
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-200
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Maxent
>            Reporter: Jason Baldridge
>            Priority: Minor
>              Labels: data, testing
>             Fix For: tools-1.5.2-incubating, maxent-3.0.2-incubating
>
>         Attachments: OPENNLP-200.patch, ppa.tar.gz
>
>
> I have obtained permission from Adwait Ratnaparkhi to include his prepositional phrase attachment dataset in the distribution as a test case. Jorn correctly points out that we need to see whether this is ASF compliant. Here is the original dataset:
> http://sites.google.com/site/adwaitratnaparkhi/publications/ppa.tar.gz?attredirects=0


[jira] [Commented] (OPENNLP-200) Addition of prepositional phrase attachment dataset and unit test for it

Posted by "Jason Baldridge (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059664#comment-13059664 ] 

Jason Baldridge commented on OPENNLP-200:
-----------------------------------------

+1, fine to remove it.

Sorry for not having moved on this. I've been busy; you said to discuss the
issue on the list, and I had a referential failure and didn't get back to it.


2011/7/1 Jörn Kottmann (JIRA) <ji...@apache.org>



-- 
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge


> Addition of prepositional phrase attachment dataset and unit test for it
> ------------------------------------------------------------------------
>
>                 Key: OPENNLP-200
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-200
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Maxent
>            Reporter: Jason Baldridge
>            Priority: Minor
>              Labels: data, testing
>
> I have obtained permission from Adwait Ratnaparkhi to include his prepositional phrase attachment dataset in the distribution as a test case. Jorn correctly points out that we need to see whether this is ASF compliant. Here is the original dataset:
> http://sites.google.com/site/adwaitratnaparkhi/publications/ppa.tar.gz?attredirects=0


[jira] [Commented] (OPENNLP-200) Addition of prepositional phrase attachment dataset and unit test for it

Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058271#comment-13058271 ] 

Jörn Kottmann commented on OPENNLP-200:
---------------------------------------

This issue is actually a release blocker, because we cannot release anything that is not IP cleared.
Basically this leaves us with two options: get the IP clearance done soon, or defer.
I am actually +1 to defer. That would mean removing the test and data, waiting until the IP clearance is done, and then adding them again. The reason I would like to defer is that I fear the clearance will take too long and leave us in a state where we cannot release.

Yeah, I also think all this paperwork is annoying, and it sucks to remove this nice test, but those are the rules the ASF agreed on, and we have to follow them as an ASF project.

> Addition of prepositional phrase attachment dataset and unit test for it
> ------------------------------------------------------------------------
>
>                 Key: OPENNLP-200
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-200
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Maxent
>            Reporter: Jason Baldridge
>            Priority: Minor
>              Labels: data, testing
>
> I have obtained permission from Adwait Ratnaparkhi to include his prepositional phrase attachment dataset in the distribution as a test case. Jorn correctly points out that we need to see whether this is ASF compliant. Here is the original dataset:
> http://sites.google.com/site/adwaitratnaparkhi/publications/ppa.tar.gz?attredirects=0


[jira] [Updated] (OPENNLP-200) Addition of prepositional phrase attachment dataset and unit test for it

Posted by "Adwait Ratnaparkhi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adwait Ratnaparkhi updated OPENNLP-200:
---------------------------------------

    Attachment: ppa.tar.gz

Prepositional Phrase Attachment Dataset from 

Ratnaparkhi, Reynar, & Roukos. "A Maximum Entropy Model for Prepositional Phrase Attachment". ARPA HLT 1994. 

> Addition of prepositional phrase attachment dataset and unit test for it
> ------------------------------------------------------------------------
>
>                 Key: OPENNLP-200
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-200
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Maxent
>            Reporter: Jason Baldridge
>            Priority: Minor
>              Labels: data, testing
>             Fix For: tools-1.5.2-incubating, maxent-3.0.2-incubating
>
>         Attachments: OPENNLP-200.patch, ppa.tar.gz
>
>
> I have obtained permission from Adwait Ratnaparkhi to include his prepositional phrase attachment dataset in the distribution as a test case. Jorn correctly points out that we need to see whether this is ASF compliant. Here is the original dataset:
> http://sites.google.com/site/adwaitratnaparkhi/publications/ppa.tar.gz?attredirects=0


[jira] [Commented] (OPENNLP-200) Addition of prepositional phrase attachment dataset and unit test for it

Posted by "Jason Baldridge (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053473#comment-13053473 ] 

Jason Baldridge commented on OPENNLP-200:
-----------------------------------------

No... what is the procedure?

2011/6/22 Jörn Kottmann (JIRA) <ji...@apache.org>





> Addition of prepositional phrase attachment dataset and unit test for it
> ------------------------------------------------------------------------
>
>                 Key: OPENNLP-200
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-200
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Maxent
>            Reporter: Jason Baldridge
>            Priority: Minor
>              Labels: data, testing
>
> I have obtained permission from Adwait Ratnaparkhi to include his prepositional phrase attachment dataset in the distribution as a test case. Jorn correctly points out that we need to see whether this is ASF compliant. Here is the original dataset:
> http://sites.google.com/site/adwaitratnaparkhi/publications/ppa.tar.gz?attredirects=0


[jira] [Updated] (OPENNLP-200) Addition of prepositional phrase attachment dataset and unit test for it

Posted by "Joern Kottmann (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joern Kottmann updated OPENNLP-200:
-----------------------------------

    Fix Version/s: maxent-3.0.2-incubating
                   tools-1.5.2-incubating

> Addition of prepositional phrase attachment dataset and unit test for it
> ------------------------------------------------------------------------
>
>                 Key: OPENNLP-200
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-200
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Maxent
>            Reporter: Jason Baldridge
>            Priority: Minor
>              Labels: data, testing
>             Fix For: tools-1.5.2-incubating, maxent-3.0.2-incubating
>
>         Attachments: OPENNLP-200.patch
>
>
> I have obtained permission from Adwait Ratnaparkhi to include his prepositional phrase attachment dataset in the distribution as a test case. Jorn correctly points out that we need to see whether this is ASF compliant. Here is the original dataset:
> http://sites.google.com/site/adwaitratnaparkhi/publications/ppa.tar.gz?attredirects=0


[jira] [Updated] (OPENNLP-200) Addition of prepositional phrase attachment dataset and unit test for it

Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jörn Kottmann updated OPENNLP-200:
----------------------------------

    Attachment: OPENNLP-200.patch

The patch contains the rolled-back change and should be applied again when the IP clearance is done.

> Addition of prepositional phrase attachment dataset and unit test for it
> ------------------------------------------------------------------------
>
>                 Key: OPENNLP-200
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-200
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Maxent
>            Reporter: Jason Baldridge
>            Priority: Minor
>              Labels: data, testing
>         Attachments: OPENNLP-200.patch
>
>
> I have obtained permission from Adwait Ratnaparkhi to include his prepositional phrase attachment dataset in the distribution as a test case. Jorn correctly points out that we need to see whether this is ASF compliant. Here is the original dataset:
> http://sites.google.com/site/adwaitratnaparkhi/publications/ppa.tar.gz?attredirects=0


[jira] [Closed] (OPENNLP-200) Addition of prepositional phrase attachment dataset and unit test for it

Posted by "Joern Kottmann (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joern Kottmann closed OPENNLP-200.
----------------------------------

    Resolution: Fixed
      Assignee: Joern Kottmann

Added more tests for perceptron training and a test for maxent training.

> Addition of prepositional phrase attachment dataset and unit test for it
> ------------------------------------------------------------------------
>
>                 Key: OPENNLP-200
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-200
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Maxent
>            Reporter: Jason Baldridge
>            Assignee: Joern Kottmann
>            Priority: Minor
>              Labels: data, testing
>             Fix For: tools-1.5.2-incubating, maxent-3.0.2-incubating
>
>         Attachments: OPENNLP-200.patch, ppa.tar.gz
>
>
> I have obtained permission from Adwait Ratnaparkhi to include his prepositional phrase attachment dataset in the distribution as a test case. Jorn correctly points out that we need to see whether this is ASF compliant. Here is the original dataset:
> http://sites.google.com/site/adwaitratnaparkhi/publications/ppa.tar.gz?attredirects=0


[jira] [Commented] (OPENNLP-200) Addition of prepositional phrase attachment dataset and unit test for it

Posted by "Joern Kottmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092389#comment-13092389 ] 

Joern Kottmann commented on OPENNLP-200:
----------------------------------------

I fixed the issues mentioned above, refactored the test a little, and added an additional test for maxent.

We should add more tests that exercise the training code with various settings.

> Addition of prepositional phrase attachment dataset and unit test for it
> ------------------------------------------------------------------------
>
>                 Key: OPENNLP-200
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-200
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Maxent
>            Reporter: Jason Baldridge
>            Priority: Minor
>              Labels: data, testing
>             Fix For: tools-1.5.2-incubating, maxent-3.0.2-incubating
>
>         Attachments: OPENNLP-200.patch, ppa.tar.gz
>
>
> I have obtained permission from Adwait Ratnaparkhi to include his prepositional phrase attachment dataset in the distribution as a test case. Jorn correctly points out that we need to see whether this is ASF compliant. Here is the original dataset:
> http://sites.google.com/site/adwaitratnaparkhi/publications/ppa.tar.gz?attredirects=0


[jira] [Commented] (OPENNLP-200) Addition of prepositional phrase attachment dataset and unit test for it

Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053842#comment-13053842 ] 

Jörn Kottmann commented on OPENNLP-200:
---------------------------------------

I am not sure what the process is in this case; maybe the original creator of the data has to sign a Software Grant Agreement (SGA). Please discuss the issue on the mailing list.

> Addition of prepositional phrase attachment dataset and unit test for it
> ------------------------------------------------------------------------
>
>                 Key: OPENNLP-200
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-200
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Maxent
>            Reporter: Jason Baldridge
>            Priority: Minor
>              Labels: data, testing
>
> I have obtained permission from Adwait Ratnaparkhi to include his prepositional phrase attachment dataset in the distribution as a test case. Jorn correctly points out that we need to see whether this is ASF compliant. Here is the original dataset:
> http://sites.google.com/site/adwaitratnaparkhi/publications/ppa.tar.gz?attredirects=0


[jira] [Commented] (OPENNLP-200) Addition of prepositional phrase attachment dataset and unit test for it

Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053069#comment-13053069 ] 

Jörn Kottmann commented on OPENNLP-200:
---------------------------------------

Any updates here on the IP clearance?

> Addition of prepositional phrase attachment dataset and unit test for it
> ------------------------------------------------------------------------
>
>                 Key: OPENNLP-200
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-200
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Maxent
>            Reporter: Jason Baldridge
>            Priority: Minor
>              Labels: data, testing
>
> I have obtained permission from Adwait Ratnaparkhi to include his prepositional phrase attachment dataset in the distribution as a test case. Jorn correctly points out that we need to see whether this is ASF compliant. Here is the original dataset:
> http://sites.google.com/site/adwaitratnaparkhi/publications/ppa.tar.gz?attredirects=0
