Posted to dev@joshua.apache.org by lewis john mcgibbney <le...@apache.org> on 2019/02/15 20:46:15 UTC
Fwd: February 2019 Newsletter - LDC

---------- Forwarded message ---------
From: Mcgibbney, Lewis J (398M) <Le...@jpl.nasa.gov>
Date: Fri, Feb 15, 2019 at 10:56
Subject: Fwd: February 2019 Newsletter - LDC
To: lewismc@apache.org <le...@apache.org>





Begin forwarded message:

*From:* Penn LDC <ld...@ldc.upenn.edu>
*Date:* February 15, 2019 at 08:11:05 PST
*To:* "'ldc-customers1@ldc.upenn.edu'" <ld...@ldc.upenn.edu>
*Subject:* *February 2019 Newsletter - LDC*

*In this newsletter:*



*Only two weeks left to enjoy 2019 membership discounts*



*Spring 2019 LDC Data Scholarship recipients*



*LDC’s new language game*



*New publications:*

DEFT Chinese Committed Belief Annotation
<https://catalog.ldc.upenn.edu/LDC2019T03>

IARPA Babel Lithuanian Language Pack IARPA-babel304b-v1.0b
<https://catalog.ldc.upenn.edu/LDC2019S03>

Multi-Language Conversational Telephone Speech 2011 -- Arabic Group
<https://catalog.ldc.upenn.edu/LDC2019S02>

Multilingual ATIS <https://catalog.ldc.upenn.edu/LDC2019T04>

_____________________________________________________________________________



*Only two weeks left to enjoy 2019 membership discounts*

There is still time to save on 2019 membership fees. Through March 1, all
organizations receive a discount on the 2019 membership fee (up to 10%)
when they choose to join or renew. For more information on membership
benefits, visit Join LDC <https://www.ldc.upenn.edu/members/join-ldc>.



*Spring 2019 LDC Data Scholarship recipients*

Congratulations to the recipients of LDC's Spring 2019 Data Scholarships:

Colin Annand: University of Cincinnati (USA); Ph.D. Psychology. Colin is
awarded a copy of Switchboard-1 Release 2 for his research involving the
relationship between speech patterns and conversation content.

Si Chen: Huazhong University of Science and Technology (China); B.S.
Communication Engineering. Si is awarded a copy of ACE 2005 Multilingual
Training Corpus for his work on event extraction.

Noor-e-Hira: Fatima Jinnah Women University (Pakistan); MSc. Computer
Sciences. Noor is awarded a copy of NIST 2008 Open Machine Translation
(OpenMT) Evaluation for her research in machine translation.

Matthew Roddy: Trinity College Dublin (Ireland); Ph.D. Electrical
Engineering. Matthew is awarded copies of 2000 HUB5 English Evaluation
Speech and Transcripts for his work in spoken dialogue systems.

Ammara Zafar: Fatima Jinnah Women University (Pakistan); MSc. Computer
Sciences. Ammara is awarded a copy of NIST 2009 Open Machine Translation
(OpenMT) Evaluation for her research in machine translation.

For information about the program, visit the Data Scholarship page
<https://www.ldc.upenn.edu/language-resources/data/data-scholarships>.

*LDC’s new language game*

LDC’s new language game, NameThatLanguage, tests your skill at recognizing
the language spoken in short audio clips. The game includes thousands of
clips to prevent memorization and offers a real challenge that increases as
you progress. In addition to being fun, the game provides useful data on
language confusability and linguistic diversity. Game results will be
shared freely for research. New clips and more languages continue to be
added, providing ongoing challenges and new research data. Help support
language research by playing! https://namethatlanguage.org

_____________________________________________________________________________



*New publications:*



(1) DEFT Chinese Committed Belief Annotation
<https://catalog.ldc.upenn.edu/LDC2019T03> was developed by LDC and
consists of approximately 83,000 tokens of Chinese discussion forum text
annotated for "committed belief," which marks the level of commitment
displayed by the author to the truth of the propositions expressed in the
text.



DARPA's Deep Exploration and Filtering of Text (DEFT) program aimed to
address remaining capability gaps in state-of-the-art natural language
processing technologies related to inference, causal relationships, and
anomaly detection. LDC supported the DEFT program by collecting, creating,
and annotating a variety of data sources.



DEFT Chinese Committed Belief Annotation is distributed via web download.



2019 Subscription Members will automatically receive copies of this corpus.
2019 Standard Members may request a copy as part of their 16 free
membership corpora. Non-members may license this data for $1000.



*



(2) IARPA Babel Lithuanian Language Pack IARPA-babel304b-v1.0b
<https://catalog.ldc.upenn.edu/LDC2019S03> was developed by Appen
<http://www.appen.com/> for the IARPA (Intelligence Advanced Research
Projects Activity) Babel
<http://www.iarpa.gov/index.php/research-programs/babel> program. It
contains approximately 210 hours of Lithuanian conversational and scripted
telephone speech collected in 2013 and 2014 along with corresponding
transcripts.



The Lithuanian speech in this release represents that spoken in the
Aukštaitian and Samogitian dialect regions of Lithuania. The gender
distribution among speakers is approximately equal; speakers' ages range
from 16 years to 71 years. Calls were made using different telephones
(e.g., mobile, landline) from a variety of environments including the
street, a home or office, a public place, and inside a vehicle.



IARPA Babel Lithuanian Language Pack IARPA-babel304b-v1.0b is distributed
via web download.



2019 Subscription Members will receive copies of this corpus provided they
have submitted a completed copy of the special license agreement. 2019
Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for $25.



*



(3) Multi-Language Conversational Telephone Speech 2011 -- Arabic Group
<https://catalog.ldc.upenn.edu/LDC2019S02> was developed by LDC and
comprises approximately 117 hours of telephone speech in distinct
dialects of colloquial Arabic: Iraqi, Levantine and Maghrebi.



The data were collected primarily to support research and technology
evaluation in automatic language identification, and portions of these
telephone calls were used in the NIST 2011 Language Recognition Evaluation
(LRE <https://www.nist.gov/itl/iad/mig/2011-language-recognition-evaluation>).
LRE 2011 focused on language pair discrimination for 24 languages/dialects,
some of which could be considered mutually intelligible or closely related.



Multi-Language Conversational Telephone Speech 2011 -- Arabic Group is
distributed via web download.



2019 Subscription Members will automatically receive copies of this corpus.
2019 Standard Members may request a copy as part of their 16 free
membership corpora. Non-members may license this data for $2500.



*



(4) Multilingual ATIS <https://catalog.ldc.upenn.edu/LDC2019T04> was
developed by Google Inc. and consists of 5,871 utterances from ATIS2
(LDC93S5), ATIS3 Training Data (LDC94S19), and ATIS3 Test Data (LDC95S26)
annotated and translated into Hindi and Turkish.



The ATIS (Air Travel Information Services) collection was developed to
support the research and development of speech understanding systems.
Participants were presented with various hypothetical travel planning
scenarios and asked to solve them by interacting with partially or
completely automated ATIS systems. The resulting utterances were recorded
and transcribed. Data was collected in the early 1990s at five US sites:
Raytheon BBN, Carnegie Mellon University, MIT Laboratory for Computer
Science, National Institute of Standards and Technology, and SRI
International.



The original English utterances were manually translated into Hindi and
Turkish. For each utterance, this release also includes the original English
text and a machine translation of the manual target-language translation back
into English. Each utterance is annotated with named entities via
table lookup; markers include city, airline, airport names, and dates.



Multilingual ATIS is distributed via web download.



2019 Subscription Members will automatically receive copies of this corpus.
2019 Standard Members may request a copy as part of their 16 free
membership corpora. Non-members may license this data at no cost.





Membership Office

Linguistic Data Consortium <http://ldc.upenn.edu>

University of Pennsylvania

T: +1-215-573-1275

E: ldc@ldc.upenn.edu

M: 3600 Market St. Suite 810
<https://maps.google.com/?q=3600+Market+St.+Suite+810+%0D%0A+Philadelphia,+PA+19104&entry=gmail&source=g>

      Philadelphia, PA 19104







-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc