Posted to dev@joshua.apache.org by lewis john mcgibbney <le...@apache.org> on 2019/09/25 19:48:18 UTC

Fwd: FW: [EXTERNAL] September 2019 Newsletter - LDC

FYI folks

---------- Forwarded message ---------
From: Mcgibbney, Lewis J (172B) <le...@jpl.nasa.gov>
Date: Wed, Sep 25, 2019 at 09:40
Subject: FW: [EXTERNAL] September 2019 Newsletter - LDC
To: lewis john mcgibbney <le...@apache.org>






Dr. Lewis John McGibbney Ph.D., B.Sc.(Hons)

Enterprise Search Technologist

Web and Mobile Application Development Group (172B)

Application, Consulting, Development and Engineering Section (1722)

Info & Engineering Technology Planning and Development Division (1720)

Jet Propulsion Laboratory

California Institute of Technology

4800 Oak Grove Drive

Pasadena, California 91109-8099

Mail Stop : 600-172A

Tel:  (+1) (818)-393-7402

Cell: (+1) (626)-487-3476

Fax:  (+1) (818)-393-1190

Email: lewis.j.mcgibbney@jpl.nasa.gov

ORCID: orcid.org/0000-0003-2185-928X






 Dare Mighty Things



*From: *Ldc-customers1 <ld...@ldc.upenn.edu> on behalf of
Penn LDC <ld...@ldc.upenn.edu>
*Date: *Monday, September 16, 2019 at 9:30 AM
*To: *Penn LDC <ld...@ldc.upenn.edu>
*Subject: *[EXTERNAL] September 2019 Newsletter - LDC




*In this newsletter: *

*LDC at Interspeech 2019 *
*New Publications: *
CALLFRIEND Canadian French Second Edition
<https://catalog.ldc.upenn.edu/LDC2019S18>
BOLT Chinese-English Word Alignment and Tagging -- SMS/Chat Training
<https://catalog.ldc.upenn.edu/LDC2019T13>
Machine Reading Phase 1 NFL Scoring Training Data
<https://catalog.ldc.upenn.edu/LDC2019T14>





*LDC at Interspeech 2019*

LDC is exhibiting at Interspeech 2019, September
15-19 in Graz, Austria. Stop by Booth F16 to learn more about recent
developments at the Consortium and new publications.
Be on the lookout for The Second DIHARD Speech Diarization Challenge
(DIHARD II)
<https://interspeech2019.org/program/special_sessions_and_challenges/#the-second-dihard-speech-diarization-challenge-dihard-ii>,
a special session co-organized by LDC, and the following presentations
featuring LDC work:


*The Second DIHARD Diarization Challenge: Dataset, task, and baselines*
Neville Ryant, Christopher Cieri, Mark Liberman (LDC), Kenneth Church
(Baidu, USA), Alejandrina Cristia (Laboratoire de Sciences Cognitives et
Psycholinguistique), Jun Du (University of Science and Technology of
China), Sriram Ganapathy (Indian Institute of Science)
Oral Session, Tuesday September 17, 10:00 – 10:20, Hall 3


*Automatic Detection of Prosodic Focus in American English*
Sunghye Cho and Mark Liberman (LDC), Yong-cheol Lee (Cheongju University)
Poster Session, Wednesday September 18, 16:00 – 18:00, Gallery B


*Automatic detection of ASD in children using acoustic and text features
from brief natural conversations*
Sunghye Cho, Mark Liberman, Neville Ryant (LDC), Meredith Cola,
Robert T. Schultz, Julia Parish-Morris (Children's Hospital of
Philadelphia)
Oral Session, Wednesday September 18, 16:45 – 17:00, Hall 3

LDC will post conference updates via our Twitter feed
<https://twitter.com/LDCupenn> and Facebook page
<https://www.facebook.com/ldc.upenn>. We hope to see you there!




*New publications:*

(1) CALLFRIEND Canadian French Second Edition
<https://catalog.ldc.upenn.edu/LDC2019S18> was developed by LDC and
consists of approximately 26 hours of unscripted telephone conversations
between native speakers of Canadian French. This second edition updates the
audio files to wav format, simplifies the directory structure, and adds
documentation and metadata. The first edition is available as CALLFRIEND
Canadian French (LDC96S48 <https://catalog.ldc.upenn.edu/LDC96S48>).

All data was collected before July 1997. Participants could speak with a
person of their choice on any topic; most called family members and
friends. All calls originated in North America. The recorded conversations
last up to 30 minutes.

CALLFRIEND Canadian French Second Edition is distributed via web download.

2019 Subscription Members will automatically receive copies of this corpus.
2019 Standard Members may request a copy as part of their 16 free
membership corpora. Non-members may license this data for $1000.


(2) BOLT Chinese-English Word Alignment and Tagging -- SMS/Chat Training
<https://catalog.ldc.upenn.edu/LDC2019T13> was developed by LDC for the
DARPA BOLT <https://www.ldc.upenn.edu/collaborations/current-projects/bolt>
(Broad Operational Language Translation) program and consists of 388,027
words of Chinese and English parallel text enhanced with linguistic tags to
indicate word relations.

This release consists of Chinese source text message and chat conversations
collected using two methods: new collection via LDC's collection platform,
and donation of SMS and chat archives from BOLT collection participants.
The source data is released as BOLT Chinese SMS/Chat (LDC2018T15
<https://catalog.ldc.upenn.edu/LDC2018T15>).

The BOLT word alignment task was built on treebank annotation. LDC
automatically extracted Chinese source tokens, including empty
categories/traces, from word-segmented files provided by the BOLT Chinese
Treebank annotation team at Brandeis University
<http://www.cs.brandeis.edu/~clp/clpg/home.html>. The word-segmented tokens
were then used to automatically generate the CTB (Chinese Treebank)
alignment. They were also tokenized for character alignment by inserting
white space between characters.

BOLT Chinese-English Word Alignment and Tagging -- SMS/Chat Training is
distributed via web download.

2019 Subscription Members will automatically receive copies of this corpus.
2019 Standard Members may request a copy as part of their 16 free
membership corpora. Non-members may license this data for $1750.


(3) Machine Reading Phase 1 NFL Scoring Training Data
<https://catalog.ldc.upenn.edu/LDC2019T14> was developed by LDC for use in
the DARPA (Defense Advanced Research Projects Agency) Machine Reading
program. It contains 110 U.S. NFL (National Football League) scoring source
documents and 110 standoff annotation files, manually annotated for
instances of NFL Scoring annotation categories defined with respect to an
NFL Scoring ontology.

The Machine Reading program aimed to develop automated reading systems to
bridge the gap between knowledge contained in natural language texts and
knowledge accessible to formal reasoning systems. The reading systems
designed by program participants were required to extract and reason about
facts from text in multiple domains.

The data in this release constitutes the training data for the NFL Scoring
Use Cases evaluation, which tested the sports domain by extracting
information about scoring events and game outcomes and aligning that
information with an NFL Scoring ontology.

Machine Reading Phase 1 NFL Scoring Training Data is distributed via web
download.

2019 Subscription Members will automatically receive copies of this corpus.
2019 Standard Members may request a copy as part of their 16 free
membership corpora. Non-members may license this data for $1000.







Membership Office

Linguistic Data Consortium <http://ldc.upenn.edu>

University of Pennsylvania

T: +1-215-573-1275

E: ldc@ldc.upenn.edu

M: 3600 Market St. Suite 810

      Philadelphia, PA 19104








-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc