Posted to dev@joshua.apache.org by lewis john mcgibbney <le...@apache.org> on 2018/04/18 16:35:17 UTC

Fwd: FW: April 2018 Newsletter - LDC

---------- Forwarded message ----------
From: Mcgibbney, Lewis J (398M) <Le...@jpl.nasa.gov>
Date: Mon, Apr 16, 2018 at 9:57 AM
Subject: FW: April 2018 Newsletter - LDC
To: lewis john mcgibbney <le...@apache.org>

Dr. Lewis John McGibbney Ph.D., B.Sc.

Data Scientist II

Computer Science for Data Intensive Applications Group (398M)

Instrument Software and Science Data Systems Section (398)

Jet Propulsion Laboratory

California Institute of Technology

4800 Oak Grove Drive

Pasadena, California 91109-8099

Mail Stop : 158-256C

Tel:  (+1) (818)-393-7402

Cell: (+1) (626)-487-3476

Fax:  (+1) (818)-393-1190

Email: lewis.j.mcgibbney@jpl.nasa.gov

ORCID: orcid.org/0000-0003-2185-928X


 Dare Mighty Things

*From:* Ldc-customers1 <ld...@ldc.upenn.edu> on behalf of
Penn LDC <ld...@ldc.upenn.edu>
*Date:* Friday, April 13, 2018 at 7:47 AM
*To:* Penn LDC <ld...@ldc.upenn.edu>
*Subject:* April 2018 Newsletter - LDC

*In this newsletter:*

*LDC at ICASSP 2018*

*LDC at the Philadelphia Science Carnival*

*New Publications:*

*Concretely Annotated New York Times*
<https://catalog.ldc.upenn.edu/LDC2018T12>

*H2, E2, ERK1 Children's Writing* <https://catalog.ldc.upenn.edu/LDC2018T05>

*TRAD Arabic-French Parallel Text -- Newsgroup*
<https://catalog.ldc.upenn.edu/LDC2018T13>
______________________________________________________________________________

*LDC at ICASSP 2018*

LDC will be exhibiting at ICASSP 2018, held this year April 15-20 in
Calgary, Canada. Stop by booth B2 to learn more about recent developments
at the Consortium and new publications.

Also, be on the lookout for the following presentations featuring LDC work:


*Enhancement and Analysis of Conversational Speech: JSALT 2017*
Tuesday, April 17, 16:00 - 18:00
Session: Speech Analysis

*Leveraging LSTM Models for Overlap Detection in Multi-Party Meetings*
Wednesday, April 18, 13:30 - 15:30
Session: Speaker Diarization & Identification


*A Novel LSTM-based Speech Preprocessor for Speaker Diarization in
Realistic Mismatch Conditions*
Wednesday, April 18, 13:30 - 15:30
Session: Speaker Diarization & Identification

LDC will post conference updates via our Twitter feed and Facebook page. We
hope to see you there!


*LDC at the Philadelphia Science Carnival*

LDC will share the fun of language with the community on Saturday, April
28, with a booth at the Philadelphia Science Carnival
<https://www.fi.edu/psf/science-carnival>. Visitors can try three
language-oriented educational activities, including a language
identification game and Chinese character recognition.

The Philadelphia Science Carnival is an annual event organized by
Philadelphia’s Franklin Institute to acquaint children and adults with the
joys of science.

_______________________________________________________________________________


*New publications:*



(1) Concretely Annotated New York Times
<https://catalog.ldc.upenn.edu/LDC2018T12> was developed by Johns Hopkins
University's Human Language Technology Center of Excellence
<http://hltcoe.jhu.edu/>. It adds multiple kinds and instances of
automatically generated syntactic, semantic, and coreference annotations to
The New York Times Annotated Corpus (LDC2008T19
<https://catalog.ldc.upenn.edu/LDC2008T19>). Concrete
<http://hltcoe.github.io/> is a schema for representing structured,
hierarchical, and overlapping linguistic annotations. For the same
annotation types, this release includes outputs from multiple tools that
follow different annotation theories, all under a shared tokenization.
Concretely Annotated New York Times covers all 1.8 million articles in The
New York Times Annotated Corpus.

Concretely Annotated New York Times is distributed via hard drive.

2018 Subscription Members will receive copies of this corpus provided they
have submitted a completed copy of the special license agreement. 2018
Standard Members may request a copy as part of their 16 free membership
corpora. Any organization that licensed The New York Times Annotated Corpus
(LDC2008T19) may request a copy of Concretely Annotated New York Times
(LDC2018T12) for a $250 media fee. Non-members may license this data for
$300.

(2) H2, E2, ERK1 Children's Writing
<https://catalog.ldc.upenn.edu/LDC2018T05> was developed by the Cooperative
State University Baden-Württemberg, University of Education
<http://www.dhbw.de/english/dhbw/about-us.html>. It consists of
approximately 2,000 texts written over four months by 173 German school
children aged six through eleven. The data in this corpus was
collected by elementary schools in Baden-Württemberg, Germany, and
digitized at the Cooperative State University during the 2016/2017 school
year. Three second-, third-, and fourth-grade classrooms participated in the
collection. Texts were written within regular class settings. The students
were presented with a picture and were asked to write a story to describe
the picture or, if unable to write a text, to list what they saw in the
picture.

Of the 173 participants, 100 were multilingual; further metadata is
available for 166 of the 173 children. The following is
included for each text in the database: school week of collection; school
type; age; gender; grade/classroom; language spoken at home; and school
materials used.

LDC has also released H1 Children's Writing (LDC2016T01
<https://catalog.ldc.upenn.edu/LDC2016T01>).

H2, E2, ERK1 Children's Writing is distributed via web download.



2018 Subscription Members will receive copies of this corpus provided they
have submitted a completed copy of the special license agreement. 2018
Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for $750.

(3) TRAD Arabic-French Parallel Text -- Newsgroup
<https://catalog.ldc.upenn.edu/LDC2018T13> was developed by ELDA
<http://elda.org/en/> as part of the PEA-TRAD project
<http://www.elra.info/en/projects/archived-projects/pea-trad/>. It contains
French translations of a subset of approximately 10,000 Arabic words from
GALE Phase 1 Arabic Newsgroup Parallel Text - Part 1 (LDC2009T03
<https://catalog.ldc.upenn.edu/LDC2009T03>). The PEA-TRAD project
(Translation as a Support for Document Analysis) was supported by the
French Ministry of Defense (DGA). Its purpose was to develop
speech-to-speech translation technology for multiple languages (e.g.,
Arabic, Chinese, Pashto) across a variety of domains. This release consists
of 398 segments (translation units) from 17 documents. The source data is
Arabic newsgroup text collected and translated into English by LDC for the
DARPA GALE (Global Autonomous Language Exploitation) program.

LDC has also released TRAD Chinese-French Parallel Text -- Blog (LDC2018T02
<https://catalog.ldc.upenn.edu/LDC2018T02>).

TRAD Arabic-French Parallel Text -- Newsgroup is distributed via web
download.



2018 Subscription Members will receive copies of this corpus provided they
have submitted a completed copy of the special license agreement. 2018
Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for $300.

Membership Office

Linguistic Data Consortium <http://ldc.upenn.edu>

University of Pennsylvania

T: +1-215-573-1275

E: ldc@ldc.upenn.edu

M: 3600 Market St. Suite 810

      Philadelphia, PA 19104







-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc