Automatically Annotated Repository of Digital Video and Audio Resources Community


Dates: October 15-16, 2014
(with dinner before the conference at 7 pm on the 14th of Oct.)
Location: Glickman Conference Center, University of Texas at Austin

The workshop will focus on technology and the future of digital language archives. This topic not only dovetails with the objectives of AARDVARC but is also timely for the important archives founded to preserve documentation of endangered languages.

In 2012, the AARDVARC (Automatically Annotated Repository of Digital Video and Audio Resources Community, BCS 1244713) project was funded to explore the possibility of deploying advances in speech-to-text processing to address the problem of untranscribed, and therefore unavailable, documentation of understudied languages. As part of the NSF initiative devoted to "Building Data Communities," AARDVARC was designed to build an interdisciplinary community interested in developing a repository and a suite of tools to break the 'transcription bottleneck' for language data.

Two previous AARDVARC workshops have led us to conclude that completely automated transcription of under-resourced languages is still impractical, despite several promising “bootstrapping” projects, and that these technologies remain out of reach of most individual linguists. Nevertheless, much can be accomplished by automating parts of the transcription and annotation process, e.g., alignment, phone recognition, and scene identification. Even partial automation will facilitate the work of the analyst and appreciably increase the amount of transcribed audio and video available to researchers.

Thus the workshop will explore the application of these technologies to already archived material. Many archives of endangered-language documentation currently report that they are underfunded and underused. Adding services such as automated alignment and phone recognition might boost archive use by producing more usable corpora, and prototyping such services might be an attractive project for potential funders. For this reason, we are inviting approximately 20 experienced archivists, documentary linguists, and computational linguists to meet in Austin and educate each other about the possibilities and pitfalls of such an enterprise. We expect this two-day workshop to foster collaborations and lay the groundwork for one or more relevant grant proposals.

Organizing Committee:
Doug Whalen, CUNY
Damir Cavar, Indiana U.
Malgosia Cavar, Indiana U.
Anthony Aristar, LINGUIST List (retired)
Helen Aristar-Dry, UT-Austin (Affiliated Researcher)