Posted to dev@forrest.apache.org by Ross Gardler <rg...@apache.org> on 2005/07/05 13:11:22 UTC
Forrest-Voice proposal
Copied below is HANAX's proposal for the Google Summer of Code
programme. As I said elsewhere, it is a real shame that a proposal of
such quality had to be rejected due to the small (but very generous)
number of awards available to Apache.
I am thrilled to see that Hanax is here to help us implement this
plugin, as you will see from his proposal, whilst he has little
experience of Forrest he does have experience of VoiceXML.
--------
Goal
Apache Forrest is a publishing framework that transforms
input from various sources into a unified presentation in
one or more output formats. At present there are several
output formats that Forrest is capable of producing. Some
people are not able to access web content. Forrest is
about publishing content in many formats so it makes sense
for Forrest to allow differently abled people to have
access. In addition, visually impaired people are unable
to efficiently access not only Forrest-based content but
also content developed in other document formats, such as
MS Word, Open Office, DocBook, HTML, etc. Since Forrest is
able to accept a wide range of input formats,
this project will help address this additional need.
I'd like to add voice accessibility, so that Forrest:
1. will be able to read content via a voice synthesiser.
Reading should be clear, concrete and without redundant
information (all meta-information, such as bold text,
should be conveyed by prosody).
2. will allow content to be accessed by speech.
Navigation should be intuitive.
Achieving the goal
Using X+V technology. X+V stands for XHTML + Voice, which
allows one to create web-based, voice-controlled
applications with voice output. The advantage of this
technology is a quite straightforward mapping of visual
elements (document sections, TOC...) to audio input/output
structures (VoiceXML).
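As an illustration, a minimal X+V page combines an XHTML body with an embedded VoiceXML form, bound together via XML Events. The content and ids in this sketch are hypothetical, not actual Forrest output:

```xml
<!-- Minimal X+V sketch: the embedded VoiceXML form reads the
     section title aloud when the page loads. -->
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <title>Sample Section</title>
    <!-- VoiceXML form: spoken output for the section heading -->
    <vxml:form id="readTitle">
      <vxml:block>Section one: Introduction.</vxml:block>
    </vxml:form>
  </head>
  <!-- XML Events binding: run the voice form on page load -->
  <body ev:event="load" ev:handler="#readTitle">
    <h1>Introduction</h1>
    <p>Visible content rendered by the browser.</p>
  </body>
</html>
```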
Apache Forrest excels in transforming various input
sources into various output formats. This project will
extend Forrest to allow it to automatically produce an
X+V document. X+V defines a sophisticated system of
structures that can be used to separate individual
semantic blocks (paragraphs, section...). These
structures map well to the existing internal document
format of Apache Forrest and hence any Apache Forrest
content will be capable of being rendered via a TTS
(text-to-speech) engine. Similarly, Forrest's internal
structures for site navigation can be used to create X+V
menus for voice control. This will result in the relevant
portion of the content being either read by the text-to-speech
engine or displayed in the browser, as appropriate
for the individual user. In addition to producing the X+V
document we also need to produce grammars for recognition.
We will need at least one global grammar for navigation
and quick access (e.g. 'go to section 4', 'go to menu').
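For illustration, such a global navigation grammar could be sketched as follows. This example uses SRGS, the XML grammar format (the proposal itself prefers JSGF, discussed further below); the rule names and the command set are assumptions, not a fixed design:

```xml
<!-- Sketch of a global navigation grammar in SRGS. Rule names
     and commands are illustrative only. -->
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en" root="command">
  <rule id="command" scope="public">
    <one-of>
      <item>go to menu</item>
      <item>go to section <ruleref uri="#number"/></item>
    </one-of>
  </rule>
  <rule id="number">
    <one-of>
      <item>1</item> <item>2</item> <item>3</item> <item>4</item>
    </one-of>
  </rule>
</grammar>
```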
Benefits
The successful completion of this project will give
visually impaired and physically challenged people an
increased level of access to Apache Forrest produced
content. It is important to realise that Forrest is used
in projects such as Burrokeet, which produces learning
objects, and is also used to produce documentation for
a wide range of projects, including many Apache projects.
Therefore, the addition of this plugin will extend
the accessibility to the outputs of those projects.
Of longer term importance is the fact that Forrest can
accept documents in a wide range of input formats;
the creation of this plugin will therefore provide
a tool enabling almost any document to
be made accessible.
My approach, milestones
1. Familiarization.
Get familiar with the basics of how Forrest works:
which formats are supported and how the final structure
of a document is created.
2. Research.
This research should answer these questions:
a) How to interpret Forrest content features such as menus,
navigation bars and lists of sections as X+V structures
(menus, fields...).
b) In order to easily access content via voice commands we
need to devise an intuitive navigation system. Unlike
visual models, where people can access data "randomly",
in audio models we have strictly sequential access to
the document. That is, the reader cannot know what is at
the end of the page until they hear it. But we can predefine
some bookmarks and let the user skip to them. This is
flow control - a kind of virtual cursor. The main
challenge will be to develop an intuitive mechanism for
navigating the document. I think that navigation is the
main problem in structured sites accessed via voice.
Reading the content itself is quite straightforward thanks
to the TTS engine.
In this phase I'll try to make a first draft of the flow
control - how many "navigation chunks" will be used:
is the document one big chunk, or is there a way to separate
it into several smaller ones and navigate between them
via "goto" jumps?
The next question is to determine which fields will use
the global grammar, which makes some special keywords
available at any time during navigation.
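A sketch of what such "goto"-based flow control could look like in plain VoiceXML, with one form per navigation chunk. The form ids, the grammar file name, and the recognised commands are hypothetical:

```xml
<!-- Flow-control sketch: each navigation chunk is a VoiceXML form,
     and a recognised command jumps between forms via <goto>. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="section1">
    <field name="nav">
      <grammar src="navigation.grxml" type="application/srgs+xml"/>
      <prompt>Section one. Say "next" or "go to menu".</prompt>
      <filled>
        <if cond="nav == 'next'">
          <goto next="#section2"/>
        <elseif cond="nav == 'go to menu'"/>
          <goto next="#menu"/>
        </if>
      </filled>
    </field>
  </form>
  <form id="section2">
    <block>Section two content is read here.</block>
  </form>
  <form id="menu">
    <block>Main menu.</block>
  </form>
</vxml>
```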
c) How can semantics and events help? Using well designed
semantic tags will improve the navigation logic, and smart
use of events can help to reach the goal more
efficiently.
3. Implementation.
XML
a) Basic content - separation into sections, marking, and
navigation within them.
b) Menus and navigation bar - mapping to sections and
creating shortcuts; the user probably does not want to
say a whole title, but rather a specific abbreviation,
a section number, or something similarly short.
c) Lists of links and connections between documents.
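A rough sketch of how step (a) might look as an XSLT template, turning a section of Forrest's internal document format into a VoiceXML form. The input-side element names (section, title, p) follow Forrest's document DTD; the output structure and id scheme are illustrative only:

```xml
<!-- Sketch: map each Forrest document section to one VoiceXML form
     so the TTS engine can read it as a separate navigation chunk. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:vxml="http://www.w3.org/2001/vxml">
  <xsl:template match="section">
    <!-- One voice form per section; id derived from position -->
    <vxml:form id="section-{position()}">
      <vxml:block>
        <xsl:value-of select="title"/>.
        <xsl:apply-templates select="p"/>
      </vxml:block>
    </vxml:form>
  </xsl:template>
  <xsl:template match="p">
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```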
4. Optimizing
Although the X+V document will be automatically generated,
it will need some optimization:
- Merging duplicate code.
- Optimizing semantics.
5. Documentation
Describing the functionality and how to use it,
and creating some samples.
My skills
- I'm currently implementing my diploma thesis, which is
completely based on X+V. I use a static page and make
changes via JavaScript, which can work with VoiceXML
variables in quite a comfortable way. With some "tricks"
this technology can be used not only in the "transaction"
style in which it is usually presented (ordering by phone)
but also in quite an interactive way. These "tricks" mean
using events and forcing fields to be revisited by
clearing them according to user input. This keeps the
application loop alive indefinitely and gives the user
freedom in choosing what to say next.
From this experience I believe that even a structured site
can be transformed into a voice-readable one with quite
intuitive navigation.
I can rank myself here as advanced.
- JSGF (Java Speech Grammar Format). I prefer it over
SRGS (the XML based format) because it seems more readable.
JSGF files are also reusable in some (Java) speech
recognition engines. I'm also familiar with how to work
with semantic tags in JSGF, which play a very important
part in connecting grammars with the document.
I can rank myself here as advanced.
- I have quite good experience with XSLT (I have translated
documents in several formats into XSL:FO). My work was
concerned with creating an exact visual copy of the
original (with XML mapping). Knowledge of XSL will be
useful because much of the internal processing in Forrest
is done by a chain of XSL templates.
I can rank myself here as intermediate.
- I have solid knowledge and skills in Java, with about
2 years of practical use, mainly writing objects for
backend processing (not GUIs). But I don't know much about
related technologies (like servlets, applets...).
I'm familiar with Eclipse, CVS and UML (an important part
of team work, I hope).
In pure Java I can rank myself as advanced.
In Java-related technologies I'm a beginner.
I have discussed the extent of my Java knowledge with the
project mentor who believes my skills are adequate.
Re: Forrest-Voice proposal
Posted by Nicola Ken Barozzi <ni...@apache.org>.
Ross Gardler wrote:
> Copied below is HANAX's proposal for the Google Summer of Code
> programme. As I said elsewhere, it is a real shame that a proposal of
> such quality had to be rejected due to the small (but very generous)
> number of awards available to Apache.
>
> I am thrilled to see that Hanax is here to help us implement this
> plugin, as you will see from his proposal, whilst he has little
> experience of Forrest he does have experience of VoiceXML.
Excellent! :-)
Being a new piece of functionality, he has all the freedom he needs to
implement it the way he wants. It's now all up to him 8-)
--
Nicola Ken Barozzi nicolaken@apache.org
- verba volant, scripta manent -
(discussions get forgotten, just code remains)