CHAPTER I

A REFUTATION APPROACH TO AUTOMATIC SPEECH RECOGNITION

1.1. The problem area

Research into Automatic Speech Recognition (ASR) encompasses a wide variety of problems. These may be divided into two areas: the scientific problem of speech communication, addressed from the viewpoint of building automatic devices, and the engineering problem of building such devices. However, the engineering problem requires the pertinent scientific knowledge for its solution, and the investigation of the scientific problem requires an understanding of the associated engineering issues. There is, then, a mutual interdependence between the two problem areas, and a satisfactory research methodology in ASR should take this interdependence into account. One approach is to try to combine them within a single methodological framework (Guzy, Connolly, Edmonds & Hashim 1980, Guzy 1982, Connolly, Edmonds, Guzy, Johnson & Woodcock 1986). However, there is no sure path by which to arrive at a suitable methodology. A major notion underlying the work of this thesis is that the technology of automatic speech recognition should rest upon the basic associated science, and that this in turn should rest upon a suitable metaphysics.

This thesis is concerned with the problem of designing recognition algorithms for use in "Speech Input Interfaces". The term is used here to refer to interactive devices whose states may be deliberately modified by a speaker, through suitable manipulations of his vocal tract, for the purpose of controlling some application. Though this activity is what we normally call speaking, it is not, in the context of this work and except in the loosest of senses, to be thought of as the activity of "speaking". Similarly, the machine is not to be thought of as "recognising speech". The reason for these restrictions is that both activities involve human mental processes, and not enough is known about those processes for them to be accurately emulated. Thus, for the present, any process which requires their emulation is impracticable. For the same reason, at no time is the Speech Input Interface to be credited with having a "theory" or "knowledge" in the human sense of the term.

These constraints restrict the problem area to that of designing communication systems with a human vocal tract as source transducer, a Speech Input Interface as destination transducer, and a set of well-defined command protocols expressed in terms of physical phenomena associated with vocal tract activity. Speech Input Interfaces are interactive systems and also employ a set of well-defined protocols for the presentation of information to the user. Together, the command protocols and the presentation protocols constitute the communication protocols of the system.

An important fact about the interface is that it is a deterministic device whose behaviour depends on its program and on the physical interactions which take place between it and its environment. In the absence of noise, its recognition performance will be a direct function of speaker behaviour: so long as the speaker behaves in a manner compatible with the recognition criteria of the interface, perfect recognition results will be obtained. However, his ability to do so will depend largely on the way in which the command protocols have been designed. In practice, the system's performance will also depend on the characteristics of the communication channel.

These considerations allow a number of problems pertinent to this area to be identified:

  1. The problem of detecting instances of selected general types of physical phenomena occurring in open system environments (cf. 1.4).
  2. The problem of determining the transmission characteristics of selected speech phenomena along a variable acoustic channel with noise. Knowledge of the probabilities of a successful transmission of given phenomena under various environmental conditions will inform the design of spoken command protocols with respect to the provision of error protection.
  3. The problem of selecting particular types of speech phenomena as candidate elements of an alphabet of codes from which to construct useful spoken command protocols.
  4. The problem of determining codes and code sequences which selected populations of speakers will find easy to remember and to produce correctly.
  5. The problem of determining design guidelines for the construction of spoken command protocols having the level of error protection demanded by the application. These guidelines must be sufficiently obvious to be grasped intuitively by a speaker, so that he is able to produce commands which satisfy the minimum error protection requirements of the application under changing environmental conditions.

Since solutions to these problems must already be embodied in the conventions of ordinary speech, their investigation could be expected to throw light on the nature of those conventions. However, most of the above problems lie outside the scope of the present discussion. This thesis is solely concerned with the first of them: that of detecting instances of selected general types of physical phenomena occurring in open system environments (cf. 1.4).



Page last modified: 11:57 Monday 7th. November 2011