


DM3 ASRFE Secondary Component Interface Specification



















Sign Off Authorization



Chris Chuba_________________	__________________________________	Date:	__________
Group Leader	Signature






	Date:   	12/9/2002
	Revision	1.12
	Filename: 	FES_DEFS.doc
	Author: 	Kim Davis
	Document #: 	SWS-1346


INTEL CONFIDENTIAL

TABLE OF CONTENTS
1.	OVERVIEW
1.1	REVISION HISTORY
1.2	PURPOSE
1.3	SCOPE
1.4	REFERENCES
2.	THEORY OF OPERATION
2.1	FUNCTION OF FES
2.2	CONTROL OF FES
2.3	START-UP AND INITIALIZATION - A BRIEF SUMMARY
2.4	'STEADY STATE' OPERATION
3.	USE AS A FRONT-END FOR AUTOMATIC SPEECH RECOGNITION
3.1	PROCESSING DIAGRAM
3.1.1	Echo Canceller
3.1.2	Pre-Emphasis
3.1.3	Framing
3.1.4	End-Pointer
3.1.5	Norm
3.1.6	FFT
3.1.7	Cepstral Coefficients
3.1.7.1	Mel Filtering
3.1.7.2	Nonlinear Transformation
3.1.7.3	Cepstral coefficients
3.2	DATA FORMATS
3.2.1	Output Format Mask
3.2.2	Clumping Factor
3.2.3	Partial Block Transmission
3.2.4	GStream Header
3.2.5	Vendor-Specific Header
3.2.6	Output Data Format
Table 2.  Summary of Gstream Output Formats
4.	ENVIRONMENTAL REQUIREMENTS
4.1	PROCESSOR
4.2	MESSAGING
4.3	KERNEL AND PLAYER VERSION
4.4	INTERNAL MEMORY
5.	MESSAGING PROTOCOL
5.1	KERNEL MESSAGES
5.2	STANDARD COMPONENT MESSAGES
5.3	DIALOGIC FES-SPECIFIC MESSAGES TO INSTANCES
5.3.1	FES-specific messages from FEP to FES Instance
5.3.2	FES-specific messages from FES Instance to FEP
5.4	USE OF MESSAGES TO CONTROL AN FES INSTANCE
6.	COMPONENT OVERVIEW
7.	MESSAGE SET
7.1	MSGSTART
7.2	MSGSTARTCMPLT
7.3	MSGPAUSESTREAMING
7.4	MSGPAUSESTREAMINGCMPLT
7.5	MSGVADDETECTED
7.6	MSGSTOP
7.7	MSGSTOPCMPLT
7.8	MSGRESET
7.9	MSGRESETCMPLT
7.10	MSGCONVERGED
7.11	REFERENCE ACTIVE
7.12	REFERENCE ACTIVE COMPLETE
7.13	REFERENCE INACTIVE
7.14	REFERENCE INACTIVE COMPLETE
7.15	MSGTONEID
8.	ERROR CODES
9.	PARAMETERS
10.	ATTRIBUTES
11.	MISCELLANEOUS EQUATES
1. Overview
This document describes the message set used by the DM3 Automatic Speech Recognition Front-End Secondary (FES) Component. This is the DM3 Component that controls echo cancellation (EC) and the other signal-processing resources commonly found in ASR front ends.
In addition to the written description of each message, a DM3 Message Definition Language (MMDL) representation of each message is embedded in the description. This allows a C header file of equates and definitions for the messages to be generated from this document using the MMDL Translation Utility. Any component that sends messages to, or receives messages from, this Component uses that header file to format them.
Messages are sent using the DM3 Messaging scheme. 
This DM3 Component does not communicate directly with the Host Software.
1.1 Revision History

REVISION HISTORY
Rev.	Date of Change	Description of Change	Rev Originator
0.1	12/03/98	Initial draft.	Kim Davis
0.2	05/20/99	Noted that MsgStopCmplt and MsgResetCmplt are integer class messages.	Ray Bailey
0.3	06/02/99	Added FES_linearLE and FES_LinearBE, and changed the bit assignment for FES_Alaw in the FES Data Header.	Ray Bailey
0.4	06/18/99	Removed the FES_ equates for the Output Format Mask.	Ray Bailey
0.5	9/24/99	Added iDTMF, Tone Clamping and EC parameters.	Fred Balady
0.6	10/4/99	Added additional error codes.	Fred Balady
0.7	4/12/00	Voice Activity Detection (VAD) mode additions: added MsgPauseResumeStreaming, MsgPauseResumeStreamingCmplt, MsgVADDetected, the BlockSize field in MsgStart, the FE_VAD_Mode bit in OutMask, ParmVadTimeout, ParmSpeechSnr, ParmSpeechThresh, ParmSpeechTrig and ParmSpeechWindow.	Mark Nardiello
0.8	6/9/00	Added VAD algorithm parameters Parm_SpeechHangTime, Parm_EnergyHangTime, Parm_SpeechProbOpen, Parm_SpeechProbClose, Parm_NoiseLowThresh, Parm_NoiseHiThresh and Parm_SVAD.	Mark Nardiello
0.9	7/18/00	Added the blockSize field to FES_MsgStartCmplt, to be returned to the FEP.	Mark Nardiello
1.0	8/30/00	Added FE_VADAlgoEnable mode to support VAD notification-only mode (no pre-speech buffering). Also added the VAD detection mode to the VADDetected message, which notifies the FEP when start of speech or end of speech is detected.	Murat Eren
1.1	9/4/00	Added the DelayCompensation field to the Start message.	Murat Eren
1.2	9/8/00	Changed MsgPauseResumeStreaming and MsgPauseResumeStreamingCmplt to MsgPauseStreaming and MsgPauseStreamingCmplt, since streaming resumes as soon as the VAD detects speech. Also removed the pauseResume parameter.	Murat Eren
1.3	9/19/00	Added ParmVadOffTimeout, ParmSpeechOffSnr, ParmSpeechOffThresh, ParmSpeechOffTrig and ParmSpeechOffWindow.	Mark Nardiello
1.4	11/17/00	Changed the Parm_SpeechThresh default value from -40 dB to its corresponding linear value, 83886. Also changed the Parm_SpeechSNR default value to 0 (when the VAD is in adaptive SNR mode it uses its internal default value of -12 dB).	Murat Eren
1.5	12/04/00	Added dynamic reference C-Stream handling messages: ReferenceActive, ReferenceActiveCmplt, ReferenceInactive and ReferenceInactiveCmplt.	Murat Eren
1.6	12/12/00	Moved the ReferenceActive, ReferenceActiveCmplt, ReferenceInactive and ReferenceInactiveCmplt messages to the end for backward compatibility.	Murat Eren
1.7	02/08/01	Changed the definition of the iDTMF_TC and iDTMF parameters to allow Tone Clamping to be enabled/disabled at runtime (see the parameter definitions for usage).	Murat Eren
1.8	05/22/01	Added the ToneID reporting parameter and the ToneID message sent to the FEP.	Murat Eren
1.9	10/24/01	Added parameters to support the new EC: White Noise Gain, APIFILTERWINDOW, ERLINIT and ECMODE (OLDEC / NEW EC-16 / NEW EC).	Himanshu Patel
1.10	02/04/02	Enabled NLP_ON.	Himanshu Patel
1.11	04/06/02	Added a parameter to enable/disable dbgstreams.	Ranjan
1.12	12/09/02	Added parameters to support Silence Compress Streaming (SCS) and updated the header to accommodate the SCS Last Block Flag, Initial Data Flag and Time Stamp.	Himanshu Patel

1.2 Purpose
The purpose of this document is to describe the message interface between the Dialogic Standard Front-End Primary Component (FEP) and the Standard FES Component. The standard messages supported by all components are also mentioned here, but they are described in detail in the Standard Interface Specification for DM3 Components (SWS-0143).
This document will be used by the developers of the DM3 Standard FES Component.
1.3 Scope
This document defines all of the messages that the Standard FES Components may send and receive. It also defines all of the parameters that apply to these Components.
1.4 References
SWS-0047	DM3 Resource Firmware SAS, by Luke Kiernan
SWS-0048	DM3 Kernel Architecture Specification, by Steve Magnell
SWS-0049	DM3 Kernel API, by Steve Magnell
SWS-0138	DM3 MDL Specification, by Steve Magnell
SWS-0140	DM3 Recorder Component Interface Specification, by Luke Kiernan
SWS-0143	Standard Interface Specification for DM3 Components, by Luke Kiernan
2. Theory of Operation
2.1 Function of FES
An ASR Front-End is typically used to perform echo cancellation as well as the other signal-processing operations found in the initial processing stages of Automatic Speech Recognition.
2.2 Control of FES
A mechanism must be provided to inform the FES where to get its input data, where to put its output data and what processing to perform on the data.  Messages sent to the FES provide this mechanism.  They specify when the FES instance starts and stops 'processing'.
These messages will originate in the FEP Component.  The FEP Component will, in turn, receive some control information from the host.  The Host will not, at this stage of development, communicate directly with the FES Component resident on the DSP.
2.3 Start-up and Initialization - a brief summary
Download: An FES starts life as a component of a module, as part of a component of a module, or possibly as a collection of modules (some of which may be 'passive', i.e., libraries). This code, referred to hereafter simply as the 'module', must be placed into the DSP's local memory such that its entry point is known. This process is called downloading.
Component Initialization: An FES module that is implemented as a component on a signal processor has a task associated with it. This task must be 'started', allowing for some component-level initialization. The component also registers itself with the resource manager, giving it a list of the component's attributes, and in response receives its component address.
Instance creation: The component is now ready to receive messages.  It will probably receive one or more set parameter messages.  These may specify the number of instances to create, modify default volume settings, etc.  Eventually a Std_MsgInit message is sent to the component.  It then creates as many instances as are specified by the Std_ParmInstNum parameter, allocating any memory needed for control structures and initializing these structures.  

Fig. 4	Message flow during FES initialization.
For a more detailed description of this phase of operation, refer to the DM3 Kernel Architecture Specification (SWS-0048).
2.4 'Steady State' operation
Once the FES instance has been created and initialized it is ready to receive messages including 'start processing'.  The remainder of this document is concerned with the control of the FES during this steady state phase.
3. Use as a Front-End for Automatic Speech Recognition 
3.1 Processing Diagram 
The ASR Front-End may be described by the following block diagram:
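[Figure: ASR Front-End block diagram (image not available). The blocks are described in Sections 3.1.1 through 3.1.7.]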
The function of each of these blocks may be described as follows:

3.1.1 Echo Canceller
Performs LMS filtering of the C-Stream input based on a user-supplied reference signal.

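The LMS filtering step can be sketched as a plain (non-normalized) LMS adaptive FIR filter: the filter models the echo path from the reference signal, and the error between the C-Stream input and the echo estimate is the echo-cancelled output. This is a minimal sketch, not the FES implementation; the names LmsEc, lms_init and lms_process, the tap count NTAPS and the step size are all hypothetical, and a practical canceller would normally use far more taps and a normalized step.

```c
#include <stddef.h>

/* Illustrative LMS echo canceller. NTAPS and the step size are hypothetical
 * example values, not FES parameters. */
#define NTAPS 4

typedef struct {
    double w[NTAPS]; /* adaptive FIR coefficients (estimated echo path) */
    double x[NTAPS]; /* reference delay line, x[0] is the newest sample */
    double mu;       /* LMS step size */
} LmsEc;

void lms_init(LmsEc *ec, double mu) {
    for (size_t i = 0; i < NTAPS; i++) {
        ec->w[i] = 0.0;
        ec->x[i] = 0.0;
    }
    ec->mu = mu;
}

/* Processes one sample. 'ref' is the reference sample and 'mic' is the
 * C-Stream input sample containing echo; returns the echo-cancelled e(n). */
double lms_process(LmsEc *ec, double ref, double mic) {
    /* Shift the delay line and insert the newest reference sample. */
    for (size_t i = NTAPS - 1; i > 0; i--)
        ec->x[i] = ec->x[i - 1];
    ec->x[0] = ref;

    /* Echo estimate y(n) = w^T x(n). */
    double y = 0.0;
    for (size_t i = 0; i < NTAPS; i++)
        y += ec->w[i] * ec->x[i];

    /* Error e(n) = d(n) - y(n) is the echo-cancelled output. */
    double e = mic - y;

    /* LMS update: w <- w + mu * e(n) * x(n). */
    for (size_t i = 0; i < NTAPS; i++)
        ec->w[i] += ec->mu * e * ec->x[i];
    return e;
}
```

With a two-tap echo path and a white reference, the coefficients converge toward the true path and the residual echo decays toward zero.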
3.1.2 Pre-Emphasis
Simple one-pole pre-emphasis filter as given by:


The Pre-emphasis filter is parameterized as follows:


Parameter	Meaning	Type
Enable pre-emphasis	true = enabled	boolean

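As a sketch, assuming the first-order form common in ASR front ends, y[n] = x[n] - alpha * x[n-1] (alpha = 0.97 is a typical choice), the filter can be applied block by block with the previous input sample carried across calls. Note that this common form is strictly a single-zero FIR filter; if the FES uses a recursive one-pole filter instead, the update line changes accordingly. The function name and the alpha value are assumptions; the enable flag mirrors the boolean parameter in the table above.

```c
#include <stddef.h>

/* Pre-emphasis sketch assuming y[n] = x[n] - alpha * x[n-1].
 * 'enabled' mirrors the boolean "Enable pre-emphasis" parameter; when it is
 * false the samples pass through unchanged. '*prev' holds x[n-1] across
 * successive blocks so that block boundaries are filtered seamlessly. */
void pre_emphasis(const double *x, double *y, size_t n, double alpha,
                  int enabled, double *prev) {
    for (size_t i = 0; i < n; i++) {
        y[i] = enabled ? x[i] - alpha * *prev : x[i];
        *prev = x[i]; /* keep the history even when bypassed */
    }
}
```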
3.1.3 Framing
Some smoothing in the frequency domain may be achieved by overlapping the temporal data.  The framing overlap size, M, expresses the number of samples after which the next frame will begin.  This causes a fixed overlap in the time domain. 

If M is the overlap size in number of samples, we have for the K-th frame:


Parameter	Meaning	Type
Frame Overlap, M	Number of samples	Uint16

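Taking M as the number of samples after which the next frame begins, frame k starts at sample k*M. The sketch below assumes a frame length equal to the FFT size (a hypothetical choice here) and 16-bit PCM input; the helper names are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of complete frames of length frame_len that fit in n samples
 * when a new frame begins every M samples. */
size_t frame_count(size_t n, size_t frame_len, size_t M) {
    if (M == 0 || n < frame_len)
        return 0;
    return (n - frame_len) / M + 1;
}

/* Copies frame k, i.e. samples x[k*M .. k*M + frame_len - 1], into out.
 * Returns 0 on success, or -1 if the frame would run past the input. */
int get_frame(const int16_t *x, size_t n, size_t k, size_t frame_len,
              size_t M, int16_t *out) {
    size_t start = k * M;
    if (start + frame_len > n)
        return -1;
    for (size_t i = 0; i < frame_len; i++)
        out[i] = x[start + i];
    return 0;
}
```

When M is smaller than the frame length, consecutive frames share frame_len - M samples, which is the fixed time-domain overlap described above.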
3.1.4 End-Pointer
The purpose of an End-Pointer is to segment continuous PCM data into utterances. This is accomplished by performing start-of-speech and end-of-speech detection. For the purposes of parameterization we consider a generic threshold detector of the form:
 
$$
H_{\mathrm{on}}(n) = \begin{cases} 1, & S(n)/W(n) > t_u \\ 0, & \text{otherwise} \end{cases}
\qquad
H_{\mathrm{off}}(n) = \begin{cases} 1, & S(n)/W(n) < t_l \\ 0, & \text{otherwise} \end{cases}
$$

where S(n) is an estimate of the signal energy, W(n) is an estimate of the background noise energy, and t_u and t_l are the upper and lower thresholds, respectively.

This means that H_on(n) is used as the start-of-speech end-point detector and H_off(n) as the end-of-speech end-point detector.

S(n) is typically estimated over an integration time, T, which we may regard as some number of samples.

This suggests the following as a possible parameterization of the end-point detector:

Parameter	Units	Type
Upper threshold, t_u	dB	Uint8
Lower threshold, t_l	dB	Uint8
Integration Time	samples	Uint16

3.1.5 Norm
The data is normalized to unity magnitude using the equation:
  
3.1.6 FFT

The FFT is a standard radix-2 algorithm, with an optional windowing function applied. For this discussion we consider only Hamming, Hanning, and Blackman windows. For the purposes of an ASR Front-End, FFT sizes smaller than 128 or larger than 512 samples are not useful. Therefore, the FFT can be parameterized as follows:

Parameter	Possible Values	Type
FFT size	128, 256, 512	Uint16
window	Hamming, Hanning, Blackman	Uint8


3.1.7 Cepstral Coefficients

Here, we consider the Mel Cepstrum as defined by the following equations.  Note that in the following the number of CEP's is taken as 13, as in the Aurora Project white paper.  For the purposes of the DM3 FES resource, this will be left as a parameter.  Thus, for 13 cepstral coefficients we have:


where s(n) is the input to the FFT block, FFTL is the block length (128, 256 or 512 samples), and bin_k is the output of the FFT.

3.1.7.1 Mel Filtering
The full frequency band is divided into 24 channels. The center bin frequencies of the channels are


	

The output of the mel filter is the weighted sum of the FFT bins in each band. Triangular, half-overlapped windowing is used as follows,


3.1.7.2 Nonlinear Transformation
The output of mel filtering is subjected to a logarithm function (natural logarithm).

3.1.7.3 Cepstral coefficients
13 cepstral coefficients are calculated from the output of the Nonlinear Transformation block.

This suggests that the Cepstral block may be parameterized as follows:

Parameter	Range	Type
Number of CEP's	Less than 18	Uint8
CEP center frequencies	0x0 = default, 0x1 = vendor-specified	boolean




3.2 Data Formats

The FES resource has the ability to turn on or off any of the above functions.  Clearly, the format of the output will change as a function of which functional blocks are selected.  In this section we describe the size and format of output data in terms of which functions are enabled/disabled. 

3.2.1 Output Format Mask
An Output Format Mask will be transmitted in-band with each block.  The meaning and sense of this mask are described below in Table 1, FES Data Header.
3.2.2 Clumping Factor
The Clumping Factor is a constant for a given GStream that describes how many samples are accumulated before data is sent. In terms of kernel calls, this corresponds to the number of samples buffered between calls to qGStreamAdvanceWrite(). In the following, the clumping factor is derived from a fixed GStream latency of 64 ms, which represents a reasonable compromise between minimizing latency and minimizing the number of events sent to the host.
From the reader's viewpoint the qGStreamRead() should 

3.2.3 Partial Block Transmission
A particular case of interest is that of end-pointed samples. In this case the size of the packet varies depending on the utterance and, in general, will not line up with the clump size, which is selected a priori (during qGStreamOpen()). At the same time, it is important that partial blocks not experience delays greater than the 64 ms latency. Therefore, the remainder of a partial block is zero-filled, and the block is transmitted within the 64 ms clumping period.

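The zero-fill step can be sketched as follows: the valid bytes of the partial block are copied into a buffer of the full clump size and the tail is cleared, so the block can still be written as one full clump. The reader relies on the "Samples in this block" field of the block header to know how much of the block is valid. The function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Copies 'have' valid bytes of a partial block into a clump buffer of
 * 'clump_size' bytes and zero-fills the remainder, so the last end-pointed
 * block can still go out as one full clump. Returns 0 on success or -1 if
 * the payload does not fit. */
int fill_partial_block(unsigned char *clump, size_t clump_size,
                       const unsigned char *payload, size_t have) {
    if (have > clump_size)
        return -1;
    memcpy(clump, payload, have);
    memset(clump + have, 0, clump_size - have); /* zero-fill the tail */
    return 0;
}
```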
3.2.4 GStream Header
When end-pointed data is transmitted, the last block will in general be a partial block. Also, the packet size of end-pointed data depends on the utterance, and so is not known a priori. This suggests that blocks must be marked with an in-band header. So that the data format is independent of the output parameters, the same header is added to all blocks. The proposed header is as follows. Note that some fields are reserved for future use:


Table 1.  FES Data Header

Bits	Contents
Bit 31	0x1000 = GStream Output Enable
Bits 30-29	Output Format: 0x00 = u-law, 0x01 = A-law, 0x02 = linear LE, 0x03 = linear BE
Bits 28-19	Output Features Mask ('1' = enabled): 0x1 = EC, 0x2 = Pre-Emphasis, 0x4 = Framing, 0x8 = End-Pointing, 0x10 = Normalization, 0x20 = FFT, 0x40 = CEP's, 0x200 = Companding, 0x300 = DTMF Tone Clamping
Bits 18-16	Block Descriptor ('1' = true): 0x1 = Last Block, 0x2 = Intermediate Block for SCS, 0x4 = Initial Stream Block
Bits 15-8	Samples in this block, high byte
Bits 7-0	Samples in this block, low byte
Bits 63-56	Reserved
Bits 55-52	Reserved
Bits 51-40	Time Stamp in this block, high byte
Bits 39-32	Time Stamp in this block, low byte





3.2.5 Vendor-Specific Header
An additional two bytes are reserved for use by vendors.  This brings the total reserved space at the beginning of each block to 8 bytes.
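The bit layout of Table 1 can be illustrated with pack/unpack helpers for the first header word (bits 31-0). This is a sketch only: it works on a host-order uint32_t, while the byte order of the header on the GStream and the handling of the time-stamp word (bits 63-32) are not covered. All macro, enum and function names are hypothetical.

```c
#include <stdint.h>

/* Field positions in the first 32-bit header word, per Table 1. */
#define HDR_GSTREAM_ENABLE (1u << 31)
#define HDR_FORMAT_SHIFT   29 /* bits 30-29, 2 bits */
#define HDR_FEATURES_SHIFT 19 /* bits 28-19, 10 bits */
#define HDR_DESC_SHIFT     16 /* bits 18-16, 3 bits */

enum { FMT_ULAW = 0, FMT_ALAW = 1, FMT_LINEAR_LE = 2, FMT_LINEAR_BE = 3 };
enum { BLK_LAST = 0x1, BLK_SCS_INTERMEDIATE = 0x2, BLK_INITIAL = 0x4 };

/* Packs the fields into one word; each value is masked to its field width. */
uint32_t hdr_pack(int enable, unsigned format, unsigned features,
                  unsigned desc, unsigned samples) {
    return (enable ? HDR_GSTREAM_ENABLE : 0u)
         | ((uint32_t)(format & 0x3u) << HDR_FORMAT_SHIFT)
         | ((uint32_t)(features & 0x3FFu) << HDR_FEATURES_SHIFT)
         | ((uint32_t)(desc & 0x7u) << HDR_DESC_SHIFT)
         | (uint32_t)(samples & 0xFFFFu); /* high byte 15-8, low byte 7-0 */
}

unsigned hdr_format(uint32_t h)   { return (h >> HDR_FORMAT_SHIFT) & 0x3u; }
unsigned hdr_features(uint32_t h) { return (h >> HDR_FEATURES_SHIFT) & 0x3FFu; }
unsigned hdr_desc(uint32_t h)     { return (h >> HDR_DESC_SHIFT) & 0x7u; }
unsigned hdr_samples(uint32_t h)  { return h & 0xFFFFu; }
```

For example, hdr_pack(1, FMT_LINEAR_LE, 0x1 | 0x20, BLK_LAST, 512) marks an EC+FFT last block of 512 samples and yields 0xC1090200.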

3.2.6 Output Data Format
The above descriptions of data formats and the block header lead to a table in which we can express the Clumping Factor as a function of the Output Format Mask's flags. Bits in the Output Format Mask that have no effect on the output format are marked with an X (Don't Care):

Table 2.  Summary of Gstream Output Formats

End-Pointer	FFT	CEP's	Companding	Resulting Format	Clumping Factor (bytes)
No	No	No	Yes	512-sample u-law companded blocks (8-bit)	512 + 8
No	No	No	No	512-sample linear blocks (16-bit)	1024 + 8
No	X	Yes	X	Linear blocks (little-endian format) of NCEP coefficients	NCEP * 1024/FFTL + 8
No	Yes	No	X	512-sample complex linear (16-bit) blocks. Each linear sample is in little-endian format, with the real coefficient followed by the imaginary coefficient.	2048 + 8
Yes	No	No	Yes	Companded (8-bit) end-pointed data. The number of blocks varies with the length of the utterance.	512 + 8
Yes	No	No	No	Linear (16-bit) end-pointed data. The number of blocks varies with the length of the utterance.	1024 + 8
Yes	X	Yes		Linear (16-bit) end-pointed coefficients in little-endian format. Each block contains sets of NCEP coefficients, ordered 1 through NCEP. The number of blocks is a function of the given utterance.	NCEP * 1024/FFTL + 8
Yes	Yes	No		Complex linear FFT coefficients in 512-point blocks, formatted as above. The number of blocks is a function of the given utterance.	2048 + 8
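Table 2 can be read as a small lookup, sketched below. Every entry includes the 8-byte block header, and the End-Pointer flag does not change the clumping factor (only the number of blocks), so it is not an input. The 512-sample blocks correspond to the 64 ms latency at an implied 8 kHz rate. The function name is hypothetical, and it returns 0 for an FFT size outside 128, 256 or 512.

```c
#include <stdbool.h>
#include <stddef.h>

#define HEADER_BYTES 8 /* in-band block header (Sections 3.2.4 and 3.2.5) */

/* Clumping factor in bytes per Table 2. ncep is the number of cepstral
 * coefficients and fftl the FFT length; both matter only when ceps is set. */
size_t clumping_factor(bool fft, bool ceps, bool companding,
                       size_t ncep, size_t fftl) {
    if (ceps) { /* FFT and companding are don't-care for CEP output */
        if (fftl != 128 && fftl != 256 && fftl != 512)
            return 0;
        return ncep * 1024 / fftl + HEADER_BYTES;
    }
    if (fft) /* 512 complex 16-bit samples: 512 * 2 * 2 bytes */
        return 2048 + HEADER_BYTES;
    /* 512 samples: 8-bit companded or 16-bit linear */
    return (companding ? 512 : 1024) + HEADER_BYTES;
}
```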
4. Environmental Requirements
4.1 Processor
The FES will execute on the signal processor. It will have available all kernel services provided for the SP kernel. These are described in document SWS-0049, DM3 Kernel API.
4.2 Messaging
The FES receives and transmits standard DM3 messages. The FEP will be the only Component to send messages to the FES.  That is, all messages from the host or other components containing control information for the FES (e.g., parameters) must be addressed to the primary component, the FEP.
4.3 Kernel and Player Version

4.4 Internal Memory

5. Messaging Protocol
5.1 Kernel Messages
All messages described in the DM3 Kernel API are supported by this standard component.
5.2 Standard Component Messages
All messages defined in Standard Interface Specification for DM3 Components are supported by this component.
5.3 Dialogic FES-specific Messages to Instances
All FES instances need to be given the following information:
* PCM input stream identifier,
* Host output stream identifier,
* when to start echo cancellation,
* when to stop echo cancellation processing.
The FES also needs to provide information to the FEP component, including:
* parameter values,
* indication of completion of certain activities (events such as process complete).
Some of this control can be implemented using standard messages, e.g., set parameters or enable events.  Others require messages that are specific, or proprietary, to this component.  These proprietary messages are described below.
5.3.1 FES-specific messages from FEP to FES Instance

Message
MsgStart
MsgStop

5.3.2 FES-specific messages from FES Instance to FEP


Message
MsgStartCmplt
MsgStopCmplt 



5.4 Use of messages to control an FES instance
Once the FES instance has been initialized and registered (described above) it is ready to accept messages.  The first message the Dialogic Standard FES will accept is an Allocate Instance (InstAlloc).  The FES instance will, in response, allocate any data buffers that it needs and return an Allocate Instance Complete message indicating whether it was successful.  
The existence of the FES Instance is now known to the Resource Manager.  It is now possible to get messages from other components.  For the Dialogic Standard FES the next message expected is a SetParms, to instruct the instance to set its parameters to specified values. However, it is possible for a Start to follow the InstAlloc message (if the default value of the parameters is acceptable) or a Free Instance (deallocate instance) might be received if, for example, a call is terminated prior to start of process.
To aid in understanding the role of the remaining messages, a state transition diagram is shown below. A basic FES may be in any of four states: deallocated, allocated but stopped ('IDLE'), started ('PROCESSING'), or PAUSED (not processing, but maintaining data buffers).

For the most part the FES is driven from one state to another by the receipt of command messages.  For example, if the FES is in the IDLE state and receives a legal MsgStart command, it will make a transition into the PROCESSING state.
Not all messages result in state transitions.  Specifically commands to 'set' or 'get' do not result in changes of state.  
The FES instance may receive a standard Exit message in any state. It will perform the housekeeping functions associated with the receipt of a Stop followed by a deallocate (qCompFree).
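The transitions stated in this section can be sketched as a C transition function. The state and event names are hypothetical. Transitions into and out of PAUSED are omitted because the text does not describe them, and accepting a Free only from IDLE is an assumption based on the example of a call terminated before the start of processing.

```c
/* Hypothetical FES instance states and events, sketched from Section 5.4. */
typedef enum { ST_DEALLOCATED, ST_IDLE, ST_PROCESSING, ST_PAUSED } FesState;
typedef enum {
    EV_INST_ALLOC, EV_SET_PARMS, EV_GET_PARMS,
    EV_START, EV_STOP, EV_FREE, EV_EXIT
} FesEvent;

/* Returns the next state, or -1 if the event is illegal in the current
 * state (the instance would then answer with Std_MsgError). */
int fes_next_state(FesState s, FesEvent ev) {
    switch (ev) {
    case EV_EXIT:       return ST_DEALLOCATED; /* Stop + deallocate, any state */
    case EV_SET_PARMS:
    case EV_GET_PARMS:  return (int)s;         /* 'set'/'get' never change state */
    case EV_INST_ALLOC: return s == ST_DEALLOCATED ? ST_IDLE : -1;
    case EV_START:      return s == ST_IDLE ? ST_PROCESSING : -1;
    case EV_STOP:       return s == ST_PROCESSING ? ST_IDLE : -1;
    case EV_FREE:       return s == ST_IDLE ? ST_DEALLOCATED : -1;
    }
    return -1;
}
```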
6. Component Overview
TBD.

7. Message Set
The following messages are defined using the DM3 Message Definition Language (MMDL) with accompanying textual descriptions. The output header file generated from this message description will be fes_defs.h.

Message directional modifiers (i.e., in and out) are given with respect to the FES Component.

.file	"fes_defs.h"
.author	"Kim davis"
.version	"Revision A0"

<< 
MMDL comment: This file describes the list of  Standard Dialogic FES Messages.
>>
The DM3 Kernel definitions must not be used as definitions for any Component level information.

.sysdefs	"mercdefs.ext"

<< 
Standard Message Definitions
>>
.predefs "stddefs.ext"

For now we define the range for FES messages, error codes and parameters here. In the future this may be generated automatically by the MMDL based on the Component Type.

.min	.message	0x2C00
.max	.message	0x2CFF
.min	.parameter	0x2C00
.max	.parameter	0x2CFF
.min	.error		0x2C00
.max	.error		0x2CFF

.component FES=0x2C
{
.uses
{
}
.defines
{
<< FES specific messages defined in following sections >>
7.1 MsgStart

		
.message .in MsgStart
	{
	.uint24 EchoStreamID
	.uint24 RefStreamID
	.uint24 OutCStreamID
	.uint24 OutGStreamID
	.uint24 EchoLaw
	.uint24 RefLaw
	.uint24 OutCLaw
	.uint24 OutC {Off=0,On}
	.uint24 OutMask
	.uint24 BlockSize
	.uint24 DelayCompensation
	}
	
Description
	Target
This message may be sent to an FES instance that is in the IDLE state.  This is an integer class message.
	Interpretation
This message is interpreted as follows:
Open the input and output streams. If any stream cannot be opened, the FES instance returns a standard error message and remains in the IDLE state.
EchoStreamID identifies the receive stream in which the echo is embedded.
RefStreamID identifies the reference stream for echo cancellation.
OutCStreamID identifies the C-Stream that receives the echo-cancelled output data.
OutGStreamID identifies the G-Stream that receives the echo-cancelled output data.
EchoLaw specifies whether A-law or μ-law data is to be received from the front end and converted to linear data by the MMA; this value is used to open the C-Stream.
RefLaw specifies whether A-law or μ-law data is to be received from the front end and converted to linear data by the MMA; this value is used to open the C-Stream. Valid settings are QENCODING_ALAW or QENCODING_MULAW.
OutCLaw specifies whether A-law or μ-law data is to be received from the front end and converted to linear data by the MMA. Valid settings are QENCODING_ALAW or QENCODING_MULAW.
If OutC is set to FES_MsgStart_OutC_On, linear data is written to the C-Stream after it is opened.
OutMask specifies the format of the GStream output. Setting FE_GStreamEnable makes GStream output available. A full description of the Output Format Mask and how it relates to the formatting of GStreams is given in Section 3.2.1. For convenience, the bit definitions are repeated here:
FES Define	Meaning	OutMask Value
FE_EC	EC	0x1
FE_PreEmphasis	Pre-Emphasis	0x2
FE_Framing	Framing	0x4
FE_EndPoint	End-Pointing	0x8
FE_Norm	Normalization	0x10
FE_FFT	FFT	0x20
FE_CEPs	CEP's	0x40
FE_VADModeEnable	Enable VAD-initiated data streaming to host	0x80
FE_VADAlgoEnable	VAD notification only	0x100
FE_Companding	Companding	0x200
FE_Alaw	A-law output	0x400
FE_LinearLE	Linear (little-endian)	0x800
FE_LinearBE	Linear (big-endian)	0xC00
FE_GstreamEnable	GStream Output Enable	0x1000



BlockSize	The block size, in bytes (LSB = 1 byte), of the IBI voice-data output for a channel as seen by the host per transfer. If BlockSize = 0, the Clumping Factor specified in Table 2 is used as the block size.
Note: When MsgStart is sent to the FES with FE_VADModeEnable set, the EC starts in paused mode. In this mode, data streaming to the host application is turned off, and a pre-speech buffer is started that continuously stores 250 msec of pre-speech.
Streaming occurs only when the FES Component detects speech. The internal mode then changes to streaming mode, in which the host receives the detected speech with the pre-speech data inserted at the beginning of the data stream, and the pre-speech buffer becomes inactive. The FES Component controls pre-speech buffering, streaming to the host, and voice activity detection (VAD).
When FE_VADModeEnable is not set, the FES streams data in its normal fashion.
When FE_VADAlgoEnable is set, only VAD notifications are sent to the FEP, and no pre-speech buffering is done.
DelayCompensation specifies the number of samples by which the reference signal must be delayed.
Upon entering the PROCESS state the FES instance issues a MsgStartCmplt to the source of MsgStart. 
	Return message
Returns MsgStartCmplt if process is successfully started.  Returns Std_MsgError message if any error condition prevents the execution of the MsgStart command.
	Error conditions
	FES instance already started
	FES instance paused
	unable to open specified PCM input streams
	unable to open specified output streams
	internal error: illegal parameter or state encountered
	Cautions
This is an integer class message. This message type number will be added to the define QMSGCLASS_XFER_INTS in qmsg.h. The message body will be extracted through a pointer to a C structure definition, not by a _get macro with qMsgVarFieldGet( ).
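Because MsgStart is an integer class message read through a C structure, a receiver might view and decode it roughly as sketched below. The FE_* values are taken from the OutMask table above; the structure name, the one-32-bit-word-per-.uint24-field layout and the helper functions are hypothetical, and the authoritative layout comes from the header generated from the MMDL. One point the sketch makes explicit: the output format is a two-bit field (mask 0xC00), so FE_LinearBE contains the FE_Alaw bit and the whole field must be compared rather than FE_Alaw tested alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* OutMask bit values as listed in the MsgStart table. */
#define FE_EC            0x1u
#define FE_PreEmphasis   0x2u
#define FE_Framing       0x4u
#define FE_EndPoint      0x8u
#define FE_Norm          0x10u
#define FE_FFT           0x20u
#define FE_CEPs          0x40u
#define FE_VADModeEnable 0x80u
#define FE_VADAlgoEnable 0x100u
#define FE_Companding    0x200u
#define FE_Alaw          0x400u
#define FE_LinearLE      0x800u
#define FE_LinearBE      0xC00u
#define FE_GstreamEnable 0x1000u

/* Hypothetical view of the MsgStart body: one 32-bit word per .uint24
 * field, in MMDL order. */
typedef struct {
    uint32_t EchoStreamID, RefStreamID, OutCStreamID, OutGStreamID;
    uint32_t EchoLaw, RefLaw, OutCLaw, OutC, OutMask, BlockSize;
    uint32_t DelayCompensation;
} FesMsgStartBody;

/* Extracts the 2-bit output format field (0x400, 0x800 or 0xC00). */
uint32_t outmask_format(uint32_t mask) { return mask & FE_LinearBE; }

/* A-law only when the whole field equals FE_Alaw (FE_LinearBE also has
 * the 0x400 bit set). */
bool outmask_is_alaw(uint32_t mask) { return outmask_format(mask) == FE_Alaw; }

/* BlockSize = 0 means "use the clumping factor from Table 2". */
uint32_t effective_block_size(const FesMsgStartBody *b, uint32_t clump) {
    return b->BlockSize ? b->BlockSize : clump;
}
```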
7.2 MsgStartCmplt

		
.message .out	MsgStartCmplt
{
	.uint32	blockSize
}

	
Description
	Target
This message may be sent to the source of the MsgStart command.
	Interpretation
This message is interpreted to mean that the MsgStart message was received by the source address and was successfully executed (i.e., the processing of data has begun). This is an integer class message.
	Return message
	blockSize - The size of the block written to the G-Stream, including the header. This is the number of bytes that the reader must request on G-Streams to guarantee that each read operation starts at a data header.
	Error conditions
None.
	Cautions
None.
7.3 MsgPauseStreaming

		
.message .in MsgPauseStreaming
		

Description
	Target
This message may be sent to an FES instance when in VAD Mode. This is an integer class message.
	Interpretation
This message is interpreted as follows:
This message is first sent some time after the FES start message (MsgStart) has been sent. Because VAD Mode was indicated in MsgStart, the FES automatically comes up in streaming-paused mode. In this mode the FES initially does not stream data to the host; it does, however, start the pre-speech buffer, which continuously pre-buffers 250 msec of EC data.
MsgPauseStreaming is used to interrupt host voice-data streaming in the event of a false voice detection. In this case the host sends the FEP a message to pause streaming, and the FEP sends MsgPauseStreaming to the FES. This puts the FES into streaming-paused mode, in which it does not stream data to the host but restarts the pre-speech buffer to pre-buffer 250 msec of EC data.
	Return message
Returns MsgPauseStreamingCmplt upon successfully receiving this message.
	Error conditions
	None.
	Cautions
This is an integer class message. This message type number will be added to the define QMSGCLASS_XFER_INTS in qmsg.h. The message body will be extracted through a pointer to a C structure definition, not by a _get macro with qMsgVarFieldGet( ).

7.4 MsgPauseStreamingCmplt

		
.message .out MsgPauseStreamingCmplt
	
Description
	Target
This message may be sent to the source of the MsgPauseStreaming command.
	Interpretation
This message is interpreted to mean that the MsgPauseStreaming message was received by the source address and was successfully executed.
	Return message
None.
	Error conditions
None.
	Cautions
None.
7.5 MsgVadDetected

		
.message .out MsgVadDetected
	{
	.uint24 SpeechOnOff
	}


		

Description
	Target
This message may be sent to an FEP instance when in VAD Mode.
	Interpretation
This message is interpreted as follows:
This message indicates that the Voice Activity Detector resident in the FES Component has detected a speech utterance. It is sent to the FEP to notify the CP that speech is being spoken. The FES resumes streaming data to the host as soon as the VAD detects speech, so no resume message is required from the FEP.
	Return message
The SpeechOnOff field returns the VAD detection mode. The VAD notifies the FEP in two modes: Speech On (start of speech) and Speech Off (end of speech).
	Error conditions
None.
	Cautions
This is an integer class message. This message type number will be added to the define QMSGCLASS_XFER_INTS in qmsg.h. The message body will be extracted through a pointer to a C structure definition, not by a _get macro with qMsgVarFieldGet( ).

				

7.6 MsgStop



		
.message .in MsgStop

	
Description
	Target
This message may be sent to an FES instance that is in the PROCESS state.
	Interpretation
This message is interpreted to mean:
1. Close the input and output streams.
2. Upon receipt of the kernel message indicating successful return of the streams, make a state transition into the IDLE state.
3. Send a MsgStopCmplt message to the source of MsgStop.
	Return message
Returns MsgStopCmplt if processing is successfully stopped. Returns a Std_MsgError message if any error condition prevents the execution of the MsgStop command.
	Error conditions
	FES instance already stopped
	unable to deallocate resources
	Cautions
None.
7.7 MsgStopCmplt


		
.message .out MsgStopCmplt
{
.uint24 Rsn {MsgStop=0, ErrStop=1, BrknPipe=2}
}

Description
	Target
This message may be sent to the source of the MsgStop message to acknowledge receipt of that message and to indicate that the MsgStop command has been executed, returning the FES instance to the IDLE state. This is an integer class message.
	Interpretation
This message is interpreted to mean:
* The MsgStop message was received.
* The input streams were closed.
* Any resources allocated upon receipt of MsgStart were deallocated.
The message body includes the reason the instance stopped the FEP output stream, the number of bytes processed, and the processing time in milliseconds.
	Return message
None.
	Error conditions
None.
	Cautions
None.
7.8 MsgReset
.message .in	MsgReset

Description - Command
<<
This message causes the EC state to be reset to its initial value. This does not reset any parameters or interfere with the output being generated if such output is already in progress.
>>
Errors
None.
Related Messages
MsgResetCmplt
Cautions
None.


7.9 MsgResetCmplt

.message .out	MsgResetCmplt

Description - Command
<<
This message acknowledges the processing of the MsgReset message. This is an integer class message.
>>
Errors
None.
Related Messages
MsgReset
Cautions
None.

7.10 MsgConverged

.message .out	MsgConverged

Description - Command
<<
This message notifies the primary Component Instance that echo cancellation has converged after FES_MsgStart processing. This message is gated by the setting of the FES_Parm_Converged parameter.
>>
Errors
None.
Related Messages
MsgReset
Cautions
None.
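The gating of MsgConverged by FES_Parm_CONVERGED can be sketched as a one-shot flag. This sketch makes two assumptions: the notification is sent at most once per MsgStart, and the echo canceller exposes a per-block convergence flag. `FesConvergeGate` and both functions are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical per-instance state for the convergence notification. */
typedef struct {
    bool parm_converged;   /* FES_Parm_CONVERGED: Off = 0, On = 1     */
    bool notified;         /* MsgConverged already sent since MsgStart */
} FesConvergeGate;

/* Assumed to be called while processing MsgStart. */
void fes_converge_gate_start(FesConvergeGate *g, bool parm_converged)
{
    g->parm_converged = parm_converged;
    g->notified = false;
}

/* Called after each processed block with the echo canceller's convergence
   status; returns true exactly once, when MsgConverged should be sent. */
bool fes_converge_gate_poll(FesConvergeGate *g, bool ec_converged)
{
    if (!g->parm_converged || g->notified || !ec_converged)
        return false;
    g->notified = true;
    return true;
}
```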
7.11 Reference Active

		
.message .in ReferenceActive
	{
	.uint24 refCStream
	.uint24 refEncoding
	.uint24 delayCompensation
	}


		

n Description
	Target
This message may be sent to an FES instance that is in the ACTIVE state.
	Interpretation
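	Return message
Returns ReferenceActiveCmplt when the command has been executed, or a Std_MsgError message if an error condition prevents its execution.
	Error conditions
Consecutive ReferenceActive messages (ErrRefActive).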
	Cautions
7.12 Reference Active Complete

		
.message .out ReferenceActiveCmplt


		

n Description
	Target
This message may be sent to the source of the ReferenceActive message to indicate that the command has been executed.
	Interpretation
This message is interpreted as follows:
	Return message
	Error conditions
	Cautions
7.13 Reference Inactive

		
.message .in ReferenceInActive


		

n Description
	Target
This message may be sent to an FES instance to indicate that the reference stream activated by a previous ReferenceActive message is no longer active.
	Interpretation
This message is interpreted as follows:
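	Return message
Returns ReferenceInActiveCmplt when the command has been executed, or a Std_MsgError message if an error condition prevents its execution.
	Error conditions
Consecutive ReferenceInActive messages (ErrRefInactive).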
	Cautions
7.14 Reference Inactive Complete

		
.message .out ReferenceInActiveCmplt


		

n Description
	Target
This message may be sent to the source of the ReferenceInActive message to indicate that the command has been executed.
	Interpretation
This message is interpreted as follows:
	Return message
	Error conditions
	Cautions
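The consecutive-message errors ErrRefActive and ErrRefInactive (section 8) imply that an instance tracks whether a reference is currently active. Below is a minimal sketch, assuming the instance starts with no active reference. `RefTracker`, `RefResult` and the two handler functions are hypothetical names. The stored fields mirror the ReferenceActive message body.

```c
#include <stdbool.h>

/* Hypothetical result codes; the error names are those listed in section 8. */
typedef enum {
    REF_OK,
    REF_ERR_REF_ACTIVE,     /* reported as ErrRefActive   */
    REF_ERR_REF_INACTIVE    /* reported as ErrRefInactive */
} RefResult;

/* Reference state kept per FES instance (hypothetical). */
typedef struct {
    bool     active;
    unsigned ref_cstream;        /* refCStream        */
    unsigned ref_encoding;       /* refEncoding       */
    unsigned delay_compensation; /* delayCompensation */
} RefTracker;

/* ReferenceActive: reject a second activation, otherwise record the body. */
RefResult ref_on_active(RefTracker *t, unsigned cstream,
                        unsigned encoding, unsigned delay)
{
    if (t->active)
        return REF_ERR_REF_ACTIVE;
    t->active = true;
    t->ref_cstream = cstream;
    t->ref_encoding = encoding;
    t->delay_compensation = delay;
    return REF_OK;               /* caller replies ReferenceActiveCmplt */
}

/* ReferenceInActive: reject if no reference is active. */
RefResult ref_on_inactive(RefTracker *t)
{
    if (!t->active)
        return REF_ERR_REF_INACTIVE;
    t->active = false;
    return REF_OK;               /* caller replies ReferenceInActiveCmplt */
}
```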


7.15 MsgToneID

		
.message .out MsgToneID
	{
	.uint24 ToneID
	}


		

n Description
	Target
This message may be sent to an FEP instance when tone ID reporting (Parm_ToneIDReporting) is enabled in EFES.
	Interpretation
When ToneIDReporting is enabled in the iDTMF module, this message carries the ToneID of the detected tone.
	Return message
None.
	Error conditions
None.
	Cautions
This is an integer class message. Its message type number will be added to the QMSGCLASS_XFER_INTS define in qmsg.h. The message body will be read through a pointer to a C structure definition, not with a _get macro or qMsgVarFieldGet().

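As the Cautions for MsgToneID describe, the body is read by overlaying a C structure on it. The sketch below is a host-side illustration under assumptions. On the 24-bit DSP each .uint24 field occupies one word. Here a `uint32_t` stands in for that word and is masked to 24 bits. `FesMsgToneIdBody` and `fes_tone_id_from_body()` are hypothetical names.

```c
#include <stdint.h>

/* Host-side stand-in for the MsgToneID body; on the DSP the .uint24
   field is one 24-bit word (assumption for this illustration). */
typedef struct {
    uint32_t ToneID;
} FesMsgToneIdBody;

/* Read ToneID through a pointer to the C structure rather than with a
   _get macro or qMsgVarFieldGet(). `body` is assumed to point at the
   start of the integer-class message body. */
uint32_t fes_tone_id_from_body(const void *body)
{
    const FesMsgToneIdBody *msg = (const FesMsgToneIdBody *)body;
    return msg->ToneID & 0xFFFFFFu;   /* keep the 24 significant bits */
}
```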
				


8. Error Codes 
The following is a list of error codes specific to the FES that are returned in the Error code field of the Std_MsgError message.
<< 
Standard Component Error Code Definitions
>>
		
.error	ErrEchoStmOpn		<< an error while opening the input PCM ECHO streams  >>
.error	ErrRefStmOpn		<< an error while opening the input PCM REF streams  >>
.error	ErrOutStmOpn		<< an error while opening the C-Stream output stream >>
.error	ErrHOStmOpn		<< an error while opening the host output stream >>
.error	ErrHOStmOvflw		<< host data not removed in time >>
.error	ErrBadStmData		<< some illegal input value, e.g. linear value not corresponding to a PCM value >>
.error	ErrGMAlloc		<< unable to allocate global memory buffer >>
.error	ErrLMAlloc		<< unable to allocate local memory buffer >>
.error	ErrLMPoolAlloc		<< unable to allocate local memory pool >>
.error	ErrBadBSpSize		<< backspace specified exceeds ParmHstOBufSiz >>
.error ErrInRecState		<< inappropriate message: FES in process state >>
.error ErrInPauseState		<< inappropriate message: FES in pause state >>
.error ErrInIdleState		<< inappropriate message: FES in idle state >>
.error ErrRefActive		<< Consecutive ReferenceActive Message >>
.error ErrRefInactive		<< Consecutive ReferenceInactive Message >>
.error ErrInternalError		<< debug 'assert' >>


	
9. Parameters
The following is a list of parameters used with the FES component.
Parameter access is classified as read only (R), write only (W) and as read/write (R/W).

Parameter level is classified as Component (C), Instance (I), or Component and Instance (C/I).

<< 
FES Parameter Definitions
>>
Parameter type and name
Access
Level
Default
Description
.parameter .uint24 Parm_EC
R/W
C/I
{def=1}
<< Enable or Disable EC >>
.parameter .uint24 Parm_NLP
R/W
C/I
{def=1}
<< Enable or Disable EC's NLP.
Default is 1 for Non-CSP
Set 0 through API for CSP >>
.parameter .uint24 Parm_ADAPT_COEFFS
R/W
C/I
{def=0}
<<0 - The EC controls adaptation,  1 - always adapt, 2 - freeze adaptation.  The default is FES_Parm_ADAPT_COEFFS_def>>
.parameter .uint24 Parm_FILTER_LENGTH
R/W
C/I
{def=128}
<<The echo tail length in 125-microsecond units (one sample period at 8 kHz). The default value, FES_Parm_FILTER_LENGTH_def (128), corresponds to a 16 ms tail.>>
.parameter .uint24 Parm_MU_CONST
R/W
C/I
{def=0}
<<If set to 0, the EC Component determines this value based on the filter length. If non-zero, it explicitly sets the adaptation step size of the filter. The default value is FES_Parm_MU_CONST_def.
Caution: this parameter only takes effect on the subsequent EC_MsgStart message.>>
.parameter .uint24 Parm_MU_CONST_MAX
R/W
C/I
{def=0}
<<If set to 0 then the EC Component will determine this value.  The default setting is FES_Parm_MU_CONST_MAX_def>>
.parameter .uint24 Parm_CONVERGED
R/W
C/I
{Off=0,On}
<<If set to 1, the EC Component sends an FES_MsgConverged message to the primary Component. The default setting is FES_Parm_CONVERGED_Off>>
.parameter .uint24 Parm_Frame_Overlap
R/W
C/I
{Off}
<<Amount of temporal frame overlap, as described in 3.1.3. >>
.parameter .uint24 Parm_CEP_Center_Freqs
R/W
C/I
{def=Off}
<< If set to a value other than Parm_CEP_CENTER_DEF, use vendor-specified center frequencies; otherwise, use the default frequencies described in 3.1.7.>>
.parameter .uint24 Parm_NCEP
R/W
C/I
{def=13}
<< Number of cepstral coefficients (NCEP); see 3.1.7.3. >>
.parameter .uint24 Parm_FFTL
R/W
C/I
{def=256}
<< FFT length (FFTL); valid values are 256 and 512. >>
.parameter .uint24 Parm_iDTMF
R/W
C/I
{def=0}
<< Enable or disable iDTMF detection at run time. Takes effect only if Parm_iDTMF_TC was enabled at startup.>>
.parameter .uint24 Parm_iDTMF_TC
R/W
C/I
{def=0}
<< If enabled at start time, the tone clamping algorithm is initialized. After initialization, Parm_iDTMF enables or disables the tone clamping feature at run time.>>
.parameter .uint24 Parm_VadTimeout
R/W
C/I
{def=100}
<< Duration of timeout in 10 ms increments. >>
.parameter .uint24 Parm_SpeechSnr
R/W
C/I
{def=0}
<< Reciprocal of SNR for the echo canceller. The default value 0 means adaptive SNR calculation is used in VAD. (2107123 = -12 dB) >>
.parameter .uint24 Parm_SpeechThresh
R/W
C/I
{def=83886}
<< Minimum energy level to trigger VAD. (83886=-40dB) >>
.parameter .uint24 Parm_SpeechTrig
R/W
C/I
{def=10}
<< Number of 12 ms blocks with energy greater than Parm_SpeechThresh required to trigger VAD. >>
.parameter .uint24 Parm_SpeechWindow
R/W
C/I
{def=10}
<< Number of 12 ms blocks examined to detect speech energy. >>
.parameter .uint24 Parm_SpeechHangTime
R/W
C/I
{def=5}
<< Hangover time that controls how fast a signal is declared as silence after the end of words. It should be used to control the probability of clipping vs. the probability of speech misdetection in the trailing periods of speech signals (in msecs). >>
.parameter .uint24 Parm_EnergyHangTime
R/W
C/I
{def=30}
<< Same as Parm_SpeechHangTime, but applies only to the energy-based detection unit. It does not affect overall speech detection, so it is useful only if the energy-based speech flag is used instead of the overall speech flag (in msecs). >>
.parameter .uint24 Parm_SpeechProbOpen
R/W
C/I
{def=6710886}
<< Controls the overall sensitivity to speech, that is, the threshold on the estimated speech probability that a signal currently declared as silence must exceed to be declared speech. It primarily trades the sensitivity of speech detection (leading time of speech) against speech misdetection. (0.0-1.0; default 6710886 = 0.8). >>
.parameter .uint24 Parm_SpeechProbClose
R/W
C/I
{def=2097152}
<< Controls the overall sensitivity to silence, that is, the threshold on the estimated speech probability below which a signal currently declared as speech is declared silence. It primarily trades the sensitivity of silence detection (trailing time of speech) against silence misdetection. (0.0-1.0; default 2097152 = 0.25). >>
.parameter .uint24 Parm_NoiseLowThresh
R/W
C/I
{def=4717}
<< The lower threshold for background noise level estimate. It can be used to ignore small amplitude signals independently of their nature - speech or silence. (0.0-1.0). >>
.parameter .uint24 Parm_NoiseHiThresh
R/W
C/I
{def=838861}
<< The upper threshold for the background noise level estimate. (0.0-1.0). >>
.parameter .uint24 Parm_SVAD
R/W
C/I
{def=0}
<< If 0, use speech statistics to perform VAD. Otherwise, use the energy threshold. >>
.parameter .uint24 Parm_VadOffTimeout
R/W
C/I
{def=0}
<< Speech-off timeout duration, in 10 ms increments. >>
.parameter .uint24 Parm_SpeechOffSnr
R/W
C/I
{def=0}
<< Speech-off reciprocal of SNR for the echo canceller. >>
.parameter .uint24 Parm_SpeechOffThresh
R/W
C/I
{def=0}
<< Speech-off minimum energy level to trigger VAD. >>
.parameter .uint24 Parm_SpeechOffTrig
R/W
C/I
{def=0}
<< Speech-off number of 12 ms blocks with energy greater than Parm_SpeechThresh required to trigger VAD. >>
.parameter .uint24 Parm_SpeechOffWindow
R/W
C/I
{def=0}
<< Speech-off number of 12 ms blocks examined to detect speech energy. >>
.parameter .uint24 Parm_ToneIDReporting
R/W
C/I
{def=0}
<< If enabled (1), the iDTMF algorithm reports tone ID information to the FEP in MsgToneID. >>
.parameter .uint24 Parm_ECMode
R/W
C/I
{def=1}
<<Mode = 0: old EC
    Mode = 1: new EC-16
    Mode = 2: new EC
>>
.parameter .uint24 Parm_ERLINIT
R/W
C/I
{def=0x7FFFFF}
<< For debug only - initial value of Echo Return Loss >>
.parameter .uint24 Parm_ApiFiterlWindow
R/W
C/I
{def=192}
<< For debug only -  Active Window of Adaptive FIR >>
.parameter .uint24 Parm_WhiteNoiseGain
R/W
C/I
{def=0x4285FC << 0.52 >>}
<< For debug only - Comfort Noise Gain >>
.parameter .uint24 Parm_dbgStrmEnable
W
C
{def=0}
<< Component only parm for debugging >>
.parameter .uint24 Parm_TrailingSilence
R/W
C/I
{def=200}
<<Duration of trailing silence in 10 ms increments. Range is 100 to 1000 ms.
>>
.parameter .uint24 Parm_InitialDataStream
R/W
C/I
{def=0}
<<Duration of initial silence in 10 ms increments. Range is 0 to 2 s.
>>
.parameter .uint24 Parm_SCSMode
R/W
C/I
{Off=0,On}
<<SCSMode = 0: Silence Compressed Streaming disabled.
    SCSMode = 1: Silence Compressed Streaming enabled.
>>
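The energy-trigger and probability parameters above can be read as a sliding-window counter plus a hysteresis rule. This is one plausible interpretation, not the FES algorithm. The sketch assumes that speech is triggered when at least Parm_SpeechTrig of the last Parm_SpeechWindow 12 ms blocks exceed Parm_SpeechThresh. It also assumes that Parm_SpeechProbOpen and Parm_SpeechProbClose are compared against a Q23 speech probability. All type and function names are hypothetical.

```c
#include <stdbool.h>

#define VAD_MAX_WINDOW 64   /* hypothetical cap on Parm_SpeechWindow */

/* Sliding window over the last `window` 12 ms blocks. */
typedef struct {
    unsigned trig, window;          /* Parm_SpeechTrig, Parm_SpeechWindow */
    long     thresh;                /* Parm_SpeechThresh (Q23 energy)     */
    bool     above[VAD_MAX_WINDOW]; /* per-block "energy above threshold" */
    unsigned pos, filled, count;    /* ring index, blocks seen, hits      */
} EnergyTrigger;

void energy_trigger_init(EnergyTrigger *t, unsigned trig,
                         unsigned window, long thresh)
{
    *t = (EnergyTrigger){0};
    t->trig = trig;
    t->window = window == 0 ? 1
              : (window < VAD_MAX_WINDOW ? window : VAD_MAX_WINDOW);
    t->thresh = thresh;
}

/* Feed one block's energy; returns true while at least `trig` of the
   last `window` blocks exceed the threshold. */
bool energy_trigger_push(EnergyTrigger *t, long energy)
{
    bool hit = energy > t->thresh;
    if (t->filled == t->window)
        t->count -= t->above[t->pos];   /* drop the oldest block */
    else
        t->filled++;
    t->above[t->pos] = hit;
    t->count += hit;
    t->pos = (t->pos + 1) % t->window;
    return t->count >= t->trig;
}

/* Hysteresis: silence -> speech above `open_q23`, speech -> silence
   below `close_q23`; otherwise keep the current decision. */
bool vad_hysteresis(bool in_speech, long prob_q23,
                    long open_q23, long close_q23)
{
    if (!in_speech && prob_q23 > open_q23)
        return true;
    if (in_speech && prob_q23 < close_q23)
        return false;
    return in_speech;
}
```

With the defaults (Parm_SpeechTrig = Parm_SpeechWindow = 10), this reading requires all ten blocks in the window to exceed the threshold.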


The following FES parameters can be changed by the application. All parameters are linear integer values; the application must convert a fractional value to an integer by multiplying it by 2^23 (8388608). The following table shows the default values in decimal and hexadecimal:

Parameter Name	Original Value	Converted to Integer (value × 2^23)	Hex
NLP	1
ADAPT_COEFFS	0
FILTER_LENGTH	128
MU_CONST	0.002
MU_CONST_MAX	0.02
PRE_EMPHASIS	TRUE
FRAME_OVERLAP	0
CEP_CENTER_FREQS	0


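The 2^23 conversion described above can be checked with a small helper. The function names are hypothetical. Clamping to 0x7FFFFF (the largest 24-bit value, which is also the Parm_ERLINIT default) is an assumption. The results reproduce defaults quoted in section 9: 0.8 gives 6710886 (Parm_SpeechProbOpen), and an amplitude of 0.01 (-40 dB) gives 83886 (Parm_SpeechThresh).

```c
#define Q23_SCALE 8388608.0     /* 2^23 */
#define Q23_MAX   0x7FFFFFL     /* assumed largest value, just below 1.0 */

/* Fraction in [0.0, 1.0] -> integer parameter value: multiply by 2^23,
   round to nearest, clamp to the 24-bit range. */
long q23_from_fraction(double x)
{
    if (x <= 0.0)
        return 0;
    double scaled = x * Q23_SCALE + 0.5;    /* round half up */
    if (scaled >= (double)Q23_MAX)
        return Q23_MAX;
    return (long)scaled;
}

/* Integer parameter value -> fraction, for reading parameters back. */
double q23_to_fraction(long q)
{
    return (double)q / Q23_SCALE;
}
```

With this helper, MU_CONST (0.002) converts to 16777 (0x4189) and MU_CONST_MAX (0.02) converts to 167772 (0x28F5C).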


10. Attributes
In addition to the standard attribute types Std_ComponentType, Std_ComponentID and Std_VendID, which must be initialized by this component, the following attribute types are specific to the FES:
Attribute name
Description
.external .attribute Std_ComponentType 
{ Std_ComponentType = 0x2C}

<<
FES Component Type definition.
>>
.attribute Feature
<<
Front-End processing features requested.  The different processing features are discussed in 3.1.  Different FES features are requested by the bit settings defined in 3.2.1.
>>
.attribute FESType 
<<
FES type, as defined in fes.h.
>>




11. Miscellaneous Equates
The following definitions are used by the Output Format Mask, as specified in section 3.2.1. The Output Format Mask is passed as one of the arguments of MsgStart and controls which features are enabled or disabled on the front end.
<
#define EFES_DTMF_1      1
#define EFES_DTMF_2      2
#define EFES_DTMF_3      3
#define EFES_DTMF_4      5
#define EFES_DTMF_5      6
#define EFES_DTMF_6      7
#define EFES_DTMF_7      9
#define EFES_DTMF_8      10
#define EFES_DTMF_9      11
#define EFES_DTMF_0      14
#define EFES_DTMF_A      4
#define EFES_DTMF_B      8
#define EFES_DTMF_C      12
#define EFES_DTMF_D      16
#define EFES_DTMF_POUND  15
#define EFES_DTMF_STAR   13
>
<<
Note: for the order of the digits, see the dtmfdbg.h file in the iDTMF algorithm.
>>
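The EFES_DTMF_* values follow the row-by-row order of a 4x4 DTMF keypad (1 2 3 A / 4 5 6 B / 7 8 9 C / * 0 # D). A tone ID can therefore be mapped to its character with a lookup string. The function name is hypothetical. The sketch assumes MsgToneID reports these same values.

```c
/* Index i holds the keypad character whose EFES_DTMF_* value is i + 1,
   e.g. EFES_DTMF_A = 4 -> 'A', EFES_DTMF_0 = 14 -> '0'. */
static const char efes_keypad[] = "123A456B789C*0#D";

/* Map a tone ID to its DTMF character; '?' for values outside 1..16. */
char efes_tone_to_char(unsigned tone_id)
{
    if (tone_id < 1 || tone_id > 16)
        return '?';
    return efes_keypad[tone_id - 1];
}
```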


<<
End of all DM3 FES Component Definitions
>>
}
}

1 Standard Interface Specification for DM3 Components

