The speech synthesis method
by S. B.
Current note: In the meantime, the DOS program "AUDIGIT" described in this article has become technically outdated. Newer software, which also runs under Windows, can be found here. An up-to-date description of the speech synthesis method can be found here.
When the computer-aided speech synthesis method was developed around 1990 and subsequently used regularly for voice recordings at the Cologne Technology Centre, it did not spread widely despite its good results. The reason was clear: in addition to a computer, an add-on device, the so-called digitizer, was required. Its purpose was to convert analog speech signals into digital data that the computer could process. This digitizer was not commercially available, however, and had to be built by hand, an insurmountable obstacle for interested users with little experience in electronics. Times have changed since then. We live in the "Multimedia Age", and a soundcard is now standard equipment in every computer, which itself has already secured a firm place next to the television set and the refrigerator in almost every home. What could be more natural than to make this interesting recording method available to a wider circle of potential users by adapting the required software to soundcards and extending it in many respects? So if you own a computer with a soundcard, the only thing you need to make recordings with the speech synthesis method is the AUDIGIT software (further information can be found at the end of this article).
The speech synthesis method was already mentioned briefly in issue P 86 on page 15 ("Der Computer-Zufallsgenerator" = "The computer random generator"), but the term may not mean much, particularly to "newcomers" to the VTF. I would therefore like to go over it once more here:
To record EVP, it seems necessary to provide some kind of "raw material" from which meaningful spoken replies to the experimenter's questions are formed by influences that physics cannot (yet) explain. A wide variety of methods exist for generating, emitting and recording this raw material, whose frequency spectrum evidently must always lie within the audible range, whether as a direct sound event or modulated onto one of many different carriers (radio waves, light, etc.).
The speech synthesis method offers a further interesting way of generating acoustic raw material. Its principle is simple to explain: a short sequence (e.g. 25 s) of any speech signal is divided into small segments of equal length (e.g. 100 ms), which are then played back in random order with the aid of a computer. This "gibberish" still has the sound of the original speech signal but is no longer intelligible, and is therefore suitable as acoustic background noise for EVP recordings.
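The principle can be illustrated with a few lines of Python. This is only a minimal sketch and not the AUDIGIT code: the function name shuffle_segments and its parameters are assumptions made for this example, and it reads "random order" as a random permutation of the segments, whereas the original program may equally well draw segments with repetition.

```python
import random

def shuffle_segments(samples, sample_rate, segment_ms=100, seed=None):
    """Cut `samples` into equal-length segments and return them in random order.

    Hypothetical helper for illustration; `samples` is a list of sample values.
    """
    seg_len = max(1, sample_rate * segment_ms // 1000)  # samples per segment
    n_segments = len(samples) // seg_len                # an incomplete tail is dropped
    segments = [samples[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]
    rng = random.Random(seed)                           # seeded for repeatable runs
    rng.shuffle(segments)
    scrambled = []
    for segment in segments:
        scrambled.extend(segment)
    return scrambled
```

With the values from the text, 25 s of speech at a segment length of 100 ms gives 250 segments, so even a short recording yields a very large number of possible sequences.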
The initial idea behind this method was to assemble individual, precisely defined phonemes under random control into a random language (random-controlled phoneme synthesis). In practice it then turned out that randomly "chopped" speech yields better results in voice recordings because of its greater dynamics. Dividing an audio signal into segments of equal length is also easier to manage than manually cutting out individual phonemes, and the possibilities for variation are far more diverse.
Because the computer allows the generation of the raw material to be controlled, the speech synthesis method could make an important contribution to documenting the paranormality of the voice phenomenon and thus support its scientific recognition.
Now a few words about the various possibilities this method offers:
Any speech signal can be digitized at 8-bit resolution and stored in the computer's memory. The sampling rate can be varied from approximately 4.7 to 44.2 kHz. A dedicated function then plays back individual segments of this signal under random control in order to generate acoustic "raw material" for voice recordings. The following settings are available:
Length of segments
The length of the randomly played-back segments can be chosen between 1 millisecond and several seconds. This determines how finely the signal is cut up. With values that are too high, individual fragments of words or complete words can be heard; values that are too small (less than 10 ms) distort the sound of the signal but can nevertheless yield interesting results!
Order of segments
The segments can be arranged within the audio signal either one after the other or overlapping. In the first case, the segments lie seamlessly one behind the other at precisely defined positions. In the second case, each segment may lie at any position within the audio signal, which allows more variation in the generation of the raw material.
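The difference between the two arrangements comes down to which start positions a segment is allowed to have. The following sketch shows one way this could be expressed; the function segment_starts and the mode names "sequential" and "overlapping" are assumptions chosen for this example and are not taken from the program itself.

```python
import random

def segment_starts(total_len, seg_len, count, mode="sequential", seed=None):
    """Draw `count` random segment start positions (in samples).

    Hypothetical helper: "sequential" restricts starts to a fixed grid of
    adjacent segments, "overlapping" allows any position in the signal.
    """
    if seg_len > total_len:
        raise ValueError("segment is longer than the signal")
    rng = random.Random(seed)
    if mode == "sequential":
        n_slots = total_len // seg_len           # number of seamless grid positions
        return [rng.randrange(n_slots) * seg_len for _ in range(count)]
    if mode == "overlapping":
        last_start = total_len - seg_len         # segment must still fit completely
        return [rng.randrange(last_start + 1) for _ in range(count)]
    raise ValueError("unknown mode: " + mode)
```

For a signal of 1000 samples and a segment length of 100, the sequential mode can only choose among 10 positions, while the overlapping mode can choose among 901.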
The transition between two chronologically consecutive segments refers to the amplitude variation, i.e. the change in volume: With "rectangle", the volume is constant for every segment played back; the segments are "cut hard" and follow one another directly, which sounds rather "chopped". "Triangle" makes the volume rise slowly to its peak value while the volume of the preceding segment simultaneously decreases. After reaching the peak, the volume slowly decreases again while the next segment slowly fades in. This "fading" yields a "smoother" sound, since the cut points between individual segments are no longer audible. A further method of "soft cutting" is "trapezoid": here the rising and falling edges of the fade are steeper, and after reaching the maximum the volume stays at the peak value for half a segment length before decreasing again. All three cutting methods are suitable for recordings.
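The three transitions can be thought of as volume envelopes that are multiplied onto each segment before neighbouring segments are added together. The sketch below is one possible reading of the description, not the program's actual code: the function names, the linear ramps, and the overlap of half a segment (triangle) or a quarter segment (trapezoid) are assumptions, and the segment length is assumed to be divisible by 4.

```python
def envelope(shape, n):
    """Return n gain values between 0 and 1 for one segment (hypothetical helper)."""
    if shape == "rectangle":
        return [1.0] * n                  # constant volume, hard cuts
    if shape == "triangle":
        ramp = n / 2                      # rise over the first half, fall over the second
    elif shape == "trapezoid":
        ramp = n / 4                      # steeper edges, peak held for half the segment
    else:
        raise ValueError("unknown shape: " + shape)
    gains = []
    for i in range(n):
        pos = i + 0.5                     # sample centre keeps the envelope symmetric
        gains.append(min(1.0, pos / ramp, (n - pos) / ramp))
    return gains

def crossfade(segments, shape):
    """Overlap-add equal-length segments so each fade-out meets the next fade-in."""
    n = len(segments[0])
    overlap = {"rectangle": 0, "triangle": n // 2, "trapezoid": n // 4}[shape]
    hop = n - overlap                     # distance between segment starts
    gains = envelope(shape, n)
    out = [0.0] * (hop * (len(segments) - 1) + n)
    for k, segment in enumerate(segments):
        for i, sample in enumerate(segment):
            out[k * hop + i] += gains[i] * sample
    return out
```

With these overlaps, the falling edge of one segment and the rising edge of the next always add up to full volume, so the overall loudness stays constant across the cut points.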
By inserting pauses of variable length between a likewise variable number of segments, the continuous playback of randomly selected segments can be interrupted at regular or random intervals. This is meant to simulate a normal flow of speech. The function is also very helpful for creating room for the experimenter's questions, and it makes the later analysis easier. Not least, it seems to support the formation of voices: voices often begin or end exactly with a sequence between two inserted pauses, which is very impressive every time.
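How such pauses might be scheduled is sketched below. The function plan_with_pauses, its parameter names and the default ranges are purely illustrative assumptions; the text does not state which values the program offers. Setting the minimum and maximum of a range to the same value gives regular intervals, while different values give random ones.

```python
import random

def plan_with_pauses(total_segments, group_size=(3, 8), pause_ms=(200, 800), seed=None):
    """Return a playback plan of ("play", n_segments) and ("pause", milliseconds) steps.

    Hypothetical helper: both ranges are inclusive (minimum, maximum) pairs.
    """
    rng = random.Random(seed)
    plan = []
    remaining = total_segments
    while remaining > 0:
        count = min(remaining, rng.randint(*group_size))  # segments before the next pause
        plan.append(("play", count))
        remaining -= count
        if remaining > 0:                                 # no pause after the final group
            plan.append(("pause", rng.randint(*pause_ms)))
    return plan
```

For example, plan_with_pauses(12, group_size=(5, 5), pause_ms=(500, 500)) plays five segments, pauses for 500 ms, plays five more, pauses again, and then plays the remaining two.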
During random-controlled playback, the segment numbers used and all other parameters, such as sampling rate, segment length, order, transition and pauses, can be logged. This makes it possible to reproduce the same random signal at a later date for a control recording, in order to verify possible acoustic transformations under identical test conditions.
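The idea of such a log can be sketched as follows. The log layout, the field names and the use of JSON are assumptions made for this example; the format actually written by the program is not described here. The sketch stores the drawn segment numbers themselves, so that a control run does not depend on the random generator behaving identically later on.

```python
import json
import random

def make_log(params, n_segments, n_played, seed=None):
    """Draw a random segment sequence and return it together with all settings.

    Hypothetical helper: `params` is a plain dict of the playback settings.
    """
    rng = random.Random(seed)
    numbers = [rng.randrange(n_segments) for _ in range(n_played)]
    return {"params": params, "n_segments": n_segments, "segment_numbers": numbers}

def replay_order(log):
    """Return the logged segment numbers for an identical control playback."""
    return list(log["segment_numbers"])

settings = {
    "sampling_rate_hz": 22050,   # example values only
    "segment_ms": 100,
    "order": "sequential",
    "transition": "triangle",
}
log = make_log(settings, n_segments=250, n_played=20, seed=7)
log_text = json.dumps(log)       # this text could be written to a log file
```

Reading the text back with json.loads and passing the result to replay_order gives exactly the segment sequence of the original session.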
All digitized audio signals that serve as a basis for generating raw material can be saved to files and loaded back into the computer's memory for later use. Only in this way can comparable conditions be maintained across a series of experiments. To allow audio samples to be exchanged with other applications, the widespread WAVE format was chosen as the file format.
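As an illustration of the exchange with other applications, 8-bit mono samples can be written and read with the wave module from Python's standard library. The helper names save_wave and load_wave are assumptions for this sketch; it also assumes mono audio, and it relies on the fact that 8-bit WAVE data is stored as unsigned values from 0 to 255.

```python
import io
import wave

def save_wave(target, samples, sample_rate):
    """Write unsigned 8-bit mono samples (integers 0..255) as WAVE data.

    `target` may be a file name or a writable binary file object.
    """
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(1)          # 1 byte per sample = 8 bits
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(samples))

def load_wave(source):
    """Read 8-bit mono WAVE data; return (list of samples, sample rate)."""
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 1:
            raise ValueError("expected 8-bit mono audio")
        frames = wav.readframes(wav.getnframes())
        return list(frames), wav.getframerate()

# Round trip through an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
save_wave(buffer, [0, 128, 255, 64], 22050)
buffer.seek(0)
restored, rate = load_wave(buffer)
```

Passing a file name such as "sample.wav" instead of the buffer stores the same data on disk, where other audio programs can open it.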
For recordings with the speech synthesis method, it has proved useful to emit the computer-generated random speech signal into the room at low volume through a speaker, pick it up with a microphone at some distance, and feed it to a cassette recorder for recording. In this way, the experimenter's questions end up on the tape together with the acoustic raw material.
To obtain better control over acoustic "transformations", the microphone can be connected to the left channel of a stereo cassette recorder while the raw material is fed directly to the right channel by audio cable. The later evaluation then alternates between the left and the right stereo channel: if a voice is detected on the left track of the recording, switching over to the right track shows whether or not this voice was already audible in the raw material. If it was not, one would have proved that the original raw material was modified along the acoustic path. However, it cannot be said with certainty whether this is a "paranormal" modification, because acoustic factors alone (room acoustics, reflection of the sound waves, frequency response of speaker and microphone, overdriving of the recording, harmonics) can distort the signal so much that apparently new or different sounds are created. In such a case, a control recording under identical test conditions and using the same raw material can at least verify whether it is an acoustically caused, and therefore physically explainable and reproducible, event or a unique event that cannot be reproduced.
Finally, it should be mentioned that many very good voices can already be heard directly in the raw material and are therefore not caused by acoustic transformations. This fact need not disprove their genuineness and paranormality, however. We are dealing here with a phenomenon that at present still completely eludes physical explanatory models.