
The speech synthesis method

by S. B.

Current note: The DOS program "AUDIGIT" described in this article is now technically outdated. A newer program that also runs under Windows can be found here. An up-to-date description of the speech synthesis method can be found here (in German).

The computer-aided speech synthesis method was developed around 1990 and has been used regularly since then for voice recordings at the Cologne Technology Centre. Despite the good results, it did not spread widely. The reason was obvious: in addition to a computer, it required an add-on device, the so-called digitizer, which converted analog speech signals into digital data that the computer could process. This digitizer was not commercially available and had to be built by hand, an insurmountable obstacle for interested users with little experience in electronics. Times have changed since then. We live in the "Multimedia Age", a soundcard is now part of the standard equipment of every computer, and the computer has taken its place next to the television set and the refrigerator in almost every home. What could be more natural than to make this interesting recording method available to a wider circle of potential users by adapting the software to the soundcard and extending it in many respects? So if you own a computer with a soundcard, the only thing you need to make recordings with the speech synthesis method is the AUDIGIT software (further information can be found at the end of this article).

The speech synthesis method was mentioned briefly in issue P 86 on page 15 ("Der Computer-Zufallsgenerator" = "The computer random generator"), but the term will probably not mean much, particularly to newcomers to the VTF. I would therefore like to explain it once more here:

To record EVP (electronic voice phenomena), it seems to be necessary to offer some kind of "raw material" from which meaningful spoken replies to the experimenter's questions are formed by influences that cannot (yet) be explained physically. There are many different methods for generating, emitting and recording this raw material, whose frequency spectrum apparently always has to lie within the audible range, whether as a direct sound event or modulated onto a wide variety of carriers (radio waves, light, etc.).

The speech synthesis method offers a further interesting way of generating acoustic raw material. The principle is simple: a short sequence (e.g. 25 s) of any speech signal is divided into small segments of equal length (e.g. 100 ms), which are then played back in random order by a computer. This "gibberish" still has the sound of the original speech signal but is no longer intelligible, and is therefore suitable as acoustic background noise for EVP recordings.
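
The principle can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the AUDIGIT code: the function name random_playback and its parameters are invented for this example, and the recording is assumed to be a plain list of sample values.

```python
import random

def random_playback(samples, sample_rate, segment_ms, n_segments, rng=None):
    """Cut `samples` into equal-length segments and return
    `n_segments` of them, chosen at random, joined end to end."""
    if rng is None:
        rng = random.Random()
    seg_len = max(1, sample_rate * segment_ms // 1000)  # samples per segment
    n_available = len(samples) // seg_len               # whole segments only
    if n_available == 0:
        raise ValueError("recording is shorter than one segment")
    out = []
    for _ in range(n_segments):
        k = rng.randrange(n_available)                  # random segment number
        out.extend(samples[k * seg_len:(k + 1) * seg_len])
    return out

# Stand-in for a 25 s recording at 8000 Hz, cut into 100 ms segments
recording = [i % 256 for i in range(25 * 8000)]
gibberish = random_playback(recording, 8000, 100, n_segments=50,
                            rng=random.Random(1))
```

With these example values the recording contains 250 segments of 800 samples each, and the output is 50 randomly chosen segments in a row.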

The original idea behind this method was to assemble individual, exactly defined phonemes into a random language under random control (random-controlled phoneme synthesis). In practice it turned out that randomly "chopped" speech yields better results in voice recordings because of its greater dynamics. Dividing an audio signal into segments of equal length is also much simpler than cutting out individual phonemes by hand, and it allows far more variation.

Because the computer gives control over how the raw material is generated, the speech synthesis method could make an important contribution to documenting the paranormal nature of the voice phenomenon and thus support its scientific recognition.

Now to the various possibilities this method offers:

Any speech signal can be digitized at 8-bit resolution and stored in the computer's memory. The sampling rate can be varied from approximately 4.7 to 44.2 kHz. A dedicated function then plays back individual segments of this signal under random control to generate the acoustic "raw material" for voice recordings. The following settings are available:

Length of segments

The length of the randomly played-back segments can be chosen between 1 millisecond and several seconds. This determines how finely the signal is cut up. If the value is too high, individual fragments of words or complete words can be heard; very small values (less than 10 ms) distort the sound of the signal, but can nevertheless yield interesting results!

Order of segments

The segments can be arranged within the audio signal either one after another or overlapping. In the first case, the segments follow each other seamlessly at exactly defined positions. In the second case, each segment may lie anywhere within the audio signal, which allows more variation in generating the raw material.
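
One possible reading of the two arrangements, sketched in Python: in the first mode a segment can only start on a fixed grid, in the second it can start at any sample position where a whole segment still fits. The function name segment_start and the boolean flag are assumptions made for this illustration; how AUDIGIT implements the setting internally is not described here.

```python
import random

def segment_start(n_samples, seg_len, overlapping, rng):
    """Return the start index of one randomly chosen segment."""
    if overlapping:
        # any position at which a whole segment still fits
        return rng.randrange(n_samples - seg_len + 1)
    # fixed grid: segment k starts at k * seg_len
    return rng.randrange(n_samples // seg_len) * seg_len

rng = random.Random(0)
grid_starts = [segment_start(10_000, 800, False, rng) for _ in range(5)]
free_starts = [segment_start(10_000, 800, True, rng) for _ in range(5)]
```

With 10,000 samples and a segment length of 800, the grid mode has only 12 possible start positions, while the overlapping mode has 9,201, which illustrates why it offers more variation.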


Transition between segments

The transition between two consecutive segments determines how the amplitude, i.e. the volume, changes over a segment. With "rectangle", the volume is constant for every segment played back; the segments are "cut hard" and follow each other directly, which sounds rather "chopped". "Triangle" makes the volume rise slowly to the peak value while the volume of the preceding segment decreases at the same time. After reaching the peak, the volume slowly decreases again while the next segment slowly fades in. This cross-fading gives a "smoother" sound because the cuts between the individual segments are no longer audible. A further method of "soft cutting" is "trapezoid": here the rising and falling edges of the fade are steeper, and after reaching the maximum the volume stays at the peak value for half a segment length before it decreases again. All three cutting methods are suitable for recordings.
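
The three volume curves can be written down as gain envelopes. The following Python sketch is an interpretation of the description above, not code taken from AUDIGIT; the function name envelope is invented, and the ramp lengths (half a segment for the triangle, a quarter for the trapezoid) are assumptions derived from the text.

```python
def envelope(shape, n):
    """Return n gain values (0.0 to 1.0) for one segment of n samples."""
    if shape == "rectangle":
        return [1.0] * n                 # constant volume, hard cut
    if shape == "triangle":
        ramp = n / 2                     # rise for half, fall for half
    elif shape == "trapezoid":
        ramp = n / 4                     # steeper edges, peak held for n / 2
    else:
        raise ValueError(shape)
    return [min(1.0, (i + 0.5) / ramp, (n - i - 0.5) / ramp)
            for i in range(n)]

tri = envelope("triangle", 8)
trap = envelope("trapezoid", 8)

# Applying an envelope: multiply each sample by its gain value
segment = [100] * 8
faded = [s * g for s, g in zip(segment, tri)]
```

If consecutive triangle segments overlap by half a segment, the falling edge of one and the rising edge of the next add up to a constant volume, which is what makes the cuts inaudible.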


Pauses

By inserting pauses of variable length after a likewise variable number of segments, the continuous playback of randomly selected segments can be interrupted at regular or random intervals. This is meant to simulate a normal flow of speech. The function is also very helpful for making room for the experimenter's questions, and it makes the later analysis easier. Not least, it seems to support the formation of voices: voices often begin or end exactly with a sequence between two inserted pauses, which is very impressive every time.
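
As a rough sketch, pause insertion can be thought of as building a playback plan. The function name playback_plan and the event tuples below are made up for this illustration; the default ranges (a pause of 100 to 1000 ms after every 5 to 20 segments) are the values shown in the menu translation at the end of this article.

```python
import random

def playback_plan(n_segments, every=(5, 20), pause_ms=(100, 1000), rng=None):
    """Return a list of ('segment', None) and ('pause', milliseconds) events."""
    if rng is None:
        rng = random.Random()
    plan = []
    until_pause = rng.randint(*every)    # segments until the next pause
    for _ in range(n_segments):
        plan.append(("segment", None))
        until_pause -= 1
        if until_pause == 0:
            plan.append(("pause", rng.randint(*pause_ms)))
            until_pause = rng.randint(*every)
    return plan

# Regular intervals: a 300 ms pause after every 5 segments
regular = playback_plan(20, every=(5, 5), pause_ms=(300, 300))
# Random intervals with the default ranges
irregular = playback_plan(200, rng=random.Random(7))
```

Setting the minimum and maximum to the same value gives regular intervals; different values give random ones.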

During random-controlled playback, the segment numbers used and all other parameters, such as sampling rate, segment length, order, transition and pauses, can be logged. The same random signal can then be reproduced later for a control recording, to check possible acoustic transformations under identical test conditions.
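
The idea of logging can be illustrated as follows. The JSON layout, the key names and the helper draw_segments are assumptions chosen for this sketch; the actual format of the AUDIGIT log file is not described in this article.

```python
import json
import random

def draw_segments(n_available, n_segments, rng):
    """Choose n_segments random segment numbers below n_available."""
    return [rng.randrange(n_available) for _ in range(n_segments)]

# First session: draw a random sequence and log it with all parameters
params = {"sample_rate": 22050, "segment_ms": 100,
          "order": "one after another", "transition": "triangle",
          "pauses": False}
sequence = draw_segments(250, 40, random.Random())
log_text = json.dumps({"params": params, "segments": sequence})

# Control recording: read the log back and replay exactly the same sequence
log = json.loads(log_text)
replayed = log["segments"]
```

Because the segment numbers themselves are stored, the control recording does not depend on the random generator producing the same values again.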

All digitized audio signals that serve as the basis for generating raw material can be saved to files and loaded back into memory for later use. Only in this way can comparable conditions be ensured across a series of experiments. The widespread WAVE format was chosen as the file format so that audio samples can be exchanged with other applications.
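
For readers who want to prepare or inspect such files themselves, here is a minimal sketch using the wave module from the Python standard library. The helper names save_wave and load_wave are invented for this example, and the test tone merely stands in for a speech recording. In WAVE files, 8-bit samples are stored as unsigned values from 0 to 255.

```python
import io
import math
import wave

def save_wave(target, samples, sample_rate):
    """Write unsigned 8-bit mono samples (0..255) as a WAVE file."""
    with wave.open(target, "wb") as w:
        w.setnchannels(1)            # mono
        w.setsampwidth(1)            # 1 byte per sample = 8 bit
        w.setframerate(sample_rate)
        w.writeframes(bytes(samples))

def load_wave(source):
    """Read a WAVE file; return (list of sample bytes, sample rate)."""
    with wave.open(source, "rb") as w:
        return list(w.readframes(w.getnframes())), w.getframerate()

# One second of a 440 Hz test tone at 22050 Hz
tone = [int(round(127.5 + 127.5 * math.sin(2 * math.pi * 440 * i / 22050)))
        for i in range(22050)]
buf = io.BytesIO()                   # in-memory file; a file name works too
save_wave(buf, tone, 22050)
buf.seek(0)
loaded, rate = load_wave(buf)
```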

For recordings with the speech synthesis method, it has proved useful to play the computer-generated random speech signal into the room at low volume through a loudspeaker, pick it up with a microphone at some distance, and feed it to a cassette recorder. In this way the experimenter's questions are recorded on the tape together with the acoustic raw material.

To gain better control over acoustic "transformations", the microphone can be connected to the left channel of a stereo cassette recorder while the raw material is fed directly to the right channel via an audio cable. The recording is later evaluated by listening alternately to the left and the right stereo channel: if a voice is detected on the left track, switching to the right track shows whether this voice was already audible in the raw material. If it was not, the original raw material must have been modified along the acoustic path. It is not safe to say, however, that this modification is "paranormal", because acoustic factors alone (room acoustics, reflection of sound waves, frequency response of loudspeaker and microphone, overdriving of the recording, harmonics) can distort the signal so much that apparently new or different sounds arise. In such a case, a control recording under identical test conditions and with the same raw material can at least verify whether the event is acoustically caused, and therefore physically explainable and reproducible, or a unique event that cannot be reproduced.

Finally, it should be mentioned that many very good voices can already be heard directly in the raw material and are therefore not caused by acoustic transformations. This does not necessarily disprove their genuineness and paranormality, however. We are dealing here with a phenomenon that, for the time being, completely eludes physical explanation.

(Source: VTF-Post P 87, issue 2/97)

Experimenting with the speech synthesis method

If you are interested in experimenting with the speech synthesis method, you can download the required software, AUDIGIT. The program and its documentation are written in German (sorry), but I have translated some terms into English (see below). The program runs only under DOS. It will not work in a DOS box under Windows! If you are using Windows 3.x, exit Windows first; if you are using Windows 9x, reboot in DOS mode. I have no experience with Windows NT. The memory available under DOS is limited to 640 KB. To get the most out of it, load any DOS drivers into the high memory area. Maybe there will be an AUDIGIT for Windows some day...

[Update: There is now a new program called EVPmaker, which runs under Windows!]

As 'raw material' (the 25-second recording) you can use any spoken text, e.g. taken from the radio or spoken by yourself. Try male and female voices. There are many possibilities, and you cannot predict which will work best. Once recorded, the sample can be saved as a WAVE file (mono, 8 bit) and loaded again for future use. After recording or loading a sound file, choose 'Zufallsgesteuerte Wiedergabe' to generate the 'random speech'. Try altering some parameters (segment length, order, or the transition between segments, hard or smooth). Record this 'gibberish' via microphone. If you want to ask a question, stop the playback and restart it when you have finished. It may be useful to activate the 'Pauses'. Sometimes 'voices' start or end exactly at the boundaries of a sequence between two pauses. If there are any voices, they are often very loud and clear and related to the experimenter's question.

Carry out your experiments over a longer period of time, because the quantity and quality of the received voices vary independently of known physical factors. I suppose this has something to do with the emotional state of the experimenter. Maybe there is a kind of 'energy' that can be used to produce this paranormal phenomenon (if it is paranormal...). An example: when I called a specific person shortly after her death, the number of received voices was very high within the first 10 days (up to ten voices per recording). Then suddenly there were no more voices! After a pause of 10 days I again received some voices, but only a few. I always used the same equipment.

Now I wish you good luck with your own experiments. Please let me know about your experiences.

Translation of menu commands of the AUDIGIT software

Aktion                                 Action
keine                                  none

Funktionen                             Functions
Aufnahme                               Record
Wiedergabe                             Play back
Zufallsgesteuerte Wiedergabe           Random controlled playback
Datei laden                            Load file
Datei speichern                        Save file

Sample                                 Sample
Sample-Frequenz: 22095 Hz [+] [-]      Sampling rate: (frequency)
Laenge des Samples: 0.000/17.062 s     Length of Sample: (current/max)

Segmente                               Segments
Anzahl: 0 von 157                      Number: (current/max)
Laenge: 109 ms                         Length: (time)
Anordnung: ueberschne. | hintereinand. Order: overlapping | one after another
Uebergang: Rechteck | Dreieck | Trapez Transition: rectangle | triangle | trapezoid

Pausen                                 Pauses
Pausen: ein | aus                      Pauses: on | off
Dauer: 100 bis 1000 ms                 Duration: <min> to <max> millisec.
Haeufigkeit: alle 5 bis 20 Segmente    Frequency: every <min> to <max> segments

Effekte                                Effects
Signal umdrehen                        Reverse signal
Signal mit Echo/Hall versehen          Add echo/reverb to signal

Sonstiges                              Misc.
Eingang fuer Aufnahme: Line In | Mik.  Input for recording: Line In | Microphone
Soundkarte/Digitizer initialisieren    Initialize soundcard/digitizer
Konfiguration speichern                Save configuration
Ende                  [Alt-X] [Esc]    Exit                  [Alt-X] [Esc]

Protokolldatei                         Log file
Name:                                  Name: (enter or choose new/existing file)
Mitprotokollieren: ein | aus           Logging: on | off
Abspielen                              Playback