National Academy of Sciences (NAS)

Report of the Committee on
Ballistic Acoustics


 

 

REPORT OF THE
COMMITTEE ON BALLISTIC ACOUSTICS

COMMISSION ON PHYSICAL SCIENCES,
MATHEMATICS, AND RESOURCES

NATIONAL RESEARCH COUNCIL

NATIONAL ACADEMY PRESS
Washington, DC 1982

 

APPENDIX B

ANALYSES OF SOUND SPECTROGRAMS OF "HOLD EVERYTHING . . ."

B-1. Time and Frequency Analysis

Several sound spectrograms were made of the first and second halves of the "hold everything . . ." expression on Channels I and II. Two of these pairs are given in Figures B-1 and B-2. Although some similar features can be observed in comparing the two channels in Figures B-1 and B-2, it is difficult to tell if the similar features occur somewhat at random or if corresponding features occur at corresponding times over the entire 3 1/2 second duration of the message, as must be the case if the corresponding features are associated with the same transmission. For this reason, the following analysis was made of two successive pairs of sound spectrograms which were butted together, with an overlapping sound spectrogram being used to ensure that the sound spectrograms were combined properly. The result is shown in Figure B-3.

Twenty-seven corresponding features have been marked on Channels I and II in Figure B-3. Since the timings of the corresponding features were to be studied later, two observers were used in the identifications to diminish the danger that human prejudice on the timing would affect the identification. The first observer, looking at the sound spectrograms of both Channels I and II but making no measurements, marked on Channel II 27 points which he felt were sufficiently characteristic and sufficiently well reproduced on Channel I to be identifiable there by an independent observer. Then a series of 27 xerographic copies were prepared of different portions of Channel II, extending 1/2 second to each side of the single identified characteristic and with no indication of time scale on any of the Channel II strips. These strips and the Channel I sound spectrogram were presented to a second observer who was asked to mark on the Channel I sound spectrogram what he considered to be the most similar characteristic to the one marked on each Channel II strip. He was asked to do so by pattern recognition and not by measurement. His marks located the black dots on the Channel I tape of Figure B-3. It was found that the second observer correctly identified 26 out of the 27 characteristics. In the one case of disagreement (characteristic I) the second observer subsequently agreed that the intended identification was better than the one he selected.

Only after all the identifications had been made were the times and frequencies of each characteristic measured and recorded in Table B-1. These are plotted in Figure 4. It can be seen that the points fall markedly close to a straight line, the only exception being the misidentified characteristic I.

A straight line of the form

T' = α + βT" + u

was fit to the (T', T") data. Under the copy hypothesis that the signal on Channel I is a noisy copy of that on Channel II, the values of u are determined by measurement errors in the presence of noise, and there may occasionally be an outlier due to the matching of noncorresponding features on the two channels. The robust linear regression routine RLIN in the Minitab 80.1 interactive statistical package(5) yields the estimated fit

T' = - 0.0253 + 1.0599T" + u*

A sequence of regressions in which outliers are dropped one or two at a time yields the fit

T' = - 0.0216 + 1.0593T" + u.

Here, the points 9, 11, 13, 18, and 19 were dropped. The standard deviation of the fitted residuals of the remaining 22 points (adjusted for 20 degrees of freedom) is s_u = 0.0092, and the estimated standard deviations of the two coefficients above (α and β) are 0.0037 and 0.0016, respectively.

The five outliers in the column labeled u = ΔT are marked with a D. All other values of u are less than 0.015 in absolute value.
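The outlier-dropping procedure can be illustrated with a minimal sketch. It is not the Minitab RLIN routine and it does not use the Table B-1 measurements: it substitutes ordinary least squares with one-at-a-time trimming, and the data, the 0.02-second threshold, and the function names are all assumptions chosen for illustration.

```python
def fit_line(points):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def trimmed_fit(points, threshold, max_rounds=10):
    """Refit repeatedly, dropping the worst point while its residual exceeds threshold."""
    kept = list(points)
    for _ in range(max_rounds):
        a, b = fit_line(kept)
        worst = max(kept, key=lambda p: abs(p[1] - (a + b * p[0])))
        if abs(worst[1] - (a + b * worst[0])) <= threshold:
            break
        kept.remove(worst)
    a, b = fit_line(kept)
    return a, b, kept


def synthetic_points():
    """Hypothetical (T'', T') pairs: a known line, small noise, two planted mismatches."""
    points = []
    for k in range(1, 21):
        x = 0.1 * k
        noise = 0.005 if k % 2 == 0 else -0.005
        y = -0.02 + 1.06 * x + noise
        if k == 5:
            y += 0.10   # planted incorrect match
        if k == 15:
            y -= 0.08   # planted incorrect match
        points.append((x, y))
    return points


a, b, kept = trimmed_fit(synthetic_points(), threshold=0.02)
print(f"alpha = {a:.4f}, beta = {b:.4f}, points kept = {len(kept)}")
```

On this synthetic input the two planted mismatches are discarded and the slope returns close to the value used to generate the data, which is the behavior the sequence of regressions relies on.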

The ratios R = F"/F' of the measured frequencies at the paired points in the two channels were computed. A sequence of averages in which outliers are dropped eliminates four ratios (numbers 1, 5, 8, and 13) and yields an average R* = 1.064 and standard deviation s_R = 0.0272. R* is an estimate of β. The standard deviation of R* is σ_R/√22 = 0.0058, so that R* is a less accurate estimate of β than that derived from the regressions above. The values of

v = R - 1.064 = ΔR

are listed with the outliers marked by a D. Finally we calculate and list

w = F' - F"/1.0593 = ΔF.

Except for the two outliers, marked D, corresponding to points (1, 5) all of the values of ΔF are less than 0.09 kHz in absolute value and have a sample mean of -6.92 Hz and standard deviation of 48 Hz.

These data are consistent with the copy hypothesis, with a probability of about 1/4 or less of an incorrect match, and with relatively small measurement errors in the time and frequency measurements. To be more specific, let us suppose (i) that the Channel II markings are precise, (ii) that the Channel I markings may be either wrong, or correct but displaced by an amount due to the noise, and (iii) that each measurement has a reading error.

For example suppose

t' = α + βt" + u_n
T' = t' + u'_e
T" = t" + u"_e

where t' and t" are the exact times of the corresponding events, u_n is the contribution of the distortion due to noise, T' and T" are the observed times, and u'_e and u"_e are the reading errors. Then

T' = α + βT" + (u_n + u'_e - βu"_e) = α + βT" + ΔT

and, assuming independence of the residuals,

σ²_ΔT = σ²_un + (1 + β²)σ²_ue ≈ σ²_un + 2σ²_ue

(The lack of statistical independence between T" and u = ΔT raises a technical problem which is minor in the present context and won't be discussed here.)

Because the process of discarding outliers tends to bias the estimated standard deviation downwards, one would expect σ_u to be about 0.01, which is consistent with σ_ue ≈ 0.005 and σ_un ≈ 0.007, although other combinations are also plausible considering the data and the measurement techniques. The five outliers, one of which is much larger than the others, suggest that the probability of an incorrect match may be as large as 1/4.
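As a quick numerical check, the quoted error components can be combined according to the variance relation for ΔT. The values 0.005 and 0.007 come from the text; the variable names are illustrative only.

```python
import math

beta = 1.0593      # slope from the regression with outliers dropped
sigma_ue = 0.005   # reading error per measurement, in seconds
sigma_un = 0.007   # displacement due to noise, in seconds

# Full expression, with the factor (1 + beta^2) on the reading-error term
exact = math.sqrt(sigma_un**2 + (1 + beta**2) * sigma_ue**2)
# Approximation with beta taken as 1, so the factor becomes 2
approx = math.sqrt(sigma_un**2 + 2 * sigma_ue**2)

print(f"exact = {exact:.4f} s, approx = {approx:.4f} s")
```

Both forms come out at about 0.010 seconds, in line with the expected value of about 0.01.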

A similar analysis may be applied to the frequencies. If we write

f' = β⁻¹f" + v_n
F" = f" + v"_e
F' = f' + v'_e

where f' and f" are the exact frequencies, v_n is the contribution of the distortion due to noise, F' and F" are the observed frequencies, and v'_e and v"_e are the reading errors, then

R = F"/F' = β + ΔR

where the probability distribution of v = ΔR can be approximated by one with mean 0 and standard deviation

σ_ΔR = (f')⁻¹[β²σ²_vn + (1 + β²)σ²_ve]^(1/2) ≈ [σ²_vn + 2σ²_ve]^(1/2)/f'

which averages out to approximately [(F')⁻¹][σ²_vn + 2σ²_ve]^(1/2), where [(F')⁻¹] is the average of the (F')⁻¹ values. Also

ΔF = F' - β⁻¹F"

has mean 0 and standard deviation

σ_ΔF = {σ²_vn + σ²_ve[1 + β⁻²]}^(1/2) ≈ [σ²_vn + 2σ²_ve]^(1/2)

and

σ_ΔR ≈ [(F')⁻¹]σ_ΔF.

This relation is approximately maintained by the estimates.

Could the observed coincidences have occurred even if the message on Channel I were not a copy of that on Channel II? Suppose that as an alternative hypothesis we assume that it was a different message and the time coincidences took place because the features marked (maxima, minima, flats, and downward slopes) occur frequently on Channel I and a similar feature could, at random, be close by to one being sought. For example, there are 18 peaks in a 3.6 second interval. Thus at random, peaks would occur at an average spacing of 0.2 seconds and, according to the Poisson process calculation, the probability of at least one peak within a time of δ seconds from a specified time would be p = 1 - e^(-2δ/0.2).

The frequencies of the other features are no greater than that of peaks; hence, the probability of a coincidence with |ΔT| ≤ 0.015 is at most p = 1 - e^(-0.15) = 0.14. We have 22 such coincidences out of 27 trials. Granted that we selected estimates of α and β to increase the number of such coincidences, we may, to be conservative, eliminate two of these coincidences. We then have 20 out of 25 coincidences. Assuming independence, the number of such coincidences has a binomial distribution with mean 25 × (0.14) = 3.5 and standard deviation 1.73. Then 20 is 9.51 standard deviations away from the mean, and the probability of getting as many as 20 coincidences is about 2.1 × 10⁻¹³.

Note that 25 of 27 values of ΔF are less than 0.1 kHz in absolute value. If each of these ΔF were uniformly distributed over a narrow range of ±0.3 kHz, the probability of 25 or more independent absolute values less than 0.1 would be very small (2 × 10⁻¹⁰). In fact, this probability would be 0.001 even if the range of values of ΔF were cut in half to ±0.15 kHz.
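The chance-coincidence estimates for the times and the frequencies can be reproduced with a short sketch using exact binomial tails. The helper name binom_tail is an assumption for illustration. The time calculation evaluates the tail at 20 coincidences out of 25, and the frequency calculation assumes that a uniform ΔF on ±w gives probability 0.1/w of falling within 0.1 kHz.

```python
import math


def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Times: peaks spaced 0.2 s apart on average, window of +/- 0.015 s
p_time = 1 - math.exp(-2 * 0.015 / 0.2)     # Poisson-process probability
p_used = 0.14                               # rounded value used in the text
mean = 25 * p_used
sd = math.sqrt(25 * p_used * (1 - p_used))
z = (20 - mean) / sd                        # standard deviations above the mean
tail_time = binom_tail(25, 20, p_used)      # 20 or more of 25 by chance

# Frequencies: 25 or more of 27 values with |dF| < 0.1 kHz
tail_wide = binom_tail(27, 25, 0.1 / 0.3)     # dF uniform on +/- 0.3 kHz
tail_narrow = binom_tail(27, 25, 0.1 / 0.15)  # dF uniform on +/- 0.15 kHz

print(f"p = {p_time:.3f}, mean = {mean:.1f}, sd = {sd:.2f}, z = {z:.2f}")
print(f"time tail = {tail_time:.2e}")
print(f"frequency tails = {tail_wide:.2e} (wide), {tail_narrow:.2e} (narrow)")
```

Under these assumptions the time tail comes out near 2 × 10⁻¹³, the wide-range frequency tail near 2 × 10⁻¹⁰, and the narrow-range tail on the order of 10⁻³.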

The sound spectrograms shown in Figure 4 are similar to those in Figure B-3 except that one recording is slowed down 6.7% to bring the ratio of apparent recorder speeds closer to unity. The black dots indicate the same features as in Figure B-3 except for a few points, such as I, that have been adjusted for a better fit in Figure 4.

 

B-2. Measurements of Easily Identified Frequency Ratios on Sound Spectrograms

A casual inspection of the original sound spectrograms of sections of Channel I and Channel II recordings for the time interval identified as containing the phrase "hold everything . . ." shows marked similarities, but with the most clearly defined frequencies on Channel I being somewhat lower than those on corresponding sections of Channel II. Since the analysis of the preceding section shows that the measured times between corresponding events on Channel I are longer than on Channel II, by about 6%, it seemed worth measuring the frequency ratios of corresponding signals that were particularly well suited for frequency measurement; if the two sound spectrograms really did originate from a single 3.5-second long signal on Channel II, which was fed by cross talk onto Channel I, then the frequency ratio must depart from unity by that same approximately 6%. This was our working hypothesis at the time, so the frequency ratio measurements provided a test of the hypothesis -- if the frequency ratio was not approximately 1.06 the hypothesis would have been totally disproved.

One of the Committee members, therefore, measured the frequency ratio at five corresponding sections of the records. The sections to be measured were selected by a simple criterion that can be used by any interested person. The frequency must stay constant (a horizontal band, by visual inspection) for at least 1/30 second, and it must be clearly visible on both channels at corresponding times. It is not required that the frequency bands originate from speech components of the signals on Channel II. Anyone listening to this section of Channel II will hear, in addition to the sentence starting with "Hold everything secure . . .," a number of tones that are both amplitude and frequency modulated. These tones are as useful as the speech components in proving that a signal on Channel II was imprinted by cross talk onto Channel I at the time of the conjectured "shots."

The above-mentioned criterion was satisfied by five sections of the two tapes, which are identified by their original times T' on Channel I. They are as follows:

 

Section       Time
1 centered at T' = 0.67 seconds
2 centered at T' = 2.19 seconds
3 centered at T' = 3.13 seconds
4 centered at T' = 3.31 seconds
5 centered at T' = 3.52 seconds

 

The measurement of each frequency was made in the following way: an indentation was made in the surface of the glossy print, near the center of each "band," with a sharp point. The observer then looked at the mark, to check that it was as nearly centered as possible in the vertical direction. On the few occasions that it appeared to be above or below the center of the band, a new mark was made, and checked to be adequately centered. Only after the observer was satisfied that he had placed ten marks correctly -- one for each of five bands, on two spectrograms -- did the measurements begin. The measurement consisted of a linear interpolation between adjacent kilohertz lines using a millimeter scale as the measuring device. The following five ratios came out of the measurements just as described:

 

Section Frequency Ratio
1 1.054
2 1.066
3 1.065
4 1.052
5 1.067
Mean Value 1.061±0.007

 

This value is consistent with the time ratio 1.059±0.002 found from the slope of the line relating the time coordinates on the two channels in Figure 5.
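The quoted mean can be recomputed from the five tabulated ratios. Reading the ± figure as the sample standard deviation of the five values is an assumption; the report does not say how it was obtained.

```python
import statistics

# Frequency ratios for sections 1 through 5, as tabulated
ratios = [1.054, 1.066, 1.065, 1.052, 1.067]

mean_ratio = statistics.mean(ratios)
spread = statistics.stdev(ratios)   # sample standard deviation (n - 1 divisor)

time_slope = 1.059                  # slope of the time regression line
print(f"mean = {mean_ratio:.3f} +/- {spread:.3f}")
print(f"difference from time slope = {mean_ratio - time_slope:.3f}")
```

The difference between the frequency-derived mean and the time-derived slope is well inside the quoted spread.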

Another Committee member made independent measurements of the average of the same frequency ratios and found a mean value of 1.063±0.007.

In view of the close agreement between this pair of independent measurements, we conclude that the mean frequency ratio is

R = 1.062±0.007

The excellent agreement between the time-derived and the frequency-derived ratios of tape speeds lends strong support to the hypothesis that the "hold everything . . ." signals observed on Channel I were imprinted by cross talk from Channel II.

 

B-3. Alternative Time and Frequency Analyses of Sound Spectrograms

The analyses in Appendixes B-1 and B-2 may be subject to some criticism. A certain amount of subjectivity derives from the fact that the first observer was looking at the sound spectrograms from both channels while he marked points on Channel II. The strips of Channel II were one second wide, which is a substantial portion of the entire 3.5 second spectrogram. Consequently the 27 strips had large overlapping parts. To the extent that observer 2 recalled what he did on previous matches or to the extent that he used the same cues in the overlapping portions, the resulting times were dependent observations. A theory that uses estimates and conclusions based on independence assumptions may overestimate the significance or accuracy of these conclusions and estimates.

However, this experiment was supplemented by several variations that derived similar results. Some of these were more careful to avoid the subjectivity and to reduce considerably the dependence aspects of the experiment presented here. These are not reported in detail, because they were carried out using xerographic copies of photographs using several scales, and relatively crude measuring instruments (graph paper in place of rulers). A presentation here would be more complex and the photographs would lack clarity.

 

a) Initial experiments

In chronological order, an initial experiment was carried out where 28 pairs of corresponding points were measured on both Channel I and Channel II by an observer who studied both spectrograms simultaneously for characteristic features. A least squares analysis of these highly subjective data gave the fitted relation

T' = -0.0402 + 1.0673T"

and the ratios of the observed frequencies

R = F"/F'

averaged to 1.0728.

A "robust" analysis of the pairs (T', T") in the first experiment, where three outliers were discarded, gave the estimated relation

T' = -0.0235 + 1.0633T" + u

where the residual u had estimated standard deviation 0.0159 and the estimated standard deviation of the coefficient β of T" was 0.0028.

An alternate robust linear regression, implemented on the Minitab-1980 interactive statistical package(5) under the command RLIN, gave

T' = -0.0295 + 1.0626T" + u

A second experiment was carried out by an observer who measured the central frequencies of 5 distinct pairs of broad horizontal sections appearing at comparable times and with relatively high frequencies. The ratios of these central frequencies R averaged R* = 1.060 and had sample standard deviation 0.0072.

 

b) A more objective experiment on the timing

At this point a more objective procedure was carried out using xerographic copies of a reduced photograph of the spectrograms. The observer was given the experimenter's explanation of the theory that messages were broadcast on Channel II and picked up by the stuck microphone located near a receiver of Channel II. The observer was shown copies of Channel I and II for two other messages that had been well duplicated; Y -- "You want . . . Stemmons" and S --"Says they came from . . ." It was explained that dark portions meant loud signals and sharp changes that were dark would probably be well reproduced under the theory. The observer was asked to mark about 20 spots on Channel II that would be likely to be well reproduced. The observer was not given an opportunity to study Channel I of H -- the spectrogram suspected of being "Hold everything secure . . ."

Twenty strips of Channel II of H, each between 0.2 and 0.3 seconds long, were reproduced by xerox with the marked point in the center. The estimate

^T = -0.0402 + 1.0673T"

was used to locate corresponding points on Channel I. Strips of a xerox of Channel I were cut out. These strips were 3/4 seconds long and were centered at a point displaced from ^T by a random quantity uniformly distributed on the interval (-.3, .3) in seconds. Corresponding strips were paired and these pairs were arranged in random order.

A second observer was asked to align the two strips of each pair and to locate on Channel I a vertical mark corresponding to the time of the mark in Channel II. This observer was allowed to use as much context as was available, in the approximately 0.3 second of Channel II and 0.75 seconds of Channel I in the pair, to help him make the mark. It was not necessary for him to find a feature corresponding to the point marked. He, too, had the theory explained to him, and he was informed that there might be a consistent difference in the frequencies on the two channels.

This experiment requires some balance in selecting the widths of the strips. If both strips are too narrow, one is bound to get (T', T") points that lie close to the line T' = 1.0673T" - 0.0404 and a good fit will not be convincing. If the strip on Channel II is too narrow and that of Channel I is very wide, it will be very easy for the observer to be misled by similar characteristics elsewhere. This would reduce the efficiency and power of the experiment. If the strip on Channel II is wide, then the different strips will overlap substantially and memory and the cues the observer uses may make results on different strips dependent. As the experiment was carried out the 20 strips of Channel II had pairs with some overlap, but in the random order of presentation these small strips looked quite distinct.

When the times were measured, the deviations, ΔT = T' - (1.0633T" - 0.0235), between the measured time in Channel I and the time anticipated by the robust estimate of the straight line, were calculated. Thirteen of these were no larger than 0.054 seconds, one was 0.075 seconds, and the remaining 6 were 0.203 seconds or more. The mean and standard deviation of the thirteen smaller deviations were -0.016 seconds and 0.026 seconds. The root mean square deviation was 0.029 seconds.

These results are consistent with the copy hypothesis if one anticipates misclassifications about 1/4 of the time and measurement error due to noise and measurement accuracy of about 0.03 second (about 0.07 inch on the scales used).

Under the randomness alternative hypothesis, that the two messages are unrelated and any matching of features is randomly located, one may estimate the probability of being within 0.054 seconds of the expected point to be about 0.2.*

[*Under the randomness hypothesis, the distribution of the discrepancy corresponds to the sum of the off center random displacement (uniform from -.3 to .3) and an independent random choice in (-.375, .375) along the Channel I strip. Since this latter choice is almost uniform except for a possible bias toward the center, it was modeled as the sum of two uniforms from (-.3 to .3), which has a symmetric triangular distribution from -.6 to .6. The probability that this sum is between -.054 and +.054 is 1 - ((.6 - .054)/.6)² = 0.17, which is less than or equal to 0.2.]

The number of such coincidences out of 20 independent trials would be binomially distributed with mean 4 and standard deviation 1.79 and 13 successes corresponds to (13-4-.5)/(1.79) = 4.75 standard deviations from the mean and is highly unlikely.

Moreover, subtracting 2 of the 13 successes to compensate for the choice of the linear fit would still make this match very unlikely. Then we would have (11-4-.5)/1.79 = 3.63 standard deviations with P = 0.0006.
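The figures in this argument can be checked with a short sketch. The function names are assumptions for illustration. The sketch evaluates the triangular-distribution probability from the footnote, the two continuity-corrected standard scores, and the exact binomial tail for 11 or more successes.

```python
import math


def triangular_within(x, half_width):
    """P(|S| <= x) for S symmetric triangular on (-half_width, half_width)."""
    return 1 - (1 - x / half_width) ** 2


def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


p_footnote = triangular_within(0.054, 0.6)   # about 0.17
p = 0.2                                      # conservative value used in the text
mean = 20 * p
sd = math.sqrt(20 * p * (1 - p))

z_13 = (13 - mean - 0.5) / sd    # all 13 coincidences counted
z_11 = (11 - mean - 0.5) / sd    # 2 removed to allow for the fitted line
tail_11 = binom_tail(20, 11, p)  # exact probability of 11 or more

print(f"footnote probability = {p_footnote:.3f}")
print(f"mean = {mean:.0f}, sd = {sd:.2f}, z(13) = {z_13:.2f}, z(11) = {z_11:.2f}")
print(f"P(11 or more) = {tail_11:.4f}")
```

The quoted P = 0.0006 agrees with the exact binomial tail here rather than with a normal approximation at 3.63 standard deviations, which would give a smaller value.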

The poor quality of the xerographic copies with which this experiment was carried out and the low-quality measuring instruments explain in part why the standard deviation of the observed discrepancies was much larger than that observed with the data presented in Table B-1.

 

c) A more objective experiment on the frequencies

The experimenter selected 14 dark horizontal bands on a xerox copy of Channel II. The time points T" of these horizontal bands were measured. Corresponding times on Channel I given by ^T = 1.0633T" - 0.0225 were located. The subject was requested to mark the central frequency of the bands on Channel II. Then the subject was requested to locate bands on Channel I at the times marked and to mark the central frequency.

These central frequencies were measured and labeled F1 and F2 for the two channels. The ratio R = F2/F1 was calculated and ranged from 1.337 to 1.024. Deleting 4 outliers, the average was R* = 1.0665 and the sample standard deviation was sR = 0.0116.

These data are consistent with the hypothesis that Channel I is a noisy version of Channel II, which leads to a wrong pairing about 1/3 of the time, and that when the correct pairing is made, the noise distortion and measurement error in the individual central frequency readings correspond to about F*2sR/√2 = 0.015 kHz or about 15 Hz.

By no stretch of the imagination could these readings be consistent with a purely random location of horizontal bands theory. Even a much more restrictive hypothesis, assuming that another speech was uttered in a similar cadence with similar frequencies of vowels and mechanisms yielding strong horizontal bands, was shown to be implausible as long as these bands were allowed to fluctuate at random within narrow ranges determined by the empirical data.

 

B-4. Digital Calculations of Cross Correlations Between Channel I and Channel II

If indeed "hold everything . . ." on Channel II was transmitted to and recorded on Channel I at the time occupied by the assumed "shots", then the digital cross-correlation of the short-time acoustic (energy) spectra of the two Channels should show a correlation substantially larger than that which would be achieved by chance. This was studied by a member of the Committee and two collaborators. The Channel I and Channel II recordings were digitized and the short-term acoustic spectra were taken and stored in a digital computer. The printouts of these spectra are shown in Figures B-4, B-5 and B-6. These digital spectrograms were computed directly from magnetic tapes and did not involve the use of the FBI sound spectrogram equipment.

An objective measure of similarity of two spectral matches is obtained from the cross correlation coefficient, defined for the functions X and Y by

ccc = (Σ X·Y)/[(Σ X·X) · (Σ Y·Y)]^(1/2).

This cross correlation coefficient would be reduced if one of the recordings were played at the wrong speed, or if the recording at one time were compared with the same or a different recording at a different time.
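A minimal sketch of this coefficient follows, assuming X and Y are equal-length sequences of spectral energies. The function name and the toy values are illustrative, and the formula is the uncentered form given above, with no mean subtraction.

```python
import math


def ccc(x, y):
    """Cross correlation coefficient: sum(x*y) / sqrt(sum(x*x) * sum(y*y))."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator if denominator else 0.0


reference = [1.0, 4.0, 9.0, 4.0, 1.0, 0.0]   # toy energy values, not measured data
louder = [3.0 * v for v in reference]        # same shape at a different gain
shifted = reference[-2:] + reference[:-2]    # same shape at the wrong time

print(f"same shape, different gain: {ccc(reference, louder):.3f}")
print(f"same shape, misaligned:     {ccc(reference, shifted):.3f}")
```

The coefficient is unaffected by overall gain but drops when the pattern is misaligned, which is why it peaks only at the correct relative timing and speed.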

The first cross correlation coefficients were made from the same Channel I and II recorded copies that were used in preparing Figures 3, 4, B-1, and B-2. It was found that the biggest peak for the cross correlation coefficient occurred for a relative warp (or speed ratio) of 1.06, in agreement with the other two manual approaches for comparing Channels I and II; a 1% deviation of warp from optimum diminished the peak substantially. Unfortunately, that Channel II copy contains many repeats caused by the Gray Audograph machine in playback. Accordingly another tape copy was prepared by members of the Committee directly from the original Audograph plastic disk itself and by the use of a standard turntable and tone arm, thus producing a tape without compensation for the fact that the disk was originally recorded at constant linear track speed. It was this tape that was used in preparing the sound spectrograms shown in Figures B-4, B-5, and B-6. The Channel II signals are from the 7.5 ips tape recording of the Gray Audograph record played on a turntable (12/9/81). The tape was played at 3.75 ips when digitized for these experiments; hence, the rate of change of the correction factor was assumed to be half the measured rate of 0.0005 per second. The signals were digitized at 20000 samples per second, and a 400-pt Fourier transform was computed every 200 samples (10 millisec), using a 400-pt Blackman window. The correlations were performed on portions of the 200-pt spectra, which have a point spacing of 50 Hz. The high frequencies of the Channel I spectra were boosted at a rate of 6 db per 1000 Hz and then normalized to a constant energy in the band of interest.

Figure 6 gives the cross correlation coefficient for the "hold everything . . ." segments with the relative speed selected to give the largest peak, which occurred when the Channel II signal was sped up slightly by compressing the time scale by a factor that varied from 0.957 to 0.961 (changing at the rate of 0.00025 per sec). Figure 6 is a plot of the 750 correlation coefficients obtained by sliding 2.50 secs of Channel I along 10.00 secs of Channel II, 0.01 secs at a time, using frequencies in the band 600 Hz to 3500 Hz. For comparison the cross correlation coefficients of the unambiguous segment "You want . . . Stemmons" are plotted in Figure 7 with the time scale of Channel II stretched by a factor that varied from 1.013 to 1.015. The shape of the peak is very similar to that for the "hold everything . . ." segment. The background is somewhat smoother simply because there is less noise in Channel I at this time. Channel I, however, in neither case gives a perfect reproduction of Channel II. It has lost some of the high and low frequencies and, as one would expect, there are tones present on Channel I that are not on Channel II.
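The bookkeeping behind these figures can be laid out in a few lines. The sampling rate, transform size, step, band limits, and segment lengths are taken from the text. The constant names, the inclusive treatment of the band edges, and the counting of lags as the difference in frame counts are assumptions made for this sketch.

```python
SAMPLE_RATE = 20000   # samples per second
FFT_SIZE = 400        # points per Fourier transform
HOP = 200             # samples between successive transforms

hop_seconds = HOP / SAMPLE_RATE           # time step between spectra
bin_spacing_hz = SAMPLE_RATE / FFT_SIZE   # frequency spacing of spectral points

# Spectral points covering the 600 Hz to 3500 Hz band (edges assumed inclusive)
low_bin = round(600 / bin_spacing_hz)
high_bin = round(3500 / bin_spacing_hz)
band_points = high_bin - low_bin + 1

# Number of spectra in each segment and number of sliding positions
frames_ch1 = round(2.50 / hop_seconds)
frames_ch2 = round(10.00 / hop_seconds)
n_lags = frames_ch2 - frames_ch1

print(f"step = {hop_seconds * 1000:.0f} ms, spacing = {bin_spacing_hz:.0f} Hz")
print(f"band points = {band_points} (bins {low_bin} to {high_bin})")
print(f"frames: {frames_ch1} in Channel I, {frames_ch2} in Channel II, lags = {n_lags}")
```

With these values each correlation compares 250 spectra of 59 points each, and the 750 sliding positions match the number of coefficients plotted.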

The marked narrow peaks of the cross correlation curves clearly show by an objective test that the "hold everything . . ." segment of Channel II is present on Channel I at the same location as the acoustic impulses.

Inspection of the spectrograms of Figure B-6 shows the presence of a Channel II brief tone beginning at time 32.00 secs and extending to 32.08 secs. It resumes at time 32.24 and disappears once more at 32.43. The Channel II brief tone is clearly visible in the Channel I spectrogram aligned by the relative timing obtained from Figure 6. A strong Channel I heterodyne is observed to begin at time 32.03 and to end at 32.17 secs. The resumption of the Channel II brief tone in Channel I at 32.24 secs is clearly weak and gradually grows in strength. These observations can be made more quantitatively from Figures B-7 and B-8, which are "printer plots" of the relevant regions of the Channel II and Channel I spectra. The vertical bars outlining the Channel II brief tone (and the same time-frequency bins in Channel I) not only guide the eye, but allow the quantitative calculation of the energy between the bars. The digits printed are the "bin energy" in decibels, each unit corresponding to a 4-db range. By the end of the first Channel II brief tone at time 32.08, it has been suppressed by about 10 db relative to its value before the Channel I heterodyne appeared at 32.03. When the Channel II brief tone reappears at 32.24 secs, the AGC has suppressed it by approximately 20 db, and it recovers to its original value only at about 32.43 secs, some 0.26 secs after the end of the Channel I heterodyne at 32.17 secs. That this AGC action is not due to a later recorder or a re-recording is demonstrated by the fact that much stronger Channel II brief tones are present on the Channel I recording, without showing the drop in intensity which is induced by the Channel I heterodyne.

 


REFERENCES

1. Appendix to Hearings Before the Select Committee on Assassinations of the House of Representatives Ninety-Fifth Congress, Volume VIII, US Government Printing Office, Washington, DC, 1979.

2. James C. Bowles, The Kennedy Assassination Tapes: A Rebuttal to the Acoustical Evidence Theory (copyrighted and unpublished).

3. Hearings Before the President's Commission on the Assassination of President Kennedy, US Government Printing Office, Washington, DC, 1964.

4. Report released December 1, 1980, by the Federal Bureau of Investigation and prepared by the FBI Technical Services Division, Washington, DC, and dated November 19, 1980.

5. Minitab Manual, by Thomas A. Ryan, Jr., Brian L. Joiner, and Barbara F. Ryan, published by Minitab Project, Statistics Department, 215 Pond Laboratory, Pennsylvania State University, University Park, PA 16802.

 
