J.B.
You argue that the 1983 OTA report "states quite clearly that polygraph does show a better than chance ability to detect deception." And you cite the following from Chapter 7 of the report:
Quote:
The preponderance of research evidence does indicate that, when the control question technique is used in specific-incident criminal investigations, the polygraph detects deception at a rate better than chance, but with error rates that could be considered significant.
As a preliminary matter, note that this statement by the OTA is not inconsistent with my statement that CQT (control question technique) polygraphy has not been proven by peer-reviewed scientific research to differentiate between truth and deception at better than chance levels of accuracy under field conditions. The OTA relied on both field studies and analog (laboratory) studies. Of the field studies, only two appeared in a peer-reviewed scientific journal:
Bersh, P. J. "A Validation Study of Polygraph Examiner Judgments," Journal of Applied Psychology, 53:399-403, 1969.
Horvath, F. S. "The Effect of Selected Variables on Interpretation of Polygraph Records," Journal of Applied Psychology, 62:127-136, 1977.
(By the way, the FAS website does not include the OTA report's list of references. You'll find it in the PDF version available on Princeton University's Woodrow Wilson School of Public and International Affairs website.)
Bersh's study involved both the Zone [of] Comparison "Test" (a form of probable-lie "Control" Question "Test") and the General Question "Test" (a form of the Relevant/Irrelevant technique). The polygraphers used "global" scoring; that is, they reached their determinations of guilt or innocence based not only on the charts, but also on their clinical impression or "gut feeling" regarding the subject. The decision of a panel of judges (four Judge Advocate General attorneys) was used as "ground truth." Assuming the panel's judgment to be correct, the OTA report notes that the polygraphers' determinations were (overall) 70.6% correct with guilty subjects and 80% correct with innocent subjects.
David T. Lykken provides an insightful commentary on Bersh's study at pp. 104-106 of the 2nd edition of A Tremor in the Blood: Uses and Abuses of the Lie Detector. Because our discussion of polygraph validity is an important one, I will quote Lykken's treatment of Bersh's study here in full for the benefit of those who do not have ready access to A Tremor in the Blood (which now seems to be out of print):
Quote:
Validity of the Clinical Lie Test
In view of the millions of clinical lie tests that have been administered to date, it is surprising that only one serious investigation of the validity of this method has been published, Bersh's 1969 Army study.[reference deleted] Bersh wanted to assess the average accuracy of typical Army polygraphers who routinely administered clinically evaluated lie "tests" to military personnel suspected of criminal acts. He obtained a representative sample of 323 such cases on which the original examiner had rendered a global diagnosis of truthful or deceptive. The completed case files were then given to a panel of experienced Army attorneys who were asked to study them unhindered by technical rules of evidence and to decide which of the suspects they believed had been guilty and which innocent. The four judges discarded 80 cases in which they felt there was insufficient evidence to permit a confident decision. On the remaining 243 cases, the panel reached unanimous agreement on 157, split three-to-one on another 59, and were deadlocked on 27 cases. Using the panel's judgment as his criterion of ground truth, Bersh then compared the prior judgments of the polygraphers against this criterion. When the panel was unanimous, the polygraphers' diagnosis agreed with the panel's verdict on 92% of the cases. When the panel was split three-to-one, the agreement fell to 75%. On the 107 cases where the panel had divided two-to-two or had withheld judgment, no criterion was of course available.
Bersh himself pointed out that we cannot tell what role if any the actual polygraph results played in producing this level of agreement. In another part of that same Defense Department study, polygraphers like those Bersh investigated were required to "blindly" rescore one another's polygraph charts in order to estimate polygraph reliability. The agreement was better than chance but very low. As these Army examiners then operated (they have since converted to the Backster method [of numerical scoring], which is more reliable), chart scoring was conducted so unreliably that we can be sure that Bersh's examiners could not have obtained much of their accuracy from the polygraphs: validity is limited by unreliability. But, although these findings are a poor advertisement for the polygraph itself, can they at least indicate the average accuracy of a trained examiner in judging the credibility of a respondent in the relatively standardized setting of a polygraph examination?
Bersh's examiners based their diagnoses in part on clinical impressions or behavior symptoms, which, we know from the evidence mentioned above, should not have permitted an accuracy much better than chance. But they also had available to them at the time of testing whatever information was then present in that suspect's case file: the evidence then known against him, his own alibi, his past disciplinary record, and so on. In other words, the polygraphers based their diagnoses in part on some portion of the same case facts that the four panel judges used in reaching their criterion decision. This contamination is the chief difficulty with the Bersh study. When his judges were in unanimous agreement, it was presumably because the evidence was especially persuasive, an "open-and-shut case." It may be that much of that same convincing evidence was also available to the polygraphers, helping them to attain that 92% agreement. When the evidence was less clear-cut and the panel disagreed three-to-one among themselves, the evidence may also have been similarly less persuasive when the lie tests were administered--and so the polygrapher's agreement with the panel dropped to 75% (note that the average panel member also agreed with the majority 75% of the time). An extreme example of this contamination involves the fact that an unspecified number of the guilty suspects confessed at the time of the examination. Because the exams were clinically evaluated, we can be sure that every test that led to a confession was scored as deceptive. Since confessions were reported to the panel, we can be sure also that the criterion judgment was always guilty in these same cases. Thus, every lie test that produced a confession was inevitably counted as an accurate test, although, of course, such cases do not predict at all whether the polygrapher would have been correct absent the confession. 
That the polygraph test frequently produces a confession is its most valuable characteristic to the criminal investigator, but the occurrence of a confession tells us nothing about the accuracy of the test itself.
Thus, the one available study of the accuracy of the clinical lie test is fatally compromised. Because of the contamination discussed above, the agreement achieved when the criterion panel was unanimous is clearly an overestimate of how accurate such examiners could be in the typical run of cases. When the panel split three-to-one, then at least we know that there was no confession during the lie test or some other conclusive evidence available to both the panel and the examiner. The agreement achieved on this subgroup was 75%, equal to the panel judges' agreement among themselves. As we have seen, Bersh's examiners could not have improved much on their clinical and evidentiary judgments by referring to their unreliable polygraphs.
As Lykken makes clear, Bersh's study does little to support the validity of CQT polygraphy.
The second peer-reviewed field study cited in the OTA report is Horvath's. In this study, confessions were used as the criterion for ground truth: 77% of the guilty and 51% of the innocent were correctly classified, for a mean accuracy of 64%.
Lykken again provides cogent commentary regarding Horvath's study (as well as a later peer-reviewed field study conducted by Kleinmuntz and Szucko). The following is an excerpt from pp. 133-34 of A Tremor in the Blood (2nd ed.):
Quote:
The studies by Horvath and by Kleinmuntz and Szucko both used confession-verified CQT charts obtained respectively from a police agency and the Reid polygraph firm in Chicago. The original examiners in these cases, all of whom used the Reid clinical lie test technique, did not rely only on the polygraph results in reaching their diagnoses but also employed the case facts and their clinical appraisal of the subject's behavior during testing. Therefore, some suspects who failed the CQT and confessed were likely to have been judged deceptive and interrogated based primarily on the case facts and their demeanor during the polygraph examination, leaving open the possibility that their charts may or may not by themselves have indicated deception. Moreover, some other suspects, judged truthful using global criteria, could have produced charts indicative of deception. That is, the original examiners in these cases were led to doubt these suspects' guilt in part regardless of the evidence in the charts and proceeded to interrogate an alternative suspect in the same case who thereupon confessed. For these reasons, some undetermined number of the confessions that were criterial in these two studies were likely to be relatively independent of the polygraph results, revealing some of the guilty suspects who "failed" it....
Again, Horvath's study (and, for that matter, that of Kleinmuntz and Szucko) does little to support the validity of CQT polygraphy.
In A Tremor in the Blood, Lykken addresses three peer-reviewed field studies that post-date the OTA review. I won't address those studies individually for the time being, but I think it's fair to say that the available peer-reviewed research has not proven that CQT polygraphy works at better than chance levels of accuracy under field conditions.
Do you disagree? If so, why? What peer-reviewed field research proves that CQT polygraphy works better than chance? And just how valid does that research prove it to be?
The other statement I've made (and you've noted) is that because CQT polygraphy lacks both standardization and control, it can have no validity. You'll find that explained in more detail in Chapter 1 of The Lie Behind the Lie Detector. I'll be happy to discuss it further, but before I do, I would ask whether you disagree with me on this point, and if so, why?