The Scientific Appraisal of Psychotherapy…and of Neurofeedback

by Siegfried Othmer | December 1st, 2005

The November 5 issue of Science News features an article by one of the regular contributors, Bruce Bower, on the encroachment of the Evidence-based Medicine juggernaut onto psychotherapy and its practitioners. The impetus was a policy statement on the subject issued by the American Psychological Association back in August. Despite the finality of a pronouncement of policy, the issue seems far from being put to rest. Even within the APA committee that propounded the document, there does not appear to be much agreement about how research on the efficacy of psychotherapy should proceed, or about whether weight should be given to clinical judgment along with more quantifiable scientific findings. Basic disagreements about the operative models survive: “Do treatments cure diseases, or do relationships heal people?” asks John Norcross, a member of the committee.

When it comes to testability for efficacy, neurofeedback would certainly seem to be in a favored position with respect to psychotherapy, for example. Why should we not relish this extraordinary opportunity to put some daylight between our methods and those that are less easily reducible to a procedure? It’s the arbiters of reimbursement who are pushing these standards, and if we meet them, do we not win? Perhaps. But then why is the hair on the back of my neck standing up in anticipation of such a win? Isn’t the problem that acceptance of neurofeedback on these terms would mean a kind of domestication that would rob it of its wild and boundary-breaking essence? Neurofeedback does not fit well in the above dichotomy: It is neither a treatment that cures disease, nor is it a mere appendage to talk therapy.

Neurofeedback needs to find a third way. It is not treatment; nor is it reducible to a therapy. It is, however, about relationships–both inside the brain and out. It is about health and functionality more than about disease and canonical disorders. It is not diagnostically specific, nor should it cease with success in symptom suppression.

Laws have been passed hereabouts that put mental health on a par with physical well-being with respect to reimbursement. This does not appear to have mattered in any respect whatsoever “where the rubber meets the road.” I can only conclude that the third-party payers are not fundamentally interested in mental health outcome. This is the same phenomenon writ large that has allowed the health field basically to ignore autism. It is a mental illness. Leave it alone. Now perhaps that attitude simply reflects the understanding that there is not much that can be accomplished in “curing” mental illness. In that case a select new set of “facts” should be able to turn that around. Perhaps.

More likely, it would be a matter of looking to the fox to solve the problem of foxes and chickens. It is simply not possible that insurance companies are currently unaware of the potential benefit of neurofeedback for many of the chronic conditions that cost them a lot of money. Their problem is that once that door is opened the demand for neurofeedback would quickly mushroom out of all bounds, and that is why the lid has to be kept on the potential powder keg.

When Barry Sterman first started promoting the “SKIL” program for QEEG analysis, he made the case for its superiority over existing programs on three grounds. The first had to do with inclusion of time-of-day data in the analysis; the second had to do with the fact that the data needed to be logarithmically converted in order to achieve more Gaussian distributions; and the third may have had to do with the handling of artifacts. When brief epochs are cut out of the data stream because of corruption with artifacts, the consequences to the rest of the data may not be negligible. New transients are introduced by the excisions. Additionally, the altering of the time-line corrupts coherence analyses at very low frequencies.
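The second of those points, the logarithmic conversion, is easy to illustrate for those who like to see the arithmetic. What follows is only a sketch, and it is not SKIL's method: the "power" values are synthetic, drawn from a log-normal distribution as an assumed stand-in for strictly positive, right-skewed band-power data, and the `skewness` helper and the variable names are hypothetical. Skewness near zero is just one mark of a more Gaussian shape.

```python
import math
import random
import statistics


def skewness(values):
    """Population skewness: mean cubed deviation over the cubed standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


# Assumption: synthetic data, not real EEG. Log-normal draws are strictly
# positive with a long right tail, which is the shape the text attributes
# to the untransformed measures.
rng = random.Random(0)
power = [rng.lognormvariate(0.0, 0.8) for _ in range(5000)]

# The logarithmic conversion described in the text.
log_power = [math.log(p) for p in power]

raw_skew = skewness(power)
log_skew = skewness(log_power)

print(f"skewness before log conversion: {raw_skew:.2f}")
print(f"skewness after log conversion:  {log_skew:.2f}")
```

On data of this kind the raw values are strongly right-skewed, while the log-converted values are close to symmetric. Whether real QEEG measures behave this cleanly is an empirical matter that the sketch does not settle.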

The point of the retelling is that when insurance companies got wind of what Sterman was claiming, they were all interested. His phone started ringing. Now it would have been nice if this interest had been driven by the desire for more precise characterization of the head-injured patient. Alas, however, their only interest in Sterman’s claims was in the opportunity they presented for bashing other QEEG analysis programs in court. Their preferred outcome would have been mutual annihilation among all of the QEEG players, including SKIL.

So it is clear that the insurance companies were aware of what was occurring in this obscure corner of the world, and when it was in their interest to do so they showed up. Their interest, however, was antithetical to that of their clients. The same is likely true of neurofeedback—the insurance companies cannot be unaware of what is happening in our field. In short, we are consorting with foxes. I am reminded of a phrase lawyers commonly use in their briefs: they “pray for relief” to the judge. To hope that insurance companies will change their ways is to “pray for relief” before stone gods.

Now the death struggle that the insurance companies hoped to unleash in the QEEG field raises the question of whether all the mutual hostility that prevails within neurofeedback is purely a natural phenomenon, or whether it too is being abetted by forces behind the scenes. All that is necessary to put the entire enterprise under stress is to raise the bar on the level of evidence required for proof. Mutual hostility will then be assured among the players. Unsurprisingly, the other major interest group here, big PhRMA, has a well-defined interest in making the world safe for pharmacology, or, equivalently, in purging it of competitive threats. Because it insists on its own methods of proof as definitive, psychodynamic alternatives start out with a significant handicap.

It is probably a waste of time and scarce resources for us to jump through the hoops of research via randomized controlled trials just to make nice with the decision makers at insurance companies. We do have an advantage over pure psychotherapy in that we have a procedure that could in principle be tested like drugs. But we should not allow neurofeedback to be reduced to a mere procedure. Once we are down that road, there is no back-tracking. Neurofeedback is not just the better Ritalin. So we are in fact fellow-travelers with the psychodynamic practitioners in this regard. Neurofeedback, at its best, should be performed in a setting where the psychodynamic dimension of disorders can be addressed as well. When tested in research, it should be tested at its best, not stripped of the environment in which it flourishes.

So we have reason to be interested in how the psychodynamic practitioners are handling the challenge of evidence-based medicine. Suggested Drew Westen of Emory University: “Researchers need to use clinical practice as a natural laboratory to learn about psychological interventions worth testing more rigorously.” (The quotation is from the article; these are not Westen’s exact words.) If researchers actually showed up to watch neurofeedback in real life, they would alter their research methodology profoundly.

Bruce Wampold of the University of Wisconsin suggests that the quality of the working relationship plays a larger role than which particular technique is employed. Hence in conventional research procedures “[t]he methodological tail is wagging the therapeutic dog.” The RCT design is simply unsuitable for studying human interactions. The essential elements cannot be controlled.

According to the article, Brent Slife of Brigham Young “filed the equivalent of a philosophical antitrust suit against psychotherapy researchers.” He lamented the “almost dogmatic status” of the philosophy of empiricism in assessing psychotherapy. In my own view, it is reminiscent of the logical positivism movement a hundred years ago, and of the constraints imposed by behaviorism on psychological research early on. Both limited the kinds of questions that could legitimately be asked, and that in turn limited the kinds of answers that would be attended to.

So what is the answer for our field? It is to grow in such a manner that there is no creeping dependency on third-party reimbursement, where such flawed perspectives are in vogue. We accept reimbursement, of course, but we should do so largely on our terms. We must always have one foot outside of the reimbursement regime so that we are in a position to say, “No thank you. I am not going to agree to do six sessions of neurofeedback for this condition. I am not going to agree to X dollars per session.”

Further, the answer for us is to insist on research methodology that reflects actual clinical complexity. We really do need to reduce the level of internal hostility within our field, and we do so by not signing on to inappropriate research designs and unattainable research criteria by which nearly everyone falls short and things end up in mutual recrimination. In particular, we have to take issue with the presumption that group studies are the only way to establish new techniques. That standard neatly annihilates most of the work that has been done in this field over thirty years. By implicitly accepting that standard, we allow the obtuseness of mainstream thinking to be reframed as our deficiency and our burden! We don’t need to buy into such a flawed standard. Charles Darwin established the entire field of evolution on the basis of a mere handful of observations. When you are breaking new ground, a handful may be enough to point the way.

That said, it was in fact group studies that kicked things off (Sterman’s and Lubar’s). How many more do you need? If somebody wants to fund or do group studies, they will not find us standing in the way. But the absence of group studies does not mean the absence of firm knowledge.

Ultimately the health care field has to come to us, not the other way around. In the interim, we just have to move toward greater efficacy, greater efficiency, and greater affordability as best we can. Ours is already the most cost-effective as well as the most efficacious of all mental health disciplines. But we have more to do along those same lines. Finally, we have to move toward greater solidarity in what is, after all, a common cause.

Junk DNA

Only a few short years ago, we had collectively declared 90% of our precious DNA to be junk. DNA was about coding for proteins; that involved only 10% of the DNA; the rest was a wasteland. How wrong we were. The junk DNA, it turns out, has something to do with regulation of protein synthesis, and this role consumes 90% of DNA real estate. Could that be telling us something about the importance of the regulatory regime in our biological makeup? Could it be that the problems of regulation in healthcare generally may also amount to a figurative 90% of all health care issues?

If we reframe psychopharmacology as a technique of re-regulation rather than as one of fixing neuromodulator deficits we are probably closer to the truth of the matter. If we then lump all of the conditions against which either psychopharmacology or biofeedback/neurofeedback can be usefully deployed, we cover the whole range of mental health and a lot of somatic health issues as well. Attention Deficit Disorder can be seen as a deficit in the regulation of our attentional networks. Most crises of the heart show up as problems of regulation. Parkinsonism can be looked at as a problem in the regulation of movement. Allergies and autoimmune disease can be reframed as a problem in the down-regulation of our immune system. We could go on and on.

The problem is that health issues have not been framed in those terms. The medical model has compelled problems of dynamic regulation to be forced into a mode of thinking in terms of static deficits. We are in for a huge paradigm shift all across the board, and neurofeedback just happens to be at the head of the phalanx at the moment, getting bloodied. This re-conceptualization also has implications for how research must be done. It is the technique itself that will mandate how it must be studied, not the other way around. The RCT paradigm has stability as an operative assumption; similarly, it assumes the problem to be largely reducible to a single dominant variable, or at most a few. The technique is not suitable to the study of highly interacting dynamic systems of many dimensions, systems that interact on many timescales.

Allocation Concealment

I just ran across an article in an old Scientific American (May, 1996) in which researchers looked at the problem of “allocation concealment” in RCTs. Turns out that there is considerable skullduggery going on here. MDs signing up to participate in clinical studies may end up biasing things by arranging to usher certain promising candidates into the treatment arm. They may be aware, for example, that people showing up on odd-numbered days will be placed in the treatment arm, so they may arrange their schedule accordingly.

The study of allocation bias, which was published in the Journal of the AMA, found that more than half of the studies examined were subject to allocation bias, and that these showed apparent efficacy 30% higher than those where the bias was not present. That 30% is large enough to account fully for the differential between placebo and drug response for SSRI anti-depressants. Hmmmm.

Null Science

The principal driver toward group studies is that they facilitate the empirical gold standard, which is null hypothesis testing. But the null hypothesis testing fetish has also been subject to some criticism. I just unearthed an old Science News article where this topic was discussed. Geoffrey Loftus, then editor of the journal Memory and Cognition, saw “a research landscape dotted with dense stands of conflicting data that strangle theoretical advances at their roots….This conceptual muddle…reflects a deeply flawed approach to doing science.” (The quotations are the words of the author of the article, Bruce Bower, paraphrasing Loftus.)

Said John Richters, then head of the disruptive disorders program at the NIMH: “Even the brightest people use empirical research only to keep their careers going. When I talk to them in private they express much more sophisticated views about mental functioning than what you see in their published reports.”

Geoffrey Loftus again: “Social scientists have embraced null hypothesis tests because they provide the illusion of objectivity.” “But such objectivity is not sufficient for insight. [It provides] only the illusion of insight, which is worse than providing no insight at all.”

Gerd Gigerenzer, of the Max Planck Institute for Psychological Research in Munich, weighed in with a rather severe indictment of the whole enterprise: “Null hypotheses are set up in an extremely mechanical way reminiscent of compulsive hand washing.” “There is widespread anxiety surrounding the exercise of informed personal judgment in matters of hypothesis testing. It’s a question of intellectual morality.”

It almost appears as if null hypothesis testing represents a kind of dumbing down of the science of psychology. The technique is good enough to identify certain correlations, but in other ways it is not sufficiently discriminating: it does not necessarily point to a path forward. At the same time, the existence of this standard means that one “cannot pass Go” without having done homage. And the dominance of the standard means the neglect of other, far more discerning statistical methods. But Gigerenzer goes even further: “We should ban the ritualistic mindless use of statistics, whether it revolves around significance testing or any other technique.”

My own view on this is that null hypothesis testing against a null procedure (the placebo) is a game not worth the candle. Proving neurofeedback against a placebo is just not a very high hurdle. I am reminded of the old New England custom of bundling boards. Said an adventuresome betrothed, “If you can’t even climb over that bundling board to see me, then you aren’t going to amount to much as a husband.” It is a test, but it is a minimal test indeed.

Jerome Kagan of Harvard can see a role for significance testing when it comes to something like proving the existence of extrasensory perception. “When you are theoretically barren, you rush to statistical methodology,” Kagan says. “If you have a powerful theory that predicts something of importance, then you don’t need significance testing.”

Gigerenzer sounds a similar theme on the importance of theory: “Data without theory have a low life expectancy, like a baby without a parent.” That was of course the problem with Sterman’s and Lubar’s early studies. They could not be swaddled in a congenial model. The early hypotheses were of necessity limited and ad-hoc. This is no longer the case, but the bottleneck remains the theoretical one. Once our critics open their minds to the model, they too will no longer be interested in significance testing. On the other hand, no amount of significance testing will open their minds to the model.

So there we are. We do have a powerful theory, and it makes powerful predictions. So let us be about the business of testing a strong hypothesis, not a weak one. If the majority of problems that walk in our door can be conceptualized as problems of regulation (in their physiological dimension), and if we have a technique that enhances regulatory function broadly, then a strong hypothesis is that ‘every client who comes for neurofeedback and stays for twenty sessions should derive significant, measurable benefit for his exertions.’ Now we will fall somewhat short of 100% success, but this at least is a hypothesis worthy of our efforts.

Fringe benefits are that we will be the masters of our own fate in the conduct of this work. No one can tell us what to do in the clinic and how to do it. Researchers will only be permitted to judge the outcome. At the outset, our clinic will be our “closed laboratory,” where we cannot be second-guessed, and where anything goes. The target is disregulation; the remedy is self-regulation, by whatever means it may be achieved.

One of the key challenges I see in the field of psychology is that, in the adversarial culture that comes from not having a single dominant paradigm, the instrumentalities of power are largely negative. Various authorities have more power to undermine other viewpoints than to establish something unique and positive on their own. Energies therefore go in the negative direction. We have to recognize this and simply adopt a posture of virtual immunity to outside criticism. We know what we have to do. We need some time and space to do it. We invite your participation.

In furtherance of this objective, we have built a website for the efficient tracking of symptoms in neurofeedback clients. It is called EEG Expert. We invite you to take a look.


Bruce Bower, “Questions on the Couch; Researchers spar over how best to evaluate psychotherapy,” Science News, 168, 299-301, November 5, 2005

Paul Wallich, “Not so blind, after all,” Scientific American, May 1996, 20-22

Bruce Bower, “Null Science: Psychology’s statistical status quo draws fire,” Science News, 151, 356-357, June 7, 1997
