Generalizability is overrated.




The absence of evidence

Here is an all-too-familiar scenario: patient asks psychiatrist about some sort of supplement or alternative treatment that hasn’t been advertised in the Journal of Clinical Psychiatry, psychiatrist shrugs and says “we don’t have any good evidence to support that” (which is probably true) and then says to herself “god damn, not another one of these patients.”

The patient will probably end up leaving with a prescription for Latuda instead (unless they’re poor, in which case it will be generic quetiapine), be upset, and not fill the prescription. Or they might start Latuda, get fat, and then be upset (“But you said there was no weight gain!?!”). Either way, now the patient hates psychiatry, and they only accept medical advice from sketchy blogs. On behalf of sketchy blogs everywhere, I’d like to thank Sunovion Pharmaceuticals for another reader.

If only we had good randomized controlled trials to tell us that hot knifing CBD oil is a good anxiolytic, or that Ashwagandha is a good antidepressant, or that Bacopa monnieri helps ADHD, or whatever it is patients are interested in, then maybe some of these unpleasant Latuda debacles could be avoided.

Seduced by whores

Scientific knowledge accumulates slowly, and any progress you make will be punctuated by multiple disappointments along the way. Science will flirt with you, tease you, like your Instagram posts, and call you when it’s drunk and needs a ride, but it won’t put out unless you work for it for a long ass time. The Internet, on the other hand, has an endless supply of pseudo-knowledge to give away. It will fuck you right here right now and whisper in your ear whatever sweet nothings you want to hear. Sometimes you’ll have to pay, but most of the time you’ll just have to be nice to it for a while.

“Your psychiatrist was soo out of line for asking you to do a drug test! They would never believe your marijuana is medicinal anyways.”

“I’m not a doctor, but it definitely sounds like your depression was caused by systemic Candida, silver deficiency, and a lack of adequate hydration. I’m not surprised your psychiatrist tried to give you an addictive drug, you know they get paid by Pharma.”

Still waiting for that Pharma paycheck.

Clinical trials are like antidepressants: helpful in some situations, useless in others

The goal of a clinical trial is to make an inference about the effect of a drug in a certain patient population. To do that, we measure the average drug effect in a sample of people drawn from that population, usually in comparison to the effect of something else, like a placebo. At the end of it all we’re left with the average effect of the medication in our sample.

We might see the desired effect, or it might have the opposite effect, or there may be no effect, or the effect might be really small and not statistically significant. In the past we could just keep recruiting more and more people until a really small effect became statistically significant, but now that trial preregistration is a thing, we have to wait for someone to meta-analyze all the small effects to show us the same small effect with a larger sample and a narrower confidence interval. Now the ineffective treatment has Level 1 supportive evidence.

Inferential statistics have their use in psychiatry. They do, presumably, give you a somewhat reliable estimate of the average drug effect in the population you sampled from (unfortunately, often not the population the drug is being prescribed to). And when it comes to comparing two treatments at the population (sample) level, and side effect profiles, then clinical trials are a good source of information (although you may have to tunnel your way through some bullshit first).

Unfortunately clinical trials fail miserably if we’re relying on them to tell us anything about what a drug will do, or is doing, for individual patients. You would have to seriously misunderstand clinical trial statistics to think that clinical trial/meta-analysis effect sizes apply uniformly to everyone who receives a medication. They don’t. Irving Kirsch lied to you (and everyone else). Some people do better than average, some people do worse. And if you consider multiple continuous variables, there might not have even been anyone who experienced exactly the “average” effect across the board.

If I have learned one thing from prescribing drugs, taking drugs, and talking to people who take drugs, it’s this: Psychoactive substances reliably produce a wide range of effects depending on the person who is taking them. It’s not too hard to imagine how clinical trials that only consider group effects miss this.

Putting theory into action, theoretically

Does marijuana calm you down? Help you relax and unwind? Mellow you out? Well it makes me anxious and paranoid, and I always have a terrible time at concerts. I’m not sure why it took me 15 years to figure this out, because it happens every god damn time (unless I’m also a good 7/10 drunk, or take Valium first).

Let’s pretend you’re a researcher doing a 2-week trial with 50 participants to see if CBD improves anxiety compared to a placebo. After the trial, you find that 20% were considerably less anxious, 10% got a bit less anxious, 30% got considerably more anxious, and 40% were about the same. You see something similar in the placebo group: 30% got better, 30% got worse, and 30% didn’t change that much.

You look at the mean HAM-A change scores between groups and see that CBD only produced a very small improvement relative to placebo, say 3.1 points. The difference wasn’t statistically significant. Now, if you were Sunovion Pharmaceuticals, you would have recruited 400 participants, not 50, and the difference would have been statistically significant. But you’re not, so you’re forced to conclude there is no benefit of CBD on anxiety.

You consider the possibility that CBD might just affect people differently, and because some people’s anxiety got worse, the effect size in the “responders” is not accurately represented by the average effect for everyone. The problem with this conclusion, though, is that you saw something similar to the drug effect in the placebo group, a variation on the rule of thirds. So was this treatment effect heterogeneity, or just a placebo effect?

Let’s say you decide to repeat the trial, add a two-week cross-over phase, and then repeat it all over again (8 weeks). You look at the data and see the same pattern of results for each of the individual trials, and it’s more or less the rule of thirds with only a small, nonsignificant effect size for CBD versus placebo.
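If you want to convince yourself that this can happen, here’s a rough simulation of the scenario above (every number in it is made up for illustration): each participant gets a fixed individual response to CBD, the group-level comparison comes out near zero, but the repeated crossover exposes the people who respond the same way every time.

```python
import random

random.seed(0)

# Hypothetical simulation, not real data: every participant has a fixed
# individual response to CBD (in HAM-A change points; negative = less
# anxious), plus measurement noise. The 20/10/30/40 split mirrors the
# made-up scenario above.
N = 50
true_effects = [-8] * 10 + [-3] * 5 + [6] * 15 + [0] * 20
random.shuffle(true_effects)

def ham_a_change(effect):
    """One period's observed HAM-A change: true effect plus noise."""
    return effect + random.gauss(0, 2)

# Two CBD periods and two placebo periods per person (A-B-A-B style)
cbd = [[ham_a_change(e) for _ in range(2)] for e in true_effects]
placebo = [[ham_a_change(0) for _ in range(2)] for _ in true_effects]

# Group-level analysis: the individual effects nearly cancel out
mean_cbd = sum(sum(p) for p in cbd) / (2 * N)
mean_pbo = sum(sum(p) for p in placebo) / (2 * N)
print(f"group mean difference: {mean_cbd - mean_pbo:+.1f}")

# Individual-level analysis: count people whose two CBD periods both fall
# clearly outside their own placebo range, in either direction
consistent = sum(
    1 for c, p in zip(cbd, placebo)
    if all(x < min(p) - 2 for x in c) or all(x > max(p) + 2 for x in c)
)
print(f"consistent individual responders: {consistent}/{N}")
```

The group difference hovers around zero while a big chunk of the sample responds consistently, strongly, and in opposite directions.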

“Fuck research, this is bullshit!”

But you still don’t want to give up. You really need this publication on your CV. So you go in “post-hoc” and look at each person’s data individually. God you must be desperate.

As it turns out, most of the people in the CBD group whose anxiety improved noticed improvements both times they were getting CBD, and drifted back toward their baseline when they were getting placebo. And most of the people who got more anxious with CBD also got more anxious the second time they were on it, and moved closer to baseline in the placebo condition. And the people who initially got placebo showed the same consistency in their individual responses each time they were switched to CBD and then back again.

Congratulations, you just found some reasonable evidence of treatment effect heterogeneity for CBD on anxiety, and you did it by inadvertently conducting 50 N-of-1 trials at the same time. Unfortunately you won’t be getting a promotion since you chose to study CBD instead of Abilify.

Forrest Gump vs. Pulp Fiction

Clinical trials are crowd pleasers. Everyone walks out of the theater reasonably satisfied. N-of-1 trials won’t placate the masses, but you’ll know for sure if this one was for you.

An N-of-1 trial is basically a randomized blinded controlled trial with one participant. This person is randomized to get either the treatment or the control intervention, and then they switch to the other, and the process is repeated a few times.
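Mechanically, there isn’t much to it. Here’s a sketch of how the blinded schedule might be generated (the function and pair-block structure are my own invention for illustration, not a standard tool): randomize drug vs. placebo within consecutive pairs of blocks so neither condition can run long.

```python
import random

def n_of_1_schedule(n_pairs=3, seed=None):
    """Return block labels like ['drug', 'placebo', 'placebo', 'drug', ...],
    randomized within consecutive pairs so the two conditions stay balanced."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_pairs):
        pair = ["drug", "placebo"]
        rng.shuffle(pair)  # coin flip for which comes first in this pair
        schedule.extend(pair)
    return schedule

print(n_of_1_schedule(n_pairs=3, seed=42))
```

The patient and prescriber would only see bottles labeled by block number; whoever generated the schedule keeps the key until the end.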

If you can get your doctor on board, compounding pharmacies will make you a placebo that matches your medication. Yes, you’ll have to pay. But a placebo is only necessary if you want to rule out a placebo effect. Some people have ethical concerns about prescribing a new medication for the first time if that medication has serious side effects. If it seems safer, you can always start with an open-label drug run-in phase.

An N-of-1 trial is what you do when you want to try something that has not been studied, or if your patient is already on something but there’s uncertainty about the effects. In the latter scenario the “effect” doesn’t have to be a treatment effect, it could be anything, and in this sense N-of-1 trials are relevant to even very well-researched medications (e.g., is bupropion making your patient’s migraines worse? We know it causes headaches as a side effect, but is it making this patient’s headaches worse?).

That’s the beauty of it: assuming a few general conditions apply, you can test even the most far-out ideas and patient observations that will probably never get studied in a large trial (e.g., does melatonin at night make Adderall more effective the next day? Does theanine improve Adderall-induced bruxism? Does zinc increase the effect of Adderall after tolerance has developed?)

N-of-1 trials work best when the intervention is expected to take effect quickly (e.g., within a couple days), doesn’t have strong carryover/withdrawal effects, and won’t be obviously unblinded by side effects. This makes various supplements/vitamins/nootropics great candidates for N-of-1 trials, especially since these are also the things that usually haven’t been rigorously marketed I mean researched.

The problem in question should be relatively stable (e.g., trait anxiety, ADHD, dysthymia) as opposed to something that is temporary (e.g., major depression) or non-existent (e.g., mania prophylaxis). If these conditions aren’t satisfied it doesn’t make it impossible, but you might have to consider an active control, tapers, longer trial arms (e.g., 1-2 months vs. 1-2 weeks), etc.

You have to have some sort of system to measure whatever it is you’re interested in. The main reason doctors and patients end up in scenarios where they don’t know what a medication is actually doing is because they didn’t measure outcomes, or have appropriate follow-up, or because they made too many fucking med changes at once. Weekly questionnaires work well. Tracking outcomes on a smartphone with several measurements a day gives you more accurate and nuanced information.

You also need baseline measurements. These will come in handy if your patient starts spending too much time with internet whores and then suddenly decides their medications have actually been making them worse this whole time.

Every time you get prescribed a medication or start a new treatment you should treat it as an N-of-1 trial. No, you don’t necessarily need to be blinded and have a placebo. But you do need baseline measurements and a systematic way of prospectively measuring effects. How else are you going to know what’s happening? Retrospective recall every 3 months? Most people can’t remember what they had for breakfast 2 days ago.

“Oh that’s cute, the doctor is playing scientist.”

Fuck you, you piece of shit. This is what happens when all the psychiatry researchers stop doing psychiatry research and just keep writing about the same pre-existing research over and over, in other words, scientists playing literary critic (or blogger).

N-of-1 trials are not useless as a research endeavor. They aren’t only for clinical decision making. You can pool the results from a large number of N-of-1 trials to get an estimate of the average effect in the population. If the trials were conducted in actual practices with real patients then this estimate will be exponentially more useful than whatever was found in the industry-sponsored trials.
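Pooling is also simple in its most basic form. A sketch, with invented effect estimates: each completed N-of-1 trial yields one individual treatment effect (mean score on drug minus mean score on placebo), and averaging those gives you a population-level estimate, plus the between-patient spread that a single group-average trial hides.

```python
import statistics

# Hypothetical individual treatment effects, one per completed N-of-1 trial
# (negative = improvement on drug relative to placebo). All numbers invented.
individual_effects = [-8.2, -1.0, 0.4, -6.5, 3.1, -0.2, -7.8, 2.5]

pooled_mean = statistics.mean(individual_effects)
spread = statistics.stdev(individual_effects)
print(f"pooled effect: {pooled_mean:.1f} (SD across patients: {spread:.1f})")
```

A fancier pooling would weight each trial by its precision, but even the dumb average tells you two things a group trial doesn’t: the typical effect and how wildly it varies from person to person.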

Conversely, a billion fucking randomized controlled trials and meta-analyses of randomized controlled trials can’t tell you if a medication is, or will be, effective and tolerable on an individual level. For this you’ll actually have to do your job, in a systematic way, and follow up with your patients.

This is how you make it happen.

Step 1. Present idea to current physician.

Step 2. Accept rejection with grace, try another physician.

Step 3. Repeat Steps 1 and 2 until you’ve found a physician who’s willing to play ball.

Step 4. Decide intervention, outcomes, time-frame, etc.

Step 5. Procure treatment. Begin monitoring baseline outcomes. Weekly questionnaires will work if you can’t do anything more technical than this.

Step 6. Pay the pharmacy to create a placebo, and randomize into an A-B-A-B-A-B… schedule. Ideally both of you are blinded from the beginning. Somewhere there should be a record of whether A or B was placebo; you’ll need to look at this after.

Step 7. Treatment begins. Follow-up during the trial should be enough to ensure safety, depending on the intervention.

Step 8. Treatment ends. Hopefully nothing terrible happened.

Step 9. Doctor breaks blinding, tallies scores for each time period, and looks at the results. No inferential statistics necessary (usually).

Step 10. Doctor and patient meet to review results, discuss side effects, decide if the effect was useful, if the intervention is worth maintaining, etc.
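Step 9 really is arithmetic, not statistics. A minimal sketch with hypothetical HAM-A numbers (lower = less anxious): average the scores within each period, unblind the period labels, and compare.

```python
# Hypothetical data: weekly HAM-A scores recorded during each A/B period.
periods = [
    ("A", [14, 12, 13]),
    ("B", [22, 20, 19]),
    ("A", [13, 11, 12]),
    ("B", [21, 23, 20]),
]
unblinding = {"A": "drug", "B": "placebo"}  # the pharmacy's sealed record

# Pool the scores by what the patient was actually taking
totals = {"drug": [], "placebo": []}
for label, scores in periods:
    totals[unblinding[label]].extend(scores)

for arm, scores in totals.items():
    print(f"{arm}: mean HAM-A = {sum(scores) / len(scores):.1f}")
```

If the drug periods consistently score lower and the gap is clearly bigger than the week-to-week noise within periods, you have your answer for this patient, no p-value required.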

Here’s an example of what the results might look like, using CBD for generalized anxiety disorder (Hamilton anxiety scores in each box).


In this case B was placebo, so it seems like CBD was somewhat effective at reducing your anxiety. You just barely avoided Latuda… for now.