That's a difficult question to answer. A simple response is perhaps "more than one". Usually you end up balancing the weight of evidence supporting the therapy against the weight of evidence arguing against it.
Each paper has to be read cautiously and skeptically. Authors often overstate their findings, and sometimes make claims that aren't supported by the data.
As previous posters have said, methodology is key. There are some basic elements you'd like to see in a good trial:
(1) Placebo control. We need to compare the treatment to something else; we need to know that it's better than doing nothing. If I have 500 patients with pneumonia and give them antibiotics, some of them will die and some will get better. If I have 500 patients with pneumonia and give them nothing, some will die and some will get better. I need to know that more people get better in the group receiving antibiotics.
(2) Random group allocation. Ideally, once we establish that a patient is a candidate for a trial, we want to randomly put them in either the control or the treatment group. Hopefully this results in the two groups being similar in nature. We don't want to have sicker patients in one group than the other; otherwise we're going to make that group look worse before we even start treating. (Even if we randomly allocate, we still have to check whether the groups actually ended up similar.)
(3) Blinding. It's better if neither the patient nor the people treating the patient know whether the patient is receiving the treatment or the placebo. Otherwise there's a tendency for bias to creep in, and for the patients in the two groups to start getting treated differently.
(4) Prospective design. We want to plan what we're doing before we do it. It's a lot easier to take a mass of existing data, go back through it with a hypothesis in mind, and find support for it, because we can intentionally or unintentionally pick analysis criteria that favour our hypothesis. If you state the study design explicitly before you carry out the study, the results carry more weight.
(5) Large / multi-center. The more patients we enroll, the better the statistics get. If we have a lot of patients we can accurately measure a smaller difference, and we can be more certain that what we see reflects a real change. If we can do it in multiple different locations, each with slightly different practices and patient demographics, it suggests that the results of our study might be applicable to other regions. (There's a rough toy simulation of these first points just after this list.)
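To make the randomisation and sample-size points a bit more concrete, here's a small toy simulation in Python. The recovery rates (60% on placebo, 70% on treatment) are invented purely for illustration, not taken from any real trial; the point is just that with 50 patients the observed difference bounces around, while with thousands of patients the estimates settle near the true rates.

```python
# Toy simulation (invented numbers) of a randomized, placebo-controlled trial:
# randomly allocate patients, count recoveries per group, and watch how the
# estimates behave as the trial gets bigger.
import random

def simulate_trial(n_patients, recovery_placebo=0.60, recovery_treatment=0.70):
    """Randomly allocate n_patients, return (recovered, total) per group."""
    placebo_n = treatment_n = 0
    placebo_recovered = treatment_recovered = 0
    for _ in range(n_patients):
        if random.random() < 0.5:                      # (2) random allocation
            treatment_n += 1
            treatment_recovered += random.random() < recovery_treatment
        else:                                          # (1) placebo control group
            placebo_n += 1
            placebo_recovered += random.random() < recovery_placebo
    return (placebo_recovered, placebo_n), (treatment_recovered, treatment_n)

random.seed(1)
for n in (50, 500, 5000):                              # (5) bigger trials, steadier estimates
    (pr, pn), (tr, tn) = simulate_trial(n)
    print(f"n={n:5d}: placebo {pr}/{pn} ({pr/pn:.0%}), "
          f"treatment {tr}/{tn} ({tr/tn:.0%})")
```

With the small trial, the apparent benefit can easily look bigger, smaller, or even reversed compared to the true effect; that's why a small positive study on its own isn't very convincing.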
Sometimes it's not possible to placebo control, e.g. CPR versus no CPR wouldn't be ethical. Or it's not possible to blind the patient or caregivers, e.g. mechanical CPR versus manual CPR. Or we have to use a historical control group -- and then we risk seeing a change in outcome due to a change in other variables over time. These studies are lower quality, and can't be given as much weight as better-designed trials. Lower down on the scale you have "case series", e.g. reports of a dozen patients who received a given therapy, or "case reports", a report on a single patient. On the lowest rung you have "expert opinion", the personal beliefs of someone who has a reputation for being knowledgeable in a given field. Even a fairly weak trial is going to count for more than the opinion of any one individual.
I would also add, as a last point, that a major weakness in a lot of research surrounding emergency and prehospital medicine is a lack of appropriate outcomes. If I do a study that shows a new drug given during CPR improves the number of people arriving at the hospital with a pulse, it might seem reasonable to think that this translates into more people walking out of the hospital at 60 days, or more people alive in a year. But we can't make that leap. If we want to study cardiac arrest survival, then we need to actually look at the number of survivors at an appropriate endpoint.
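As a rough illustration of that last point, here's a tiny worked example with completely made-up numbers (not from any real study): a surrogate endpoint like "arrived at hospital with a pulse" can improve quite a bit while the endpoint we actually care about barely moves.

```python
# Invented numbers for illustration only: surrogate endpoint vs. the
# endpoint that actually matters (survival at a meaningful time point).
arrests = 1000

control  = {"pulse_on_arrival": 250, "alive_60d": 80}   # hypothetical
new_drug = {"pulse_on_arrival": 350, "alive_60d": 82}   # hypothetical

for name, g in (("control", control), ("new drug", new_drug)):
    print(f"{name:9s}: pulse on arrival {g['pulse_on_arrival']/arrests:.0%}, "
          f"alive at 60 days {g['alive_60d']/arrests:.1%}")

# The surrogate jumps from 25% to 35%, but 60-day survival is essentially
# unchanged, so a study reporting only the surrogate would overstate the benefit.
```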