Tuesday 16 July 2013

What is Power and How Do We Calculate Sample Size?


One of the favourite sections of the Statistics for Non-Statisticians Course is the section on Sample Size and Power. If you've never done sample sizing yourself before or would just like a little more insight into how it is done then read on.

In a protocol you usually see a sample size/power statement that is often scanned over with minimal interest, apart from the final value stating how many subjects are needed in the trial. If we look at this in more detail the components are fairly straightforward: assumptions, power, significance level, withdrawals/non-evaluable subjects, and finally the number of subjects required. Let's pick these up one by one and explain what they all mean.

Assumptions: 


In order to perform sample size calculations the first thing that is required is a set of assumptions. The major assumption is what the 'real underlying' results are, i.e. what the outcomes to the treatments would be if the entire population were included in the trial. For simple binary endpoint data (e.g. response/no response) this would be stated along the lines of: the expected response rate for Treatment A is 65% and the expected response rate for Treatment B is 40%.

The assumptions (in our case 65% and 40%) would come from previous trials and publications. These are usually easier to find for the comparator product than the test product.

Significance Level:


This is usually set to 5%. In simple terms it is the predefined value that the p-value at the end of the trial must fall below in order to declare a significant result. We'll discuss p-values in a future Blog.

Power:


If the assumptions are correct (in our case the real response rates are 65% and 40%) we want the trial to have a good chance of giving us a positive result (i.e. statistical significance). The percentage chance of the trial giving us a significant result if our assumptions are correct is called POWER. Most trials have either 80% or 90% power. 

If a trial has 80% power then, even if the assumptions are correct, it still has a 20% (or 1 in 5) chance of failure. This 20% is referred to as the type II error. If a trial has 90% power then the chance of failure is 1 in 10. Moving from 80% to 90% power therefore does not sound like a big change, but in effect it halves the chance of failure (from 20% to 10%).

At this point it is worth noting that the power that is stated in the protocol is only true if the assumptions are correct. If the assumptions are not correct the power will be different as demonstrated later in this Blog.

Calculations


Once you have the assumptions, significance level and desired power, the next step is to calculate the sample size. This requires a sample size calculator. These calculators are reasonably straightforward; the only complicating factor is that there are several of them, because each data type and each comparison type requires a different formula (and hence calculator). In our example we are looking to detect a difference between two groups for a binary outcome variable, so we would use the appropriate calculator. The link below will take you to this:

http://www.pharmaschool.co/size5.asp

In our example we would enter 65% in box (a), 40% in box (b), a 5% significance level in box (c), and for 80% power we would put 80 in box (d). If we set the withdrawal rate to 0 and click Calculate Sample Size, this gives a sample size requirement of 59 subjects per group and a total of 118 subjects.
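For those curious about what sits behind such a calculator, here is a minimal sketch using the standard normal-approximation formula for comparing two proportions. The function name is mine, and an online calculator may differ by a subject or two if it applies a continuity correction:

```python
from math import ceil
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for detecting a difference between two
    proportions, using the simple normal-approximation formula."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_beta = z.inv_cdf(power)            # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return ceil(n)

print(sample_size_two_proportions(0.65, 0.40))  # 59 per group
```

With our assumptions of 65% vs 40%, 5% significance and 80% power, this reproduces the 59 subjects per group (118 in total) from the calculator.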


Withdrawal/Non-evaluable Rate


In most trials there will be a number of subjects who do not complete the trial or for some reason are classed as non-evaluable. The initial sample size that is calculated is the number of subjects who are required for the final analysis, therefore this needs to be increased to allow for subjects who are excluded from the analysis. This can be done with the sample size calculator by adding a % of withdrawals into box (e).

What if the initial assumptions are incorrect?


The stated power will be incorrect; it may be higher or lower. Let us look at an example:

With the assumption of response rates of 65% vs 40% our trial required 118 subjects.

If the actual response rates were, say, 55% vs 40% (so the test drug is less effective than assumed) then for 80% power our trial would require 170 subjects per group, 340 in total: nearly three times the number initially calculated. (Replace the 65 in box (a) with 55 to calculate this.)

If the trial had gone ahead with 118 subjects and the real response rates were 55% and 40% the trial would not have had the stated 80% power and instead would have had 38% power. This equates to a more than 6 in 10 chance of failure. 
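The drop in power can be reproduced with the same normal approximation used for the sample size sketch above (again my own illustrative function; exact software output may differ by a point or so):

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions,
    via the simple normal approximation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    se = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    return z.cdf(abs(p1 - p2) / se - z_alpha)

# Trial sized for 65% vs 40%, but real rates are 55% vs 40%
print(round(power_two_proportions(0.55, 0.40, 59), 2))  # 0.38
```

With 59 subjects per group, real rates of 55% vs 40% give roughly 38% power, matching the figure quoted above.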

Conclusion


The power stated in the protocol is dependent on the assumptions; any change from the assumptions will affect the power. A power of 80% means that even if the assumptions hold there is still a 1 in 5 chance that the trial will fail to give a positive result. If the assumptions do not hold then the chance of failure will most likely increase as the power is reduced. Therefore, when looking at 'failed' trial results it is always sensible to go back to the power of the trial, the assumptions that were made and the withdrawal rate, and see whether the observed results are similar to these or vastly different. Trials sometimes fail through bad design and lack of proper sample sizing, not because the drug is ineffective.

I've simplified this by using binary data for illustrative purposes. There are additional assumptions required when using continuous data or time to event data that I can discuss in future blogs if the interest is there.

For more sample size calculators visit:  http://www.pharmaschool.co/size.asp

Please feel free to post thoughts and opinions to this Blog and suggest future topics.

Adrian M Parrott BSc MSc MBA

Adrian is the PharmaSchool Subject Expert for Statistics and delivers the "Statistics for Non-Statisticians" Course. The Course has been delivered across the World for a wide variety of Pharma, Healthcare, CRO, Medical Communications and Academic Organisations in locations ranging from UK, US, Europe, India, China, UAE, South Africa, Thailand.

e: adrian.parrott@pharmaschool.co

www.pharmaschool.co



Reporting Risk in Clinical Trial Results

I am frequently asked to explain the difference between Absolute Risk, Relative Risk, Absolute Risk Reduction and Relative Risk Reduction. So here goes.....

Reading the newspaper this morning I was interested to see the headline:

"Among women who took tamoxifen for ten years, 25 per cent fewer had recurrences of breast cancer and 23 per cent fewer died, compared to those who took the drug for just five years."

"Drug can HALVE risk of breast cancer returning: Patients taking tamoxifen for 10 years instead of five 'better protected and less likely to die"

These headlines certainly grab the attention of the reader. Reading further into the articles it becomes apparent that we have a situation which would have actually gone against the guidance from the Association of the British Pharmaceutical Industry regarding advertising and promotion.

This states: "Reference to absolute risk and relative risk. Referring only to relative risk, especially with regard to risk reduction, can make a medicine appear more effective than it actually is. In order to assess the clinical impact of an outcome, the reader also needs to know the absolute risk involved. In that regard relative risk should never be referred to without also referring to the absolute risk. Absolute risk can be referred to in isolation"

So what are absolute risks, relative risks and relative risk reductions? Let me explain by way of a simple example.

Take a two group clinical trial, groups A and B. The outcome we are looking at is disease recurrence, so by the end of the trial the patients are classified as having disease recurrence or not after being on treatment for 2 years.

In Group A: 250 patients out of 404 showed disease recurrence
In Group B: 350 patients out of 402 showed disease recurrence

Absolute Risk can be looked at as the % chance of having disease recurrence in a group. So:

Absolute Risk of Disease Recurrence in A is 250/404 = 61.9%
Absolute Risk of Disease Recurrence in B is 350/402 = 87.1%

Absolute Risk Reduction is simply the difference in these %s, 87.1-61.9 = 25.2%

Relative Risk is the measure that describes the chance of observing disease recurrence in one group COMPARED to the other. It is actually the ratio of Absolute Risks.

In this example the Relative Risk is therefore: 61.9%/87.1% = 0.71

What does this mean? Well one interpretation is that the risk of disease recurrence in group A is only 71% of that in group B. This is sometimes a little hard to visualise and therefore the measure that is used is Relative Risk Reduction (RRR). This is a measure of the reduced chance of disease recurrence in group A compared to B. RRR is simple to calculate as it is 1-RR.

In our example RRR = 1-0.71 = 0.29 (or 29%).

This is interpreted as the chance (or risk) of disease recurrence in group A is 29% lower than Group B. Which then leads to the statement of a 29% reduced risk.
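The chain of calculations above is short enough to sketch in a few lines. This is my own illustrative function (taking group B as the reference group, as in the example):

```python
def risk_measures(events_a, n_a, events_b, n_b):
    """Absolute risk, absolute risk reduction, relative risk and
    relative risk reduction for two groups (group B as reference)."""
    ar_a = events_a / n_a
    ar_b = events_b / n_b
    rr = ar_a / ar_b
    return {
        "absolute_risk_a": ar_a,           # 250/404 = 0.619
        "absolute_risk_b": ar_b,           # 350/402 = 0.871
        "absolute_risk_reduction": ar_b - ar_a,
        "relative_risk": rr,
        "relative_risk_reduction": 1 - rr,
    }

for key, value in risk_measures(250, 404, 350, 402).items():
    print(f"{key}: {value:.3f}")
```

Running it on the first example's counts reproduces the 61.9% and 87.1% absolute risks, the relative risk of 0.71 and the relative risk reduction of 29%.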

Now let us look at a second example with the same disease recurrence measure.

In Group A: 20 patients out of 300 showed disease recurrence
In Group B: 28 patients out of 301 showed disease recurrence

If you work out the Absolute Risks they are A: 20/300 = 6.7%, B: 28/301 = 9.3%

If you work out the Absolute Risk Reduction: 9.3 - 6.7 = 2.6%

If you work out the Relative Risk this is 6.7/9.3 = 0.72, which leads to a Relative Risk Reduction of 1 - 0.72 = 0.28 (or 28%).

Interestingly, based on the Relative Risk Reduction both examples give almost exactly the same answer (28-29%), yet when looking at the Absolute Risks (or the Absolute Risk Reduction) we see that the results are completely different.

It is now sensible to look back at the ABPI Guidelines and note that they require absolute risks to be presented whenever relative risks (or relative risk reductions) are presented. This allows the reader/prescriber/patient to better understand the results. A 29% relative risk reduction may represent only a minimal absolute benefit to a patient, and that benefit can easily be eroded once safety profiles, adverse events and quality of life considerations are taken into account; whereas a 29% relative risk reduction with a large absolute benefit may be a much better result.

Going back to the Tamoxifen example, the figure presented is clearly relative and there is no clarification as to what the absolute risks are. I suspect that we are looking at absolute risks per group in the region of 20%, so the absolute risk reduction is around the 5% mark relating to a 25% relative reduction.

I expect that the Tamoxifen analyses used a slightly different approach to my simplistic examples above, e.g. time to event analysis and hazard ratios, but the same underlying questions have to be raised when looking at such media reports. Any result presenting a relative risk reduction needs to be put into context by presenting the absolute risk to the patient.


Adrian M Parrott

e: adrian.parrott@pharmaschool.co
www.pharmaschool.co