[SIZE=3]Introduction[/SIZE]
Power analysis is the name given to the process for determining the sample size for a research study. The technical definition of power is that it is the probability of detecting a “true” effect when it exists. Many students think that there is a simple formula for determining sample size for every research situation. However, the reality is that many research situations are so complex that they almost defy rational power analysis. In most cases, power analysis involves making a number of simplifying assumptions so that the problem is tractable, and then running the analyses numerous times with different variations to cover all of the contingencies.
In this unit we will try to illustrate the power analysis process using a simple four group design.
[SIZE=3]Description of the Experiment[/SIZE]
We wish to conduct a study in the area of mathematics education involving different teaching methods to improve standardized math scores in local classrooms. The study will include four different teaching methods and use fourth grade students who are randomly sampled from a large urban school district and are then randomly assigned to the four different teaching methods.
Here are the four different teaching methods which will be examined: 1) the traditional teaching method, in which the classroom teacher explains the concepts and assigns homework problems from the textbook; 2) the intensive practice method, in which students fill out additional worksheets both before and after school; 3) the computer-assisted method, in which students learn math concepts and skills from various computer-based math learning programs; and 4) the peer assistance learning method, which pairs each fourth grader with a fifth grader who helps them learn the concepts, after which the student teaches the same material to another student in their group.
Students will stay in their math learning groups for an entire academic year. At the end of the Spring semester all students will take the Multiple Math Proficiency Inventory (MMPI). This standardized test has a mean for fourth graders of 550 with a standard deviation of 80.
The experiment is designed so that each of the four groups will have the same sample size. One of the important questions we need to answer in designing the study is, how many students will be needed in each group?
[SIZE=3]The Power Analysis[/SIZE]
In order to answer this question, we will need to make some assumptions and some educated guesses about the data. First, we will assume that the standard deviation for each of the four groups will be equal and will be equal to the national value of 80. Further, we expect, because of prior research, that the traditional teaching group (Group 1) will have the lowest mean score and that the peer assistance group (Group 4) will have the highest mean score on the MMPI. In fact, we expect that Group 1 will have a mean of 550 and that Group 4 will have a mean that is greater by 1.2 standard deviations, i.e., its mean will be at least 550 + 1.2*80 = 646. For the sake of simplicity, we will assume that the means of the other two groups will be equal to the grand mean.
We will make use of the Stata program fpower (findit fpower) to do the power analysis; see the FAQ "How can I use the findit command to search for programs and get additional help?" for more information about using findit. The fpower program needs the following information in order to do the power analysis: 1) the number of levels (or groups), 2) the effect size (called delta), and 3) the alpha level. As stated above, there are four groups, so a = 4. We will set alpha = 0.05, and we will compute the effect size as delta = (largest_mean - smallest_mean)/standard_deviation. Hence, delta = (646-550)/80 = 1.2. The standard deviation we use is the pooled within-group standard deviation, i.e., the square root of the mean square error from the anova table.
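The arithmetic behind these planning values can be sketched in a few lines of Python. This is only an illustration of the numbers in the text; the variable names are ours, and the last quantity (the sum of squared standardized deviations from the grand mean) is a standard ANOVA planning quantity that we add for reference, not something fpower asks for.

```python
# Planning values taken from the text; the variable names are illustrative.
sd = 80.0          # assumed common within-group standard deviation
mean_low = 550.0   # Group 1 (traditional method)
delta = 1.2        # largest minus smallest mean, in SD units

mean_high = mean_low + delta * sd        # Group 4 (peer assistance)
mean_mid = (mean_low + mean_high) / 2    # Groups 2 and 3 sit at the grand mean
means = [mean_low, mean_mid, mean_mid, mean_high]

grand_mean = sum(means) / len(means)

# Sum of squared standardized deviations from the grand mean.
# For this pattern of means (two extremes, the rest in the middle)
# it works out to delta**2 / 2.
ss_std = sum(((m - grand_mean) / sd) ** 2 for m in means)

print("group means:", means)
print("delta:", round((mean_high - mean_low) / sd, 3))
print("sum of squared standardized deviations:", round(ss_std, 3))
```

Placing the two middle groups at the grand mean is the least favorable arrangement for a given delta, which is why it is a conservative planning choice.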
[ul]
fpower, a(4) delta(1.2) alpha(0.05)
a = 4 b = 1 c = 1 r = 1 rho = 0 delta = 1.2
nobs power
2 .0906746
3 .1438119
4 .2013958
5 .2614601
6 .3224192
7 .3829314
8 .4419005
9 .49847
10 .5520059
12 .6484047
14 .7294912
16 .795521
18 .8478578
20 .8884002
25 .9512783
30 .9800673
35 .9922693
40 .9971333
45 .998977
50 .9996469
100 1
[/ul]
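As a cross-check that does not depend on fpower, the same kind of table can be approximated from the noncentral F distribution in plain Python. This is a sketch under our own assumptions, not a description of how fpower works internally: we assume equal group sizes, the two extreme means separated by delta with the other groups at the grand mean, and a noncentrality parameter of n*delta^2/2. The function names (anova_power, smallest_n) are hypothetical.

```python
import math

def _betacf(a, b, x, max_iter=500, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            step = d * c
            h *= step
        if abs(step - 1.0) < eps:
            break
    return h

def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def anova_power(n_per_group, delta, groups=4, alpha=0.05):
    """Approximate power of the one-way ANOVA F test (assumptions in the lead-in)."""
    df1 = groups - 1
    df2 = groups * (n_per_group - 1)
    lam = n_per_group * delta ** 2 / 2.0  # assumed noncentrality parameter
    # Critical point on the beta scale, y = df1*F / (df1*F + df2), by bisection.
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if betainc(df1 / 2.0, df2 / 2.0, mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    y_crit = (lo + hi) / 2.0
    # Noncentral F CDF as a Poisson(lam/2) mixture of incomplete beta functions.
    cdf, weight, j = 0.0, math.exp(-lam / 2.0), 0
    while j < 1000:
        cdf += weight * betainc(df1 / 2.0 + j, df2 / 2.0, y_crit)
        j += 1
        weight *= (lam / 2.0) / j
        if weight < 1e-16 and j > lam / 2.0:
            break
    return 1.0 - cdf

def smallest_n(delta, target=0.80, groups=4, alpha=0.05):
    """Smallest per-group n whose approximate power reaches the target."""
    n = 2
    while anova_power(n, delta, groups, alpha) < target:
        n += 1
    return n

for n in (10, 16, 18, 20, 30):
    print(n, round(anova_power(n, 1.2), 4))
print("smallest n for delta = 1.2:", smallest_n(1.2))
```

If the assumptions match what fpower does, the printed powers should track the table closely; any noticeable gap would suggest that fpower defines its noncentrality differently.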
The table above shows that we can achieve a power of 0.8 with between 16 and 18 students per group. Let’s call it 17 students, just for the sake of argument. We can attempt to verify these numbers using a Monte Carlo simulation program, simpower (findit simpower); see the FAQ "How can I use the findit command to search for programs and get additional help?" for more information about using findit. The mean for the other two groups, which equals the grand mean, is (550+646)/2 = 598.
[ul]
simpower, gr(4) n(17 17 17 17) mu(550 598 598 646) s(80 80 80 80)
Sample Sizes, Means and Standard Deviations
N1 = 17 MU1 = 550 S1 = 80
N2 = 17 MU2 = 598 S2 = 80
N3 = 17 MU3 = 598 S3 = 80
N4 = 17 MU4 = 646 S4 = 80
1000 simulated ANOVA F tests
Alpha Simulated
Level Power
0.1000 0.8840
0.0750 0.8510
0.0500 0.8070
0.0250 0.7300
0.0100 0.5930
[/ul]
The Monte Carlo results from simpower are consistent with the results from the fpower program.
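For readers without simpower, the idea behind a Monte Carlo power check can be sketched in plain Python. This is not simpower's code. As a stand-in for looking up the F critical value, the sketch estimates it by simulating the F statistic under the null hypothesis, which is our own substitution. The function names and the seed are illustrative.

```python
import random

def f_statistic(samples):
    """One-way ANOVA F statistic for a list of groups (lists of floats)."""
    k = len(samples)
    n_total = sum(len(g) for g in samples)
    grand = sum(sum(g) for g in samples) / n_total
    means = [sum(g) / len(g) for g in samples]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(samples, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(samples, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def draw(rng, ns, mus, sds):
    """Draw one normal sample per group."""
    return [[rng.gauss(mu, sd) for _ in range(n)]
            for n, mu, sd in zip(ns, mus, sds)]

def simulated_power(ns, mus, sds, alpha=0.05, reps=2000, seed=1):
    """Share of simulated experiments whose F exceeds an empirical critical value."""
    rng = random.Random(seed)
    k = len(ns)
    # F statistics under the null (equal means, unit SD); F is scale invariant.
    null = sorted(f_statistic(draw(rng, ns, [0.0] * k, [1.0] * k))
                  for _ in range(reps))
    crit = null[min(int((1 - alpha) * reps), reps - 1)]
    # F statistics under the planned alternative.
    hits = sum(f_statistic(draw(rng, ns, mus, sds)) > crit for _ in range(reps))
    return hits / reps

power = simulated_power([17] * 4, [550, 598, 598, 646], [80] * 4)
print("simulated power at alpha = 0.05:", power)
```

Because both the critical value and the rejection rate are estimated from random draws, the result moves by a few hundredths from seed to seed, much as repeated simpower runs would.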
While 17 students per group sounds like a fine number of subjects if everything works out as planned, we should consider what would occur if things do not work out as planned. Let’s say that the treatment effect is not a large 1.2 but a more modest 0.75.
[ul]
fpower, a(4) delta(0.75) alpha(0.05)
a = 4 b = 1 c = 1 r = 1 rho = 0 delta = .75
nobs power
2 .0654313
3 .0840352
4 .1035826
5 .1239748
6 .1451255
7 .1669355
8 .1893014
9 .2121201
10 .2352911
12 .282309
14 .3296447
16 .3766765
18 .422875
20 .4678013
25 .5724329
30 .663641
35 .7402725
40 .8027472
45 .8524114
50 .8910493
100 .9969381
[/ul]
Now, it looks like we will need around 40 students per group to achieve a power of 0.8. Again, we will check these results against simpower. If delta = 0.75, then the higher mean is 0.75*80+550 = 610, and the grand mean, which we use for the two middle groups, is (550+610)/2 = 580.
[ul]
simpower, gr(4) n(40 40 40 40) mu(550 580 580 610) s(80 80 80 80)
Sample Sizes, Means and Standard Deviations
N1 = 40 MU1 = 550 S1 = 80
N2 = 40 MU2 = 580 S2 = 80
N3 = 40 MU3 = 580 S3 = 80
N4 = 40 MU4 = 610 S4 = 80
1000 simulated ANOVA F tests
Alpha Simulated
Level Power
0.1000 0.8790
0.0750 0.8540
0.0500 0.8170
0.0250 0.7290
0.0100 0.6020
[/ul]
We will run an additional simpower in which we let the standard deviations increase along with the group means (not an uncommon occurrence).
[ul]
simpower, gr(4) n(40 40 40 40) mu(550 580 580 610) s(80 90 90 100)
Sample Sizes, Means and Standard Deviations
N1 = 40 MU1 = 550 S1 = 80
N2 = 40 MU2 = 580 S2 = 90
N3 = 40 MU3 = 580 S3 = 90
N4 = 40 MU4 = 610 S4 = 100
1000 simulated ANOVA F tests
Alpha Simulated
Level Power
0.1000 0.7880
0.0750 0.7550
0.0500 0.6920
0.0250 0.5740
0.0100 0.4520
[/ul]
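One informal way to see why the power drops is to recompute the effect size with the larger standard deviations. The sketch below is a heuristic only: with unequal variances the F test is no longer exact, and the "effective delta" is our own label, not a quantity reported by simpower.

```python
import math

sds = [80.0, 90.0, 90.0, 100.0]   # group standard deviations from the run above
mean_range = 610.0 - 550.0        # largest mean minus smallest mean

# With equal group sizes, the pooled variance is the plain average of the
# group variances.
pooled_sd = math.sqrt(sum(s ** 2 for s in sds) / len(sds))
effective_delta = mean_range / pooled_sd

print("pooled SD:", round(pooled_sd, 2))
print("effective delta:", round(effective_delta, 3))
```

The pooled standard deviation comes out near 90 rather than 80, so the same 60-point spread in means corresponds to a delta of roughly 0.66 instead of 0.75.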
It now looks like 40 students per group is not quite enough. Let’s try it with 50 students.
[ul]
simpower, gr(4) n(50 50 50 50) mu(550 580 580 610) s(80 90 90 100)
Sample Sizes, Means and Standard Deviations
N1 = 50 MU1 = 550 S1 = 80
N2 = 50 MU2 = 580 S2 = 90
N3 = 50 MU3 = 580 S3 = 90
N4 = 50 MU4 = 610 S4 = 100
1000 simulated ANOVA F tests
Alpha Simulated
Level Power
0.1000 0.8750
0.0750 0.8450
0.0500 0.7920
0.0250 0.7160
0.0100 0.5860
[/ul]
This is pretty close to a power of 0.8. The effect size of 0.75 is considered moderate. Finally, just to be safe, we should see what sample size would be needed if there were a small effect size of, say, 0.25.
[ul]
fpower, a(4) delta(0.25) alpha(0.05)
a = 4 b = 1 c = 1 r = 1 rho = 0 delta = .25
nobs power
2 .0516819
3 .05358
4 .0554754
5 .057375
6 .0592837
7 .0612038
8 .0631365
9 .0650824
10 .0670419
12 .0710018
14 .0750164
16 .079085
18 .0832068
20 .0873805
25 .0980343
30 .1089857
35 .1202144
40 .1317
45 .1434223
50 .1553612
100 .2825522
[/ul]
A power of 0.8 is not even on the chart. Using simpower indicates that an N of about 380 per group is needed to obtain a power of 0.8 when the effect size is 0.25.
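A quick plausibility check on a figure of this size is the rule of thumb that, for a fixed power, the required n per group grows roughly with 1/delta^2. The helper below is a back-of-the-envelope sketch with a hypothetical name. It ignores the change in error degrees of freedom, so it gives ballpark numbers only.

```python
import math

def rescale_n(n_known, delta_known, delta_new):
    """Rough per-group n that keeps n * delta**2 (and so the power) about constant."""
    return math.ceil(n_known * (delta_known / delta_new) ** 2)

# Starting from 40 per group at delta = 0.75, scale down to delta = 0.25.
print(rescale_n(40, 0.75, 0.25))

# Starting from 17 per group at delta = 1.2, scale down to delta = 0.75.
print(rescale_n(17, 1.2, 0.75))
```

Cutting the effect size to a third multiplies the required sample size by about nine, which lands in the same neighborhood as the simulation-based figure quoted above.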
Here are the sample sizes per group that we have come up with in our power analysis: 17 (best case scenario), 40 (medium effect size), 50 (medium effect size with a fudge factor), and 380 (almost the worst case scenario). Even though we expect a large effect, we will shoot for a sample size of between 40 and 50. For one thing, it is all that our research budget will allow and the school district won’t allow us to use more than 200 students total.
Asandi,