Statistics - Sample Assignment
Hypothesis Test | Goodness of fit | Anova | Chi square Test of independence | Confidence interval | Sample size | Probability
PROBLEM 1:
Faced with rising fax costs, a firm issued a guideline that transmissions of 10 pages or more should be sent by 2-day mail instead. Exceptions are allowed, but they want the average to be 10 or below. The firm examined 35 randomly chosen fax transmissions during the next year, yielding a sample mean of 14.44 with a standard deviation of 4.45 pages.
(a) At the .01 level of significance, is the true mean greater than 10?
(b) Use Excel to find the right-tail p-value.
Solution
Null hypothesis:
H0: μ ≤ 10 That is, the true mean is not greater than 10
Alternate hypothesis:
H1: μ > 10 That is, the true mean is greater than 10
Select a level of significance: α = 0.01
For a right-tailed test, we want the right-tail area to be α = .01. The critical value of z that accomplishes this is z.01 = 2.326.
Decision rule: Reject H0 if z > 2.326; otherwise do not reject H0.
If H0 is true, then the test statistic should be near 0 because x̄ should be near μ0. Here the test statistic is
z = (x̄ − μ0) / (s/√n) = (14.44 − 10) / (4.45/√35) = 5.9028
The test statistic z = 5.9028 falls in the right rejection region (z > 2.326), so we reject the null hypothesis H0: μ ≤ 10 in favor of the alternative hypothesis H1: μ > 10 at the 1 percent level of significance. The difference between the sample mean and 10 is statistically significant.
We conclude that the true mean is greater than 10
p-Value
To find the p-value for the test statistic z = 5.9028, we use Excel’s function =NORMSDIST(5.9028) to obtain the left-tail area of the cumulative z distribution.
Since P(z < 5.9028) rounds to 1.0000 at four decimal places, the right-tail area is
P(z > 5.9028) = 1 − P(z < 5.9028) < 0.0001 (approximately 1.8 × 10⁻⁹), which is far below α = .01
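The calculation above can be checked with a short Python sketch. It uses only the standard library in place of Excel; the variable names are illustrative and not part of the original solution.

```python
from math import erfc, sqrt
from statistics import NormalDist

# Values from Problem 1
n = 35          # number of sampled fax transmissions
x_bar = 14.44   # sample mean (pages)
s = 4.45        # sample standard deviation (pages)
mu_0 = 10       # hypothesized mean under H0
alpha = 0.01    # significance level

# Test statistic z = (x_bar - mu_0) / (s / sqrt(n))
z = (x_bar - mu_0) / (s / sqrt(n))

# Right-tail critical value for alpha = .01
z_crit = NormalDist().inv_cdf(1 - alpha)

# Right-tail p-value P(Z > z); erfc keeps precision in the far tail
p_value = 0.5 * erfc(z / sqrt(2))

print(f"z = {z:.4f}, critical value = {z_crit:.3f}")
print(f"right-tail p-value = {p_value:.2e}")
print("reject H0" if z > z_crit else "do not reject H0")
```

The p-value is computed with `erfc` rather than as one minus the left-tail area because, for a z this large, the left-tail area is indistinguishable from 1 at table precision.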
PROBLEM 2:
A sample of 25 concession stand purchases at the October 22 matinee of Bride of Chucky showed a mean purchase of $5.29 with a standard deviation of $3.02. For the October 26 evening showing of the same movie, for a sample of 25 purchases the mean was $5.12 with a standard deviation of $2.14. The means appear to be very close, but not the variances. At α = .05, is there a difference in variances? Show all steps clearly, including an illustration of the decision rule.
(Data are from a project by statistics students Kim Dyer, Amy Pease, and Lyndsey Smith.)
Solution:
Null hypothesis:
H0: $\sigma_1^2 = \sigma_2^2$ That is, there is no difference in the variances
Alternate hypothesis:
H1: $\sigma_1^2 \neq \sigma_2^2$ That is, there is a difference in the variances
Select a level of significance
α = 0.05
Identify the test statistic: $F = s_1^2 / s_2^2 = (3.02)^2 / (2.14)^2 = 9.1204 / 4.5796 = 1.99$
P-value = 0.09811...
The following output is obtained from MEGASTAT
F-test for equality of variance

| Statistic | Value |
|---|---|
| Variance: 22-Oct | 9.1204 |
| Variance: 26-Oct | 4.5796 |
| F | 1.99 |
| p-value | .0981 |

Formulate a decision rule
For a two-tailed test at α = 0.05 with (24, 24) degrees of freedom, the upper critical value is F.025 = 2.27 and the lower critical value is 1/2.27 = 0.44. Reject H0 if F > 2.27 or F < 0.44; otherwise do not reject H0. Equivalently, reject H0 if the p-value is less than 0.05.
Make a decision
Since the calculated F = 1.99 lies between the two critical values (and the p-value of .0981 is not less than 0.05), we do not reject the null hypothesis. Thus we conclude that there is no significant difference in the variances.
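As a rough check, the F ratio and the two-tailed decision rule can be sketched in Python. The critical value 2.27 is taken from the solution above rather than computed, and the lower critical value uses the reciprocal property of the F distribution, which applies directly here because both samples have 24 degrees of freedom.

```python
# Sample standard deviations and sizes from Problem 2
s1, n1 = 3.02, 25   # October 22 matinee
s2, n2 = 2.14, 25   # October 26 evening

# Test statistic: ratio of the sample variances
F = s1**2 / s2**2
df1, df2 = n1 - 1, n2 - 1

# Upper critical value F.025(24, 24) as quoted in the solution (table value)
f_upper = 2.27
# With equal degrees of freedom the lower critical value is the reciprocal
f_lower = 1 / f_upper

reject = F > f_upper or F < f_lower
print(f"F = {F:.4f} with ({df1}, {df2}) d.f.")
print(f"reject H0 if F < {f_lower:.2f} or F > {f_upper:.2f}")
print("reject H0" if reject else "do not reject H0")
```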
PROBLEM 3:
For many years TV executives used the guideline that 30 percent of the audience was watching each of the prime-time networks and 10 percent were watching cable stations on a weekday night. A random sample of 500 viewers in the Tampa–St. Petersburg, Florida, area last Monday night showed that 165 homes were tuned in to the ABC affiliate, 140 to the CBS affiliate, 125 to the NBC affiliate, and the remainder were viewing a cable station. At the .05 significance level, can we conclude that the guideline is still reasonable?
Solution:
| T.V. Channels | ABC affiliate | CBS affiliate | NBC affiliate | Cable station | Total |
|---|---|---|---|---|---|
| Number of viewers | O1 = 165 | O2 = 140 | O3 = 125 | O4 = 70 | N = 500 |
According to the guideline, 30% of the audience watches each of the prime-time networks (ABC, CBS and NBC) and 10% watches cable stations on a weekday night, as shown in the following table:
| T.V. Channels | ABC affiliate | CBS affiliate | NBC affiliate | Cable station | Total |
|---|---|---|---|---|---|
| Share of viewers | 30% | 30% | 30% | 10% | 100% |
Claim: The observed frequencies and the expected frequencies do not differ significantly. That is the guideline followed by the TV executives is reasonable.
Null Hypothesis:
H0: The observed frequencies and the expected frequencies do not differ significantly.
Alternative hypothesis:
Ha: The observed frequencies and the expected frequencies differ significantly.
Chi-square test Statistic:
$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$, where Oi = observed and Ei = expected number of viewers watching each channel.
On the assumption that the H0 is true, the expected frequencies can be calculated as follows:
E1 = 30% of 500 = 150
E2 = 30% of 500 = 150
E3 = 30% of 500 = 150
E4 = 10% of 500 = 50
The Chi-square statistic is calculated using the following table:
| O | E | O-E | (O-E)^2 | (O-E)^2 / E |
|---|---|---|---|---|
| 165 | 150 | 15 | 225 | 1.5000 |
| 140 | 150 | -10 | 100 | 0.6667 |
| 125 | 150 | -25 | 625 | 4.1667 |
| 70 | 50 | 20 | 400 | 8.0000 |
| 500 | 500 | 0 | Total | 14.3333 |
From the above table, we have $\chi^2 = 14.333$. Thus, the value of the test statistic is 14.333.
Level of Significance:
α = 0.05
Degrees of freedom:
v = k – 1 = 4 – 1 = 3 d.f., where k = 4 is the number of categories
Critical Value of Chi-square:
At α = 0.05, for 3 degrees of freedom, the critical value of Chi-square is 7.8147 (by referring to the Chi-square distribution table).
Since the computed value of χ² (14.333) is greater than the critical value (7.8147), there is enough evidence to reject H0 at the 5% level of significance. Hence, we conclude that the guideline followed by the TV executives is no longer reasonable.
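The goodness-of-fit calculation can be reproduced with a minimal Python sketch. The critical value is the table value quoted above, not computed here, and the dictionary names are illustrative.

```python
# Observed viewers from the sample of 500 (Problem 3)
observed = {"ABC": 165, "CBS": 140, "NBC": 125, "Cable": 70}
# Audience shares claimed by the guideline
shares = {"ABC": 0.30, "CBS": 0.30, "NBC": 0.30, "Cable": 0.10}

n = sum(observed.values())
# Expected counts under H0: share * sample size
expected = {k: shares[k] * n for k in observed}

# Chi-square statistic: sum of (O - E)^2 / E over the categories
chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1

chi2_crit = 7.8147  # table value for alpha = .05, 3 d.f. (from the solution)
print(f"chi-square = {chi2:.4f} with {df} d.f.")
print("reject H0" if chi2 > chi2_crit else "do not reject H0")
```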
PROBLEM 4:
Given the following sample information, test the hypothesis that the treatment means are equal at the .05 significance level.
Analysis of Variance
Treatment 1: 8, 11, 10
Treatment 2: 3, 2, 1, 3, 2
Treatment 3: 3, 4, 5, 4
a. State the null hypothesis and the alternate hypothesis.
b. What is the decision rule?
c. Compute the SST, SSE, and SS Total.
d. Complete an ANOVA table.
e. State your decision regarding the null hypothesis.
f. If Ho is rejected, can we conclude that treatment 1 and treatment 2 differ? Use 95 percent level of confidence.
Solution:
Null hypothesis:
H0: μ1 = μ2 = μ3 That is, there is no significant difference between the means of the three treatments
Alternative hypothesis:
Ha: Not all treatment means are equal. That is, there is a significant difference between the means of the three treatments
Level of significance:
α = 0.05
Decision rule:
If the value of the test statistic (F) is greater than the critical value of F, we reject the null hypothesis H0. If it is less than the critical value of F, we do not reject H0.
To test the above hypotheses, we use the F-statistic. Its calculation, based on SST, SSE, and SS Total, is shown below:
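Under standard notation, we have
$F = \frac{MSST}{MSSE}$, where $MSST = \frac{SST}{V_1}$ = mean sum of squares for treatments, and
$MSSE = \frac{SSE}{V_2}$ = mean sum of squares for error (within treatments)
Sum of squares for treatments:
$SST = \frac{T_1^2}{n_1} + \frac{T_2^2}{n_2} + \frac{T_3^2}{n_3} - CF$
Sum of squares for total:
$SS\ Total = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 - CF$
Error sum of squares:
$SSE = SS\ Total - SST$
$CF$ = correction factor = $\frac{T^2}{N}$,
where N = number of observations = n1 + n2 + n3,
n1 = 3, n2 = 5, n3 = 4
$T$ = sum of all observations = $T_1 + T_2 + T_3$, where $T_j = \sum X_j$ is the total for treatment j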
Degrees of freedom:
V= degrees of freedom for Total = N – 1 = 12 – 1 = 11
V1 = degree of freedom between Treatments = 3 – 1 = 2
V2 = degree of freedom for Error = V – V1 = 11 – 2 = 9
Using the following table, we can calculate the value of the test statistic (the last row gives the column totals):
| X1 | X2 | X3 | X1² | X2² | X3² |
|---|---|---|---|---|---|
| 8 | 3 | 3 | 64 | 9 | 9 |
| 11 | 2 | 4 | 121 | 4 | 16 |
| 10 | 1 | 5 | 100 | 1 | 25 |
| - | 3 | 4 | - | 9 | 16 |
| - | 2 | - | - | 4 | - |
| 29 | 11 | 16 | 285 | 27 | 66 |
N = number of observation = n1 + n2 + n3 = 3 + 5 + 4 = 12
$T = \sum X_1 + \sum X_2 + \sum X_3 = 29 + 11 + 16 = 56$
$CF = \frac{T^2}{N} = \frac{56^2}{12} = 261.333$
$SST = \frac{29^2}{3} + \frac{11^2}{5} + \frac{16^2}{4} - 261.333 = 280.333 + 24.2 + 64 - 261.333 = 107.2$
$SS\ Total = 285 + 27 + 66 - 261.333 = 116.667$
$SSE = SS\ Total - SST = 116.667 - 107.2 = 9.467$
$MSST = \frac{SST}{V_1} = \frac{107.2}{2} = 53.6$
$MSSE = \frac{SSE}{V_2} = \frac{9.467}{9} = 1.0519$
$F = \frac{MSST}{MSSE} = \frac{53.6}{1.0519} = 50.958$
The above values have been presented in the following ANOVA table:
| Source of Variation | SS | df | MS | F | F crit |
|---|---|---|---|---|---|
| Between Groups | 107.2 | 2 | 53.6 | 50.95775 | 4.25649473 |
| Within Groups | 9.466666667 | 9 | 1.051852 | - | - |
| Total | 116.6666667 | 11 | - | - | - |
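The sums of squares in the ANOVA table can be reproduced from the raw treatment data. The following Python sketch follows the correction-factor method used in this solution; the variable names are illustrative.

```python
# Raw data from Problem 4
groups = {
    "Treatment 1": [8, 11, 10],
    "Treatment 2": [3, 2, 1, 3, 2],
    "Treatment 3": [3, 4, 5, 4],
}

all_values = [x for g in groups.values() for x in g]
N = len(all_values)   # total number of observations
k = len(groups)       # number of treatments

# Correction factor CF = T^2 / N
cf = sum(all_values) ** 2 / N
# Treatment sum of squares: sum of T_j^2 / n_j, minus CF
sst = sum(sum(g) ** 2 / len(g) for g in groups.values()) - cf
# Total sum of squares: sum of squared observations, minus CF
ss_total = sum(x ** 2 for x in all_values) - cf
# Error sum of squares by subtraction
sse = ss_total - sst

df_treat, df_error = k - 1, N - k
msst = sst / df_treat
msse = sse / df_error
F = msst / msse

print(f"SST = {sst:.3f}, SSE = {sse:.3f}, SS Total = {ss_total:.3f}")
print(f"F = {F:.3f} with ({df_treat}, {df_error}) d.f.")
```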
Thus, the value of test statistic is F = 50.958
Critical value of F:
At α = 0.05 level of significance, for (V1, V2) = (2, 9) degrees of freedom, the critical value of F is F0.05, (2, 9) = 4.2565 (by referring to the F-distribution table).
Decision:
Since the value of the test statistic (50.958) is greater than the critical value of F (4.2565), there is enough evidence to reject the null hypothesis H0 at the 5% level of significance. Hence, we conclude that there is a significant difference between the means of the three treatments.
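Part (f) is not worked in the solution above. One common textbook approach is a 95 percent confidence interval for μ1 − μ2 built from the ANOVA error mean square; the sketch below assumes that approach is the intended one. The t value of 2.262 (two-tailed, 9 d.f.) is a standard table value supplied here as an assumption, not taken from the original.

```python
from math import sqrt

# Raw data for the two treatments being compared (Problem 4)
t1 = [8, 11, 10]
t2 = [3, 2, 1, 3, 2]

# Error mean square from the ANOVA table: SSE / error d.f.
mse = 9.466666667 / 9
# Assumed table value: t for 95% confidence with 9 error d.f.
t_crit = 2.262

mean1 = sum(t1) / len(t1)
mean2 = sum(t2) / len(t2)
diff = mean1 - mean2

# Margin of error: t * sqrt(MSE * (1/n1 + 1/n2))
margin = t_crit * sqrt(mse * (1 / len(t1) + 1 / len(t2)))
lower, upper = diff - margin, diff + margin

# The means differ at this confidence level if the interval excludes 0
differ = lower > 0 or upper < 0
print(f"difference = {diff:.3f}, interval = ({lower:.3f}, {upper:.3f})")
print("treatments 1 and 2 differ" if differ else "no evidence of a difference")
```

Under these assumptions the interval lies entirely above zero, which would indicate that treatments 1 and 2 differ.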
Problem 5:
In a bumper test, three types of autos were deliberately crashed into a barrier at 5 mph, and the resulting damage (in dollars) was estimated. Five test vehicles of each type were crashed, with the results shown below. Research question: Are the mean crash damages the same for these three vehicles?
Crash Damage ($)
| Goliath | Varmint | Weasel |
|---|---|---|
| 1,600 | 1,290 | 1,090 |
| 760 | 1,400 | 2,100 |
| 880 | 1,390 | 1,830 |
| 1,950 | 1,850 | 1,250 |
| 1,220 | 950 | 1,920 |
Solution:
Null hypothesis:
H0: There is no significant difference in the mean crash damage for the three vehicles
Alternate hypothesis:
H1: There is a significant difference in the mean crash damage for the three vehicles
We obtain the following output using Excel:
One factor ANOVA

| Group | Mean | n | Std. Dev |
|---|---|---|---|
| Goliath | 1,282.0 | 5 | 496.31 |
| Varmint | 1,376.0 | 5 | 321.84 |
| Weasel | 1,638.0 | 5 | 441.78 |
| Total | 1,432.0 | 15 | 424.32 |

ANOVA table

| Source | SS | df | MS | F | p-value |
|---|---|---|---|---|---|
| Treatment | 340,360.00 | 2 | 170,180.000 | 0.94 | .4188 |
| Error | 2,180,280.00 | 12 | 181,690.000 | | |
| Total | 2,520,640.00 | 14 | | | |

Level of significance:
α = 5%
Decision rule:
Reject the null hypothesis if p-value < 0.05
Since the p-value (.4188) is greater than 0.05, the null hypothesis is not rejected. Thus we conclude that there is no significant difference in the mean crash damage for the three vehicles.
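The ANOVA output can be cross-checked from the raw crash data with a short Python sketch. Two assumptions are made: the second Varmint value is taken as 1,400, the figure consistent with the reported group mean of 1,376.0, and the p-value uses a closed form for the F distribution that holds only when the numerator has 2 degrees of freedom.

```python
from statistics import mean

# Crash damage in dollars (Problem 5)
damage = {
    "Goliath": [1600, 760, 880, 1950, 1220],
    # second value assumed to be 1400, consistent with the reported mean 1376.0
    "Varmint": [1290, 1400, 1390, 1850, 950],
    "Weasel": [1090, 2100, 1830, 1250, 1920],
}

all_values = [x for g in damage.values() for x in g]
grand_mean = mean(all_values)

# Between-group (treatment) and within-group (error) sums of squares
ss_treat = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in damage.values())
ss_error = sum((x - mean(g)) ** 2 for g in damage.values() for x in g)

df_treat = len(damage) - 1
df_error = len(all_values) - len(damage)
F = (ss_treat / df_treat) / (ss_error / df_error)

# Right-tail p-value; this closed form is valid only for 2 numerator d.f.
p_value = (1 + 2 * F / df_error) ** (-df_error / 2)

print(f"F = {F:.4f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "do not reject H0")
```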
Problem 6:
A study regarding the relationship between age and the amount of pressure sales personnel feel in relation to their jobs revealed the following sample information. At the .01 significance level, is there a relationship between job pressure and age?
| Age (years) | Low job pressure | Medium job pressure | High job pressure |
|---|---|---|---|
| Less than 25 | 20 | 18 | 22 |
| 25 up to 40 | 50 | 46 | 44 |
| 40 up to 60 | 58 | 63 | 59 |
| 60 and older | 34 | 43 | 43 |
Solution:
Observed Frequencies (Oi), by degree of job pressure:
| AGE | LOW | MEDIUM | HIGH | TOTAL |
|---|---|---|---|---|
| < 25 | 20 | 18 | 22 | 60 |
| 25 - 40 | 50 | 46 | 44 | 140 |
| 40 - 60 | 58 | 63 | 59 | 180 |
| >= 60 | 34 | 43 | 43 | 120 |
| TOTAL | 162 | 170 | 168 | 500 |
Claim:
There is a relationship between job pressure and age.
Null Hypothesis:
H0: There is no relationship between job pressure and age.
Alternative hypothesis:
Ha: There is a relationship between job pressure and age.
Level of Significance:
α = 0.01
The above hypotheses can be tested using the Chi-square distribution as follows:
Test Statistic:
The Chi-square statistic is given by $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$, where Ei = expected number of salespeople in cell i.
On the assumption that H0 is true, the expected frequencies can be calculated as $E_{ij} = \frac{R_i \times C_j}{\text{Grand Total}}$, where $R_i$ = ith row total, $C_j$ = jth column total, and Grand Total = 500.
For instance, from the given observed frequency table, we have $E_{1,1} = \frac{60 \times 162}{500} = 19.44$.
Similarly, the expected frequencies for all other cells have been calculated and are presented in the following table:
Expected Frequencies (Ei), by degree of job pressure:
| AGE | LOW | MEDIUM | HIGH | TOTAL |
|---|---|---|---|---|
| < 25 | 19.44 | 20.4 | 20.16 | 60 |
| 25 - 40 | 45.36 | 47.6 | 47.04 | 140 |
| 40 - 60 | 58.32 | 61.2 | 60.48 | 180 |
| >= 60 | 38.88 | 40.8 | 40.32 | 120 |
| TOTAL | 162 | 170 | 168 | 500 |
Using the given observed frequencies and the above expected frequencies, the value of chi-square is calculated as shown in the following table:
| O | E | (O -E) | (O - E)^2 | (O - E)^2 / E |
|---|---|---|---|---|
| 20 | 19.44 | 0.56 | 0.3136 | 0.016131687 |
| 50 | 45.36 | 4.64 | 21.5296 | 0.474638448 |
| 58 | 58.32 | -0.32 | 0.1024 | 0.00175583 |
| 34 | 38.88 | -4.88 | 23.8144 | 0.612510288 |
| 18 | 20.4 | -2.4 | 5.76 | 0.28235294 |
| 46 | 47.6 | -1.6 | 2.56 | 0.05378151 |
| 63 | 61.2 | 1.8 | 3.24 | 0.052941176 |
| 43 | 40.8 | 2.2 | 4.84 | 0.118627451 |
| 22 | 20.16 | 1.84 | 3.3856 | 0.167936508 |
| 44 | 47.04 | -3.04 | 9.2416 | 0.196462585 |
| 59 | 60.48 | -1.48 | 2.1904 | 0.036216931 |
| 43 | 40.32 | 2.68 | 7.1824 | 0.178134921 |
| Total | | | | 2.191490279 |
From the above table, we have $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} = 2.1915$
Degrees of freedom:
v = (r – 1) * (c – 1) = (4 – 1) * (3 – 1) = 6 d.f.
Critical Value of Chi-square:
At α = 0.01 level, for v = 6 degrees of freedom, the critical value of Chi-square is 16.812 (by referring to the Chi-square distribution table).
Since the computed value of χ² (2.1915) is less than the critical value (16.812), there is not sufficient evidence to reject the null hypothesis H0 at the 1% level of significance. Hence, we conclude that there is no significant relationship between job pressure and age.
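The expected frequencies and the chi-square statistic can be recomputed directly from the observed table. This Python sketch uses plain lists; the critical value is the table value quoted above, not computed here.

```python
# Observed frequencies from Problem 6: rows are age groups,
# columns are low / medium / high job pressure
observed = [
    [20, 18, 22],   # less than 25
    [50, 46, 44],   # 25 up to 40
    [58, 63, 59],   # 40 up to 60
    [34, 43, 43],   # 60 and older
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell: row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic summed over all cells
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

chi2_crit = 16.812  # table value for alpha = .01, 6 d.f. (from the solution)
print(f"chi-square = {chi2:.4f} with {df} d.f.")
print("reject H0" if chi2 > chi2_crit else "do not reject H0")
```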
Problem 7:
A business wants to estimate the true mean annual income of its customers. It randomly samples 200 of its customers. The mean annual income was $52,500 with a standard deviation of $1,800. Find a 95% confidence interval for the true mean annual income of the business’ customers.
Solution:
Here n = 200, sample mean x̄ = 52500, standard deviation s = 1800
Confidence level = 95%, so α = 0.05 and $z_{\alpha/2}$ = 1.96
Confidence interval = $\bar{x} \pm z_{\alpha/2} \times \frac{s}{\sqrt{n}}$
= 52500 ± 1.96 × (1800/√200)
= 52500 ± 249.4673
= (52500 – 249.4673, 52500 + 249.4673)
= (52250.5327, 52749.4673)
The 95 percent confidence interval is (52250.5327, 52749.4673)
Interpretations:
We are 95 percent confident that the true mean annual income is within the interval 52250.5327 < μ < 52749.4673. That is, 95 percent of intervals constructed in this manner contain μ (and 5 percent do not).
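The interval can be reproduced with a few lines of Python. This sketch computes the z value from the standard library instead of using the rounded table value 1.96, so the endpoints may differ from the hand calculation in the second decimal place.

```python
from math import sqrt
from statistics import NormalDist

# Values from Problem 7
n = 200          # sample size
x_bar = 52500    # sample mean annual income ($)
s = 1800         # sample standard deviation ($)
confidence = 0.95

# Two-sided critical value z_{alpha/2}
alpha = 1 - confidence
z = NormalDist().inv_cdf(1 - alpha / 2)

# Margin of error and interval endpoints
margin = z * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"z = {z:.4f}, margin of error = {margin:.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```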
Problem 8:
A business wants to estimate the true mean annual income of its customers. The business needs to be within $500 of the true mean. The business estimates the true population standard deviation is around $2,300. If the confidence level is 95%, find the required sample size in order to meet the desired accuracy.
Solution:
Here Standard deviation σ = 2300
E = 500
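Confidence level = 95%, so α = 0.05 and $z_{\alpha/2}$ = 1.96
Now, sample size $n = \left(\frac{z\sigma}{E}\right)^2 = \left(\frac{1.96 \times 2300}{500}\right)^2 = 81.2883$
Rounding up to the next whole number so that the margin of error stays within E = 500, the required sample size is 82.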
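A small Python sketch of the sample-size formula follows. It rounds up with `ceil`, because rounding 81.29 down to 81 would leave the margin of error slightly above the target of 500.

```python
from math import ceil, sqrt

# Values from Problem 8
sigma = 2300   # estimated population standard deviation ($)
E = 500        # desired margin of error ($)
z = 1.96       # table value for 95% confidence

# Required sample size n = (z * sigma / E)^2, rounded up
n_raw = (z * sigma / E) ** 2
n = ceil(n_raw)

# Margin of error achieved with the rounded-up sample size
achieved_margin = z * sigma / sqrt(n)

print(f"n before rounding = {n_raw:.4f}, required n = {n}")
print(f"margin of error with n = {n}: {achieved_margin:.2f}")
```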
Problem 9:
The probability is 1 in 4,000,000 that a single auto trip in the United States will result in a fatality. Over a lifetime, an average U.S. driver takes 50,000 trips. (a) What is the probability of a fatal accident over a lifetime? Explain your reasoning carefully. Hint: Assume independent events. Why might the assumption of independence be violated?
Solution:
(a) Assuming independent events, the probability of no fatality on a single trip is 1 − 1/4,000,000, so
P(a fatal accident over a lifetime) = 1 − (1 − 1/4,000,000)^50,000 ≈ 0.0124
Because the per-trip probability is so small, this is very close to the simple approximation 50,000/4,000,000 = 0.0125.
Why might the assumption of independence be violated?
Hopefully a person becomes a more careful driver as he gains more experience, so the probability of having an accident changes with experience.
(b) Why might a driver be tempted not to use a seat belt "just on this trip"?
He might look at the extremely small probability of an accident on this trip and conclude that the low probability somehow protects him.
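The lifetime probability in part (a) can be computed both ways in Python: exactly under the independence assumption, and with the simple trips-times-probability approximation. The use of `log1p` and `expm1` is a numerical convenience for very small probabilities, not part of the original solution.

```python
from math import expm1, log1p

# Values from Problem 9
p_trip = 1 / 4_000_000   # probability of a fatality on a single trip
trips = 50_000           # trips taken over a lifetime

# Exact under independence: 1 - (1 - p)^trips,
# evaluated via log1p/expm1 to avoid rounding error for tiny p
p_exact = -expm1(trips * log1p(-p_trip))

# Simple approximation: trips * p
p_approx = trips * p_trip

print(f"exact (independent trips): {p_exact:.6f}")
print(f"approximation:             {p_approx:.6f}")
```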
Problem 10:
A certain airplane has two independent alternators to provide electrical power. The probability that a given alternator will fail on a 1-hour flight is .02. What is the probability that (a) both will fail?
Solution:
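Let A be the event that the 1st alternator fails and B the event that the 2nd one fails.
Because the two alternators are independent, P(A and B) = P(A) × P(B).
The probability that both will fail = 0.02 × 0.02 = 0.0004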
(b) Neither will fail?
The probability that neither will fail = 0.98 × 0.98 = 0.9604
(c) One or the other will fail? Show all steps carefully.
Reading "one or the other" as exactly one alternator failing: the probability = P(1st fails) × P(2nd does not fail) + P(1st does not fail) × P(2nd fails) = 0.02 × 0.98 + 0.98 × 0.02 = 0.0392
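The three answers can be verified with a few lines of Python. The sketch also prints the probability that at least one alternator fails, since "one or the other" is sometimes read that way; that alternative reading is an assumption added here, not part of the original solution.

```python
# Value from Problem 10
p_fail = 0.02            # probability a given alternator fails
p_ok = 1 - p_fail        # probability it does not fail

# Independence lets us multiply the individual probabilities
p_both = p_fail * p_fail
p_neither = p_ok * p_ok
p_exactly_one = p_fail * p_ok + p_ok * p_fail

# Alternative reading of part (c): at least one fails
p_at_least_one = 1 - p_neither

print(f"both fail:      {p_both:.4f}")
print(f"neither fails:  {p_neither:.4f}")
print(f"exactly one:    {p_exactly_one:.4f}")
print(f"at least one:   {p_at_least_one:.4f}")
```

The first three outcomes are mutually exclusive and exhaustive, so their probabilities sum to 1.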
Problem 11:
The diameters of bolts produced by a certain machine are normally distributed with a mean of 0.30 inches and a Standard deviation of 0.01 inches. What percentage of bolts will have a diameter greater than 0.32 inches?
Solution:
Let X denote the diameter of bolts produced by the machine, which is normally distributed with mean µ and standard deviation σ:
X ~ N(µ,σ)
The standard normal variate Z = [(X - μ) / σ] ~ N(0,1)
Given that,
Mean µ = 0.30 and SD σ = 0.01
To find the percentage of bolts with a diameter greater than 0.32 inches:
P(X > 0.32) = P((X - μ) / σ > (0.32 - 0.30) / 0.01 )
= P(Z > 2)
= P( 0 < Z < ∞ ) – P( 0 < Z < 2)
= 0.5 - 0.4772
P(X > 0.32) = 0.0228
Hence the percentage of bolts with a diameter greater than 0.32 inches is 2.28%.
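The same tail area can be obtained without a z table. This minimal Python sketch uses the standard library's `NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Values from Problem 11
mu = 0.30      # mean bolt diameter (inches)
sigma = 0.01   # standard deviation (inches)
x = 0.32       # diameter threshold (inches)

diameter = NormalDist(mu=mu, sigma=sigma)

# Standardized value and right-tail probability P(X > x)
z = (x - mu) / sigma
p_greater = 1 - diameter.cdf(x)

print(f"z = {z:.2f}")
print(f"P(X > {x}) = {p_greater:.4f} ({p_greater:.2%} of bolts)")
```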

