Statistics - Sample Assignment

Hypothesis Test | Goodness of fit | Anova | Chi square Test of independence | Confidence interval | Sample size | Probability

PROBLEM 1:

Faced with rising fax costs, a firm issued a guideline that transmissions of 10 pages or more should be sent by 2-day mail instead. Exceptions are allowed, but they want the average to be 10 or below. The firm examined 35 randomly chosen fax transmissions during the next year, yielding a sample mean of 14.44 with a standard deviation of 4.45 pages.
(a) At the .01 level of significance, is the true mean greater than 10?
(b) Use Excel to find the right-tail p-value.

Solution

Step 1: Choose the Hypotheses

Null hypothesis:

H0: μ ≤ 10 That is, the true mean is not greater than 10

Alternate hypothesis:

H1: μ > 10 That is, the true mean is greater than 10

Step 2: Specify the Decision Rule

Select a level of significance α = 0.01. For a right-tailed test, we want the right-tail area to be α = .01. The critical value of z that accomplishes this is z.01 = 2.326. The decision rule is:
Reject H0 if z > 2.326
Otherwise do not reject H0

Step 3: Calculate the Test Statistics

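The test statistic is
z = (x̄ − μ0) / (s / √n) = (14.44 − 10) / (4.45 / √35) = 4.44 / 0.7522 = 5.9028
If H0 is true, then the test statistic should be near 0 because x̄ should be near μ0.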

Step 4: Make the Decision

The test statistic z = 5.9028 exceeds the critical value 2.326 and falls in the right-tail rejection region, so we reject the null hypothesis H0: μ ≤ 10 in favor of the alternative hypothesis H1: μ > 10 at the 1 percent level of significance. The difference between the sample mean of 14.44 pages and the hypothesized mean of 10 pages is statistically significant.

Step 5: Interpretation

We conclude that the true mean is greater than 10

p-Value

To find the p-value for the test statistic z = 5.9028, we use Excel’s function =NORMSDIST(5.9028) to obtain the left-tail area for the cumulative z distribution.
Since P(z < 5.9028) rounds to 1.0000 at four decimal places, the right-tail area is
P(z > 5.9028) = 1 − NORMSDIST(5.9028) < 0.0001
The p-value is far below α = .01, which agrees with the decision to reject H0.
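
The z test and its right-tail p-value can also be checked outside Excel. The following is a minimal sketch in Python using only the standard library; the variable names are illustrative and not part of the original solution. It assumes, as the solution does, that the z distribution is appropriate for n = 35.

```python
from math import sqrt
from statistics import NormalDist

# Sample summary from the problem statement
n = 35          # number of fax transmissions sampled
x_bar = 14.44   # sample mean (pages)
s = 4.45        # sample standard deviation (pages)
mu_0 = 10       # hypothesized mean under H0
alpha = 0.01    # level of significance

# Test statistic: z = (x_bar - mu_0) / (s / sqrt(n))
z = (x_bar - mu_0) / (s / sqrt(n))

# Right-tail critical value and p-value from the standard normal distribution
std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha)
p_value = 1 - std_normal.cdf(z)

print(f"z = {z:.4f}, critical value = {z_crit:.3f}")
print(f"right-tail p-value = {p_value:.2e}")
print("Reject H0" if z > z_crit else "Do not reject H0")
```

The exact right-tail area is much smaller than a four-decimal table can display, which is why the p-value is best reported as less than .0001.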

PROBLEM 2:

A sample of 25 concession stand purchases at the October 22 matinee of Bride of Chucky showed a mean purchase of $5.29 with a standard deviation of $3.02. For the October 26 evening showing of the same movie, for a sample of 25 purchases the mean was $5.12 with a standard deviation of $2.14. The means appear to be very close, but not the variances. At α = .05, is there a difference in variances? Show all steps clearly, including an illustration of the decision rule.
(Data are from a project by statistics students Kim Dyer, Amy Pease, and Lyndsey Smith.)

Solution:

Step 1:

Null hypothesis:

H0: σ1² = σ2²

That is, there is no difference in the variances

Alternate hypothesis:

H1: σ1² ≠ σ2²

That is, there is a difference in the variances

Step 2:

Select a level of significance
α = 0.05

Step 3:

Identify the test statistic F = s1² / s2²

The following output is obtained from MEGASTAT:

F-test for equality of variance
9.1204    variance: 22-Oct
4.5796    variance: 26-Oct
1.99      F
.0981     p-value

F = 9.1204 / 4.5796 = 1.99, with p-value = 0.0981

Step 4:

Formulate a decision rule

Since the p-value (0.0981) is not less than 0.05, we do not reject the null hypothesis.
Equivalently, for this two-tailed test with (24, 24) degrees of freedom, the upper critical value of F at the 0.05 level of significance is 2.27. Since the calculated value F = 1.99 is less than the critical value, we do not reject the null hypothesis: the difference between the two variances is not significant.

Step 5:

Make a decision

Thus we conclude that there is no significant difference in the variances
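
The F ratio and the decision rule can be reproduced with a few lines of Python. This is only a sketch: the upper critical value 2.27 is taken from the solution rather than computed, and the lower critical value is obtained as its reciprocal, which is valid here only because both samples have the same degrees of freedom. The variable names are illustrative.

```python
# Sample standard deviations and sizes from the problem statement
s1, n1 = 3.02, 25   # October 22 matinee
s2, n2 = 2.14, 25   # October 26 evening

# Sample variances and the F ratio (larger variance in the numerator)
var1, var2 = s1 ** 2, s2 ** 2
f_stat = var1 / var2
df1, df2 = n1 - 1, n2 - 1

# Two-tailed decision rule at alpha = .05.
# The upper critical value is quoted from the solution (table lookup);
# with df1 == df2 the lower critical value is its reciprocal.
f_crit_upper = 2.27
f_crit_lower = 1 / f_crit_upper

reject_h0 = f_stat > f_crit_upper or f_stat < f_crit_lower

print(f"variances: {var1:.4f} and {var2:.4f}")
print(f"F = {f_stat:.2f} with ({df1}, {df2}) degrees of freedom")
print(f"Reject H0 if F < {f_crit_lower:.3f} or F > {f_crit_upper}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```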

PROBLEM 3:

For many years TV executives used the guideline that 30 percent of the audience was watching each of the prime-time networks and 10 percent were watching cable stations on a weekday night. A random sample of 500 viewers in the Tampa–St. Petersburg, Florida, area last Monday night showed that 165 homes were tuned in to the ABC affiliate, 140 to the CBS affiliate, 125 to the NBC affiliate, and the remainder were viewing a cable station. At the .05 significance level, can we conclude that the guideline is still reasonable?

Solution:

Observed Frequencies: (Oi)
                     Prime-Time Networks
T.V. Channels        ABC affiliate   CBS affiliate   NBC affiliate   Cable Station   Total
Number of viewers    O1 = 165        O2 = 140        O3 = 125        O4 = 70         N = 500

According to the guideline, 30% of the audience was watching each of the prime-time networks (ABC, CBS and NBC) and 10% was watching cable stations on a weekday night, as shown in the following table:

                     Prime-Time Networks
T.V. Channels        ABC affiliate   CBS affiliate   NBC affiliate   Cable Station   Total
Share of viewers     30%             30%             30%             10%             100%

Claim: The observed frequencies and the expected frequencies do not differ significantly. That is the guideline followed by the TV executives is reasonable.

Hypotheses:

Null Hypothesis:

H0: The observed frequencies and the expected frequencies do not differ significantly.

Alternative hypothesis:

Ha: The observed frequencies and the expected frequencies differ significantly.

Chi-square test Statistic:

χ² = ∑i [(Oi - Ei)² / Ei], where Ei = Expected number of viewers watching different channels.

On the assumption that the H0 is true, the expected frequencies can be calculated as follows:

E1 = 30% of 500 = 150
E2 = 30% of 500 = 150
E3 = 30% of 500 = 150
E4 = 10% of 500 = 50

The Chi-square statistic is calculated using the following table:

O      E      O-E    (O-E)^2   (O-E)^2 / E
165    150    15     225       1.5000
140    150    -10    100       0.6667
125    150    -25    625       4.1667
70     50     20     400       8.0000
500    500           Total     14.3333

From the above table, we have χ² = 14.333. Thus, the value of the test statistic is χ² = 14.333

Level of Significance:

α = 0.05

Degrees of freedom:

v = k – 1 = 4 – 1 = 3 d.f., where k = 4 is the number of viewing categories

Critical Value of Chi-square:

At α = 0.05, for 3 degrees of freedom, the critical value of Chi-square is 7.8147 (By referring Chi-square distribution table)

Decision:

Since the computed value of χ² (14.333) is greater than the critical value of χ² (7.8147), there is enough evidence to reject H0 at the 5% level of significance. Hence, we conclude that the guideline followed by the TV executives is no longer reasonable.
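
The goodness-of-fit calculation can be checked with a short Python sketch. The dictionary keys and variable names are illustrative, and the critical value is copied from the table lookup in the solution rather than computed.

```python
# Observed counts from the sample of 500 viewers
observed = {"ABC": 165, "CBS": 140, "NBC": 125, "Cable": 70}

# Viewing shares under the guideline (H0)
guideline = {"ABC": 0.30, "CBS": 0.30, "NBC": 0.30, "Cable": 0.10}

n = sum(observed.values())

# Expected counts under H0: E_i = share_i * n
expected = {k: guideline[k] * n for k in observed}

# Chi-square statistic: sum of (O_i - E_i)^2 / E_i over all categories
chi_sq = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

df = len(observed) - 1
chi_crit = 7.8147   # critical value at alpha = .05, 3 d.f. (from the table)

print(f"chi-square = {chi_sq:.4f} with {df} d.f.")
print("Reject H0" if chi_sq > chi_crit else "Do not reject H0")
```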

PROBLEM 4:

Given the following sample information, test the hypothesis that the treatment means are equal at the .05 significance level.

Analysis of Variance
Treatment 1- 8, 11, 10
Treatment 2- 3, 2, 1, 3, 2
Treatment 3- 3, 4, 5, 4

a. State the null hypothesis and the alternate hypothesis.
b. What is the decision rule?
c. Compute the SST, SSE, and SS Total.
d. Complete an ANOVA table.
e. State your decision regarding the null hypothesis.
f. If Ho is rejected, can we conclude that treatment 1 and treatment 2 differ? Use 95 percent level of confidence.

Solution:
(a) Hypotheses:

Null hypothesis:

H0: μ1 = μ2 = μ3

H0: There is no significant difference between the means of three treatments

Alternative hypothesis:

Ha: There is a significant difference between the means of three treatments

Level of significance:

α = 0.05
(b) Decision Rule:

Reject the null hypothesis H0 if the value of the test statistic F is greater than the critical value of F. If the value of the test statistic is less than or equal to the critical value of F, do not reject H0.

(c) Computation of SST, SSE, and SS Total:

In order to test the above hypotheses, we use the F-statistic. The calculation of the F-statistic based on SST, SSE, and SS Total is shown below.
Under standard notation, we have
F = MSST / MSSE, where MSST = SST / (k – 1) = Mean sum of squares for treatments, and MSSE = SSE / (N – k) = Mean sum of squares for error (within treatments), with k = 3 treatments

Sum of squares for Treatment:

SST = T1²/n1 + T2²/n2 + T3²/n3 – CF, where Ti = sum of the observations in treatment i

Sum of squares for Total:

SS Total = ∑X1² + ∑X2² + ∑X3² – CF

SSE (Error Sum of Squares) = SS Total – SST
CF = Correction Factor = T²/N,
where N = number of observations = n1 + n2 + n3, n1 = 3, n2 = 5, n3 = 4, and T = sum of all observations = ∑X1 + ∑X2 + ∑X3

Degrees of freedom:

V= degrees of freedom for Total = N – 1 = 12 – 1 = 11
V1 = degree of freedom between Treatments = 3 – 1 = 2
V2 = degree of freedom for Error = V – V1 = 11 – 2 = 9
Using the following table, we can calculate the value of test statistic:

        X1    X2    X3    X1²    X2²    X3²
        8     3     3     64     9      9
        11    2     4     121    4      16
        10    1     5     100    1      25
        -     3     4     -      9      16
        -     2     -     -      4      -
Total   29    11    16    285    27     66

N = number of observations = n1 + n2 + n3 = 3 + 5 + 4 = 12
T = Sum of all observations = ∑X1 + ∑X2 + ∑X3 = 29 + 11 + 16 = 56
CF = Correction Factor = T²/N = 56²/12 = 261.333
SST = 29²/3 + 11²/5 + 16²/4 – 261.333 = 280.333 + 24.2 + 64 – 261.333 = 107.2
SS Total = ∑X1² + ∑X2² + ∑X3² – CF = 285 + 27 + 66 – 261.333 = 116.667
SSE (Error Sum of Squares) = SS Total – SST = 116.667 – 107.2 = 9.467
MSST = SST / V1 = 107.2 / 2 = 53.6
MSSE = SSE / V2 = 9.467 / 9 = 1.0519
F = MSST / MSSE = 53.6 / 1.0519 = 50.958

(d) Complete an ANOVA table:

The above values have been presented in the following ANOVA table:

Source of Variation   SS            df   MS         F          F crit
Between Groups        107.2         2    53.6       50.95775   4.25649473
Within Groups         9.466666667   9    1.051852
Total                 116.6666667   11

Thus, the value of test statistic is F = 50.958
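
The sums of squares and the F-statistic can be verified with the correction-factor method used above. The following Python sketch is illustrative only; the dictionary layout and variable names are assumptions made for the example.

```python
# Sample data from the problem statement
treatments = {
    "Treatment 1": [8, 11, 10],
    "Treatment 2": [3, 2, 1, 3, 2],
    "Treatment 3": [3, 4, 5, 4],
}

k = len(treatments)                                  # number of treatments
N = sum(len(v) for v in treatments.values())         # total observations
T = sum(sum(v) for v in treatments.values())         # sum of all observations

cf = T ** 2 / N                                      # correction factor T^2 / N

# SST: sum of (treatment total)^2 / (treatment size), minus CF
sst = sum(sum(v) ** 2 / len(v) for v in treatments.values()) - cf

# SS Total: sum of all squared observations, minus CF
ss_total = sum(x ** 2 for v in treatments.values() for x in v) - cf

# SSE is what remains after removing the treatment sum of squares
sse = ss_total - sst

msst = sst / (k - 1)     # mean square for treatments
msse = sse / (N - k)     # mean square for error
f_stat = msst / msse

print(f"SST = {sst:.3f}, SSE = {sse:.3f}, SS Total = {ss_total:.3f}")
print(f"MSST = {msst:.3f}, MSSE = {msse:.4f}, F = {f_stat:.3f}")
```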

Critical value of F:

At α = 0.05 level of significance, for V = (2, 9) degrees of freedom, the critical value of F is given by F0.05, (2, 9) = 4.2565 (By referring F-distribution table)

(e) State your decision regarding the null hypothesis:

If the value of test statistic is greater than the critical value of F, then the null hypothesis is rejected. Otherwise the null hypothesis is not rejected. In this problem, since the value of test statistic (50.958) is greater than the critical value of F (4.2565), there is enough evidence to reject the null hypothesis H0 at 5% level.

(f) If Ho is rejected, can we conclude that treatment 1 and treatment 2 differ? Use 95 percent level of confidence:

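Since the value of the test statistic is greater than the critical value of F, H0 is rejected at the 5% level of significance, so at least one treatment mean differs from the others. To compare treatments 1 and 2 specifically, we construct a 95 percent confidence interval for μ1 – μ2 using MSSE and the t distribution with 9 degrees of freedom (t = 2.262):
(x̄1 – x̄2) ± t × √[MSSE × (1/n1 + 1/n2)] = (9.667 – 2.2) ± 2.262 × √[1.0519 × (1/3 + 1/5)] = 7.467 ± 1.694
The interval (5.773, 9.161) does not include 0. Hence, we can conclude that the means of treatment 1 and treatment 2 differ.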

Problem 5:

In a bumper test, three types of autos were deliberately crashed into a barrier at 5 mph, and the resulting damage (in dollars) was estimated. Five test vehicles of each type were crashed, with the results shown below. Research question: Are the mean crash damages the same for these three vehicles?

Crash Damage ($)

Goliath   Varmint   Weasel
1,600     1,290     1,090
760       1,400     2,100
880       1,390     1,830
1,950     1,850     1,250
1,220     950       1,920

Solution:

Null hypothesis:

H0: μ1 = μ2 = μ3, that is, there is no significant difference in the mean crash damage for the three vehicles

Alternate hypothesis:

H1: Not all the means are equal, that is, there is a significant difference in the mean crash damage for the three vehicles

We obtain the following output using Excel:

One factor ANOVA

Mean      n    Std. Dev
1,282.0   5    496.31     Goliath
1,376.0   5    321.84     Varmint
1,638.0   5    441.78     Weasel
1,432.0   15   424.32     Total

ANOVA table
Source      SS             df   MS            F      p-value
Treatment   340,360.00     2    170,180.000   0.94   .4188
Error       2,180,280.00   12   181,690.000
Total       2,520,640.00   14
 

Level of significance:

α= 5%
Decision Rule:

Reject the null hypothesis if p-value < 0.05

Conclusion:

Since the p-value (.4188) is greater than 0.05, the null hypothesis is not rejected. Thus we conclude that there is no significant difference in the mean crash damage for the three vehicles
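
The sums of squares in the ANOVA output can be reproduced from the raw data by working directly with deviations from the group means and the grand mean. This Python sketch is illustrative; it assumes the second Varmint value is 1,400, which is the value consistent with the reported Varmint mean of 1,376.0. The p-value itself is not recomputed here.

```python
from statistics import mean

# Crash damage in dollars; the second Varmint value (1400) is an assumption
# inferred from the reported group mean of 1,376.0
damage = {
    "Goliath": [1600, 760, 880, 1950, 1220],
    "Varmint": [1290, 1400, 1390, 1850, 950],
    "Weasel": [1090, 2100, 1830, 1250, 1920],
}

all_values = [x for group in damage.values() for x in group]
grand_mean = mean(all_values)

# Between-group (treatment) sum of squares: n_j * (group mean - grand mean)^2
ss_treatment = sum(
    len(group) * (mean(group) - grand_mean) ** 2 for group in damage.values()
)

# Within-group (error) sum of squares: squared deviations from each group mean
ss_error = sum(
    (x - mean(group)) ** 2 for group in damage.values() for x in group
)

df_treatment = len(damage) - 1
df_error = len(all_values) - len(damage)
f_stat = (ss_treatment / df_treatment) / (ss_error / df_error)

print(f"SS treatment = {ss_treatment:,.2f}, SS error = {ss_error:,.2f}")
print(f"F = {f_stat:.2f} with ({df_treatment}, {df_error}) degrees of freedom")
```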

Problem 6:

A study regarding the relationship between age and the amount of pressure sales personnel feel in relation to their jobs revealed the following sample information. At the .01 significance level, is there a relationship between job pressure and age?  

Degree of Job Pressure
Age (years)     Low   Medium   High
Less than 25    20    18       22
25 up to 40     50    46       44
40 up to 60     58    63       59
60 and older    34    43       43

Solution:

Given information:
Observed Frequencies (Oi)
DEGREE OF JOB PRESSURE
AGE       LOW   MEDIUM   HIGH   TOTAL
< 25      20    18       22     60
25 - 40   50    46       44     140
40 - 60   58    63       59     180
>= 60     34    43       43     120
TOTAL     162   170      168    500

Claim:

There is a relationship between job pressure and age.

Hypotheses:

Null Hypothesis:

H0: There is no relationship between job pressure and age

Alternative hypothesis:

Ha: There is a relationship between job pressure and age.

Level of Significance:

α = 0.01

The above hypotheses can be tested by the use of Chi-square distribution as follows:

Test Statistic:

The Chi-square statistic is given below:
χ² = ∑i [(Oi – Ei)² / Ei], where Ei = Expected number of sales persons who feel that degree of pressure in their job, and the sum runs over all 12 cells
On the assumption that H0 is true, the expected frequencies can be calculated as follows:
Expected frequency of the cell in row i and column j = (Ri × Cj) / Grand Total, where Ri = ith row total, Cj = jth column total, and Grand Total = 500
For instance, from the given observed frequency table, we have
E1,1 = (60 × 162) / 500 = 19.44
Similarly, we have calculated the expected frequencies for all other cells, and they are presented in the following table:

Expected Frequencies (Ei)
DEGREE OF JOB PRESSURE
AGE       LOW     MEDIUM   HIGH    TOTAL
< 25      19.44   20.4     20.16   60
25 - 40   45.36   47.6     47.04   140
40 - 60   58.32   61.2     60.48   180
>= 60     38.88   40.8     40.32   120
TOTAL     162     170      168     500

Using the given observed frequencies and the above expected frequencies, the value of chi-square is calculated as shown in the following table:

O     E       (O - E)   (O - E)^2   (O - E)^2 / E
20    19.44   0.56      0.3136      0.016131687
50    45.36   4.64      21.5296     0.474638448
58    58.32   -0.32     0.1024      0.00175583
34    38.88   -4.88     23.8144     0.612510288
18    20.4    -2.4      5.76        0.28235294
46    47.6    -1.6      2.56        0.05378151
63    61.2    1.8       3.24        0.052941176
43    40.8    2.2       4.84        0.118627451
22    20.16   1.84      3.3856      0.167936508
44    47.04   -3.04     9.2416      0.196462585
59    60.48   -1.48     2.1904      0.036216931
43    40.32   2.68      7.1824      0.178134921
                        Total       2.191490279

From the above table, we have χ² = ∑i [(Oi – Ei)² / Ei] = 2.1915

Degrees of freedom:

v = (r – 1) * (c – 1) = (4 – 1) * (3 – 1) = 6 d.f

Critical Value of Chi-square:

At α = 0.01 level, for v = 6 degrees of freedom, the critical value of Chi-square is 16.812 (By referring Chi-square distribution table)

Decision:

Since the computed value of χ² (2.1915) is less than the critical value of χ² (16.812), there is not sufficient evidence to reject the null hypothesis H0 at the 1% level of significance. Hence, we conclude that there is no significant relationship between job pressure and age.
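
The expected frequencies and the chi-square statistic for the test of independence can be reproduced as follows. This is a minimal Python sketch with illustrative variable names; the critical value is copied from the table lookup in the solution rather than computed.

```python
# Observed frequencies: rows are age groups, columns are Low, Medium, High
observed = [
    [20, 18, 22],   # less than 25
    [50, 46, 44],   # 25 up to 40
    [58, 63, 59],   # 40 up to 60
    [34, 43, 43],   # 60 and older
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency of each cell under H0: (row total * column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic summed over all cells
chi_sq = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)
chi_crit = 16.812   # critical value at alpha = .01, 6 d.f. (from the table)

print(f"grand total = {grand_total}, E(1,1) = {expected[0][0]:.2f}")
print(f"chi-square = {chi_sq:.4f} with {df} d.f.")
print("Reject H0" if chi_sq > chi_crit else "Do not reject H0")
```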

Problem 7:

A business wants to estimate the true mean annual income of its customers. It randomly samples 200 of its customers. The mean annual income was $52,500 with a standard deviation of $1,800. Find a 95% confidence interval for the true mean annual income of the business’ customers.

Solution:

Here n = 200, Mean x̄ = 52500, Standard Deviation s = 1800
Confidence level = 95%, so α = 0.05 and Zα/2 = 1.96
Confidence interval = x̄ ± Zα/2 × (s/√n)
= 52500 ± 1.96 × (1800/√200)
= 52500 ± 249.4673
= (52500 – 249.4673, 52500 + 249.4673)
= (52250.5327, 52749.4673)
The 95 percent confidence interval is (52250.5327, 52749.4673)

Interpretations:

We are 95 percent confident that the true mean annual income is within the interval 52250.5327 < μ < 52749.4673. There is a 95 percent chance that an interval constructed in this manner contains μ (but a 5 percent chance that it does not).
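
The interval can be recomputed with a short Python sketch. The variable names are illustrative. Note that the code uses the unrounded z value (about 1.95996) rather than 1.96, so the margin differs from the hand calculation in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

# Sample summary from the problem statement
n = 200
x_bar = 52500      # sample mean annual income ($)
s = 1800           # sample standard deviation ($)
confidence = 0.95

# Two-sided critical value: z such that the central area equals the confidence level
alpha = 1 - confidence
z = NormalDist().inv_cdf(1 - alpha / 2)

# Margin of error and interval endpoints
margin = z * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"z = {z:.4f}, margin of error = {margin:.2f}")
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```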

Problem 8:

A business wants to estimate the true mean annual income of its customers. The business needs to be within $500 of the true mean. The business estimates the true population standard deviation is around $2,300. If the confidence level is 95%, find the required sample size in order to meet the desired accuracy.

Solution:

Here Standard deviation σ = 2300
E = 500
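Confidence level = 95%, so α = 0.05 and Zα/2 = 1.96
Now, Sample size n = (σZ/E)² = [(2300 × 1.96) / 500]² = (9.016)² = 81.2883
Rounding up to the next whole number, so that the margin of error does not exceed $500, the required sample size is 82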
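
The sample size formula can be sketched in Python as follows (variable names are illustrative). A sample size must be rounded up rather than to the nearest integer, because rounding down would leave the margin of error slightly above the $500 target.

```python
from math import ceil

sigma = 2300   # estimated population standard deviation ($)
E = 500        # desired margin of error ($)
z = 1.96       # critical value for 95% confidence

# n = (z * sigma / E)^2, then round up to the next whole customer
n_raw = (z * sigma / E) ** 2
n_required = ceil(n_raw)

print(f"n = {n_raw:.4f}, so sample at least {n_required} customers")
```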

Problem 9:

The probability is 1 in 4,000,000 that a single auto trip in the United States will result in a fatality. Over a lifetime, an average U.S. driver takes 50,000 trips. (a) What is the probability of a fatal accident over a lifetime? Explain your reasoning carefully. Hint: Assume independent events. Why might the assumption of independence be violated?

Solution:

We are assuming independent events.
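P(no fatality on a single trip) = 1 – 1/4,000,000
P(no fatality in 50,000 independent trips) = (1 – 1/4,000,000)^50,000 = 0.9876
P(A fatal accident over a lifetime) = 1 – 0.9876 = 0.0124
Because the per-trip probability is so small, this is close to the simple approximation 50,000/4,000,000 = 0.0125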
Why might the assumption of independence be violated?
Hopefully a person becomes a more careful driver as he gains more experience.
So the probability of having an accident changes with experience.
(b) Why might a driver be tempted not to use a seat belt "just on this trip"?
He might look at the extremely small probability of an accident on this trip and conclude that the low probability somehow protects him.
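
Under the independence assumption, the lifetime probability follows from the complement rule: a fatal accident occurs unless every one of the 50,000 trips is free of one. The Python sketch below (illustrative variable names) compares that result with the quick ratio 50,000/4,000,000, which is a close first-order approximation when the per-trip probability is very small.

```python
p_trip = 1 / 4_000_000   # probability of a fatality on a single trip
trips = 50_000           # trips taken over a lifetime

# Complement rule with independent trips:
# P(at least one fatal trip) = 1 - P(no fatal trip on any of them)
p_no_fatal = (1 - p_trip) ** trips
p_fatal = 1 - p_no_fatal

# Simple approximation: number of trips times the per-trip probability
approx = trips * p_trip

print(f"complement rule: {p_fatal:.6f}")
print(f"approximation:   {approx:.6f}")
```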

Problem 10:

A certain airplane has two independent alternators to provide electrical power. The probability that a given alternator will fail on a 1-hour flight is .02. What is the probability that (a) both will fail?

Solution:

P(1st one will fail) = P(2nd one will fail) = .02
Because both alternators are independent,
the probability that both will fail = .02 × .02 = 0.0004
(b) Neither will fail?
The probability that neither will fail = .98 × .98 = 0.9604
(c) One or the other will fail? Show all steps carefully.
The probability that exactly one will fail = probability that the 1st one will fail × probability that the 2nd one will not fail + probability that the 1st one will not fail × probability that the 2nd one will fail = .02 × .98 + .98 × .02 = 0.0392
If "one or the other" is read as at least one alternator failing, the probability is 1 – 0.9604 = 0.0396
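
The alternator probabilities can be tabulated with a few lines of Python. This sketch uses illustrative variable names and reports both readings of "one or the other": exactly one alternator failing, and at least one failing.

```python
p_fail = 0.02        # probability a given alternator fails on a 1-hour flight
p_ok = 1 - p_fail    # probability a given alternator does not fail

# Independence lets us multiply the individual probabilities
p_both = p_fail * p_fail
p_neither = p_ok * p_ok
p_exactly_one = p_fail * p_ok + p_ok * p_fail
p_at_least_one = 1 - p_neither

print(f"both fail:      {p_both:.4f}")
print(f"neither fails:  {p_neither:.4f}")
print(f"exactly one:    {p_exactly_one:.4f}")
print(f"at least one:   {p_at_least_one:.4f}")
```

The four mutually exclusive outcomes (both, neither, only the first, only the second) sum to 1, which is a quick check on the arithmetic.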

Problem 11:

The diameters of bolts produced by a certain machine are normally distributed with a mean of 0.30 inches and a Standard deviation of 0.01 inches. What percentage of bolts will have a diameter greater than 0.32 inches?

Solution:

Let X denote the diameter of bolts produced by a certain machine, which is normally distributed with mean µ and standard deviation σ:
X ~ N(µ,σ)

The standard normal variate Z = [(X - μ) / σ] ~ N(0,1)
Given that,
Mean µ = 0.30 and SD σ = 0.01
To find the percentage of bolts that will have a diameter greater than 0.32 inches:
P(X > 0.32) = P((X - μ) / σ > (0.32 - 0.30) / 0.01)
= P(Z > 2)
= P( 0 < Z < ∞ ) – P( 0 < Z < 2)
= 0.5 - 0.4772
P(X > 0.32) = 0.0228
Hence the percentage of bolts that will have a diameter greater than 0.32 inches is 2.28%.
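
The tail area can be confirmed without a z table. The following Python sketch (illustrative variable names) uses the standard library's normal distribution directly on the bolt diameters.

```python
from statistics import NormalDist

# Bolt diameters: normal with mean 0.30 in and standard deviation 0.01 in
diameter = NormalDist(mu=0.30, sigma=0.01)

# Standardized value for 0.32 inches
z = (0.32 - diameter.mean) / diameter.stdev

# P(X > 0.32) is one minus the cumulative probability at 0.32
p_over = 1 - diameter.cdf(0.32)

print(f"z = {z:.2f}")
print(f"P(X > 0.32) = {p_over:.4f}, i.e. about {p_over * 100:.2f}% of bolts")
```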