## General Import Statements to Process Data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from math import sqrt  # only sqrt is used below; avoid wildcard imports
%matplotlib inline
df = pd.read_csv('stroopdata.csv')
Independent variable: Word Conditions (Congruent or Incongruent)
Dependent variable: Time to complete test (In Seconds)
Explanation
# Separating data into two variables for easier analysis
cog = df['Congruent']
in_cog = df['Incongruent']
Specify your null and alternative hypotheses, and clearly define any notation used. Justify your choices
The null hypothesis is that there is no difference in mean response time between viewing congruent words and viewing incongruent words.
The alternative hypothesis is that there is a significant difference (negative or positive) in mean response time between viewing congruent words and viewing incongruent words.
Alternative Explanation -- Post Review
Ho - Null Hypothesis: ( μi - μc = 0 ) There is no significant difference in the population average response time for viewing the congruent (c) words vs. the incongruent (i) words.
Ha - Alternative Hypothesis: ( μi - μc ≠ 0 ) There is a significant difference, positive or negative, in the population average response times.
The Dependent Samples t-Test is the appropriate statistical test as the same subjects are assigned two different conditions.
The two conditions are dependent: completing the first test gives the subject practice, and this learning effect may give an unfair advantage on the second, similar test.
In addition, we don't have any population parameters provided, only a small sample of 24 (so a z-test would not be appropriate here).
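The mechanics of a dependent-samples t-test can be sketched with `scipy.stats.ttest_rel`, which pairs the observations subject-by-subject. The arrays below are purely illustrative made-up numbers, not the Stroop data:

```python
import numpy as np
from scipy import stats

# Hypothetical paired response times (seconds) for the same 5 subjects
# under two conditions -- illustrative only, NOT the Stroop data.
first = np.array([10.2, 12.5, 9.8, 11.1, 10.7])
second = np.array([15.1, 18.0, 14.2, 16.5, 15.9])

# ttest_rel works on the per-subject differences, which is exactly what
# makes this a dependent-samples test (unlike stats.ttest_ind).
t_stat, p_value = stats.ttest_rel(second, first)
print(t_stat, p_value)
```

Because each subject serves as their own control, the paired test removes between-subject variability that an independent-samples test would leave in the noise term.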
Include at least one measure of central tendency and at least one measure of variability. The name of the data file is 'stroopdata.csv'.
print("Count is")
print(df.count())
print("Mean Times for Both Independant Variables")
print(df.mean(axis=0))
print("Median Times for Both Independant Variables")
print(df.median(axis=0))
print("Standard Deviation Times")
cog_sd = round(cog.std(),2)
in_cog_sd = round(in_cog.std(),2)
print("")
print("The standard Deviation for Congruent Times is {} ".format(cog_sd))
print("The standard Deviation for Incongruent Times is {}".format(in_cog_sd))
value_range = df.max() - df.min()  # avoid shadowing the built-in `range`
overall_mean = float(cog.mean() + in_cog.mean()) / 2
print("The range for each condition is\n{}".format(value_range))
print("The overall mean across both conditions is {}".format(round(overall_mean, 2)))
Write one or two sentences noting what you observe about the plot or plots.
Plotting Scatter Plots
#Computing Basic Values
fig = plt.figure()
y = df['Congruent']
x = df.index.values  # .get_values() was removed in newer pandas
colors = '#1F77B4'
area = np.pi * 15  # marker area in points^2
#Adding Basic Labels for the ScatterPlot
fig.suptitle('Congruent Words: ', fontsize=14)
ax = fig.add_subplot(111)
fig.subplots_adjust(top=0.85)
ax.set_title('Sample Response Time Scatterplot')
ax.set_xlabel('Observation Count')
ax.set_ylabel('Completion Time (seconds)')
#Adding Axis and Limits
plt.xticks(np.arange(min(x), max(x)+1, 1.0))
plt.yticks(np.arange(0, 35, 2))
plt.scatter(x, y, s=area, c=colors, alpha=0.75)
plt.ylim([0,35])
plt.xlim([0,24])
plt.show()
#Computing Basic Values
fig = plt.figure()
y = df['Incongruent']
x = df.index.values  # .get_values() was removed in newer pandas
colors = '#FF9E4A'
area = np.pi * 15  # marker area in points^2
#Adding Basic Labels for the ScatterPlot
fig.suptitle('In-Congruent Words: ', fontsize=14)
ax = fig.add_subplot(111)
fig.subplots_adjust(top=0.85)
ax.set_title('Sample Response Time Scatterplot')
ax.set_xlabel('Observation Count')
ax.set_ylabel('Completion Time (seconds)')
#Adding Axis and Limits
plt.xticks(np.arange(min(x), max(x)+1, 1.0))
plt.yticks(np.arange(0, 35, 2))
plt.scatter(x, y, s=area, c=colors, alpha=0.75)
plt.ylim([0,35])
plt.xlim([0,24])
plt.show()
# sns.distplot is deprecated in newer seaborn; histplot(kde=True) is the replacement
sns.histplot(cog, kde=True, color='#1F77B4');
sns.histplot(in_cog, kde=True, color='#FF9E4A');
What is your confidence level or Type I error associated with your test? What is your conclusion regarding the hypotheses you set up? Did the results match up with your expectations?
For a confidence level of 90% (two-tailed α = 0.10) with 23 degrees of freedom (n − 1), the t-critical value is ±1.714. Reference Sheet
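The 1.714 figure can be reproduced without a lookup table: `scipy.stats.t.ppf` is the inverse CDF of the t-distribution, and a two-tailed 90% confidence level puts 0.05 in each tail.

```python
from scipy import stats

# Two-tailed alpha = 0.10 leaves 0.05 in the upper tail,
# so ppf(0.95, df=23) returns the matching critical value.
t_critical = stats.t.ppf(0.95, df=23)
print(round(t_critical, 3))  # -> 1.714
```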
# Compute the point estimate (difference of the sample means)
PE = in_cog.mean() - cog.mean()
PE = round(PE,2)
print("Point Estimate is : {}".format(PE))
#Step 1
#Var Name df['D'] = Sample Differences
df['D'] = in_cog - cog
df.head()
#Step 2
#DFM = difference from the mean
#SQD = Squared differences from the mean
DFM = df['D'] - df['D'].mean()
df['SQD'] = DFM*DFM
df.head()
#Step 3
#SSD = sum of squared differences
SSD = df['SQD'].sum()
df.head()
#Step 4
#v(variance) = SSD/(n-1)
n = len(df) #Length of the Data Frame
v = SSD/(n-1)
v
#Step 5
#Square Root of v
s = round(sqrt(v),2)
s
#Applying T-Test
t = PE / (s / sqrt(n))
print("T Test Value is {}".format(round(t,4)))
Our t-critical value: ±1.714
Our t statistic: 8.0159
t statistic > t-critical → 8.0159 > 1.714
Since our t statistic (8.0159) exceeds the critical value (1.714), we can reject the null hypothesis.
This matches our expectation: it takes considerably less time to complete the congruent task than the incongruent task.