In [1]:
##General Import Statements to Process Data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from math import sqrt
%matplotlib inline

df = pd.read_csv('stroopdata.csv')

Analyzing the Stroop Effect

(1) What is the independent variable? What is the dependent variable?

Independent variable: Word Conditions (Congruent or Incongruent)

Dependent variable: Time to complete test (In Seconds)

Explanation

  • An independent variable is the variable you control: what you choose and/or manipulate. In this case it is the word condition under which the test is taken (Congruent or Incongruent).
  • A dependent variable is the outcome you measure, which is expected to change in response to the independent variable. In this case it is the time taken to complete the test.
In [2]:
#Separating Data into two Variables for Easier Analysis
cog = df['Congruent']
in_cog = df['Incongruent']

(2) What is an appropriate set of hypotheses for this task?

Specify your null and alternative hypotheses, and clearly define any notation used. Justify your choices

The Null hypothesis is that there is no difference in mean response time between viewing the congruent words and viewing the incongruent words.

The Alternative hypothesis is that there is a significant difference (negative or positive) in mean response time between viewing the congruent words and viewing the incongruent words.

Alternative Explanation -- Post Review

Here μc and μi are the population mean response times for the congruent (c) and incongruent (i) words.

H0 - Null Hypothesis: ( μi - μc = 0 ) There is no significant difference in the population average response time between viewing the congruent words and viewing the incongruent words.

Ha - Alternative Hypothesis: ( μi - μc ≠ 0 ) There is a significant difference, positive or negative, in the population average response times.

Statistical Test Analysis

The Dependent Samples t-Test is the appropriate statistical test because the same subjects are measured under two different conditions. The two sets of measurements are dependent: each subject's times are paired, and having done the first test, a subject has some practice and may benefit from a learning effect on the similar second test.
In addition, no population parameters are provided and we have only a small sample of 24, so a z-test would not be appropriate here.
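
As a rough illustration of why the pairing matters, the sketch below computes a dependent-samples t statistic from per-subject differences. It uses only the first five rows of the data (the ones displayed by df.head() later in this notebook), and the function name paired_t_statistic is a hypothetical helper, not part of the notebook. If SciPy is available, scipy.stats.ttest_rel performs the same kind of test.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(first, second):
    """t statistic for a dependent-samples (paired) t-test of second - first."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    # Each subject contributes one difference, so the pairing is preserved
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    # stdev() uses the n - 1 (sample) denominator
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error

# First five subjects only (illustrative subset, not the full 24 rows)
congruent_5 = [12.079, 16.791, 9.564, 8.630, 14.669]
incongruent_5 = [19.278, 18.741, 21.214, 15.687, 22.803]

t_5 = paired_t_statistic(congruent_5, incongruent_5)
print("Paired t statistic on 5 rows: {:.3f}".format(t_5))
```

Because the statistic is built from the differences, variation between subjects (some people are simply faster readers) cancels out, which is the main reason to prefer the dependent test here.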

(3) Report some descriptive statistics regarding this dataset.

Include at least one measure of central tendency and at least one measure of variability. The name of the data file is 'stroopdata.csv'.

In [3]:
print("Count is")
print(df.count())
Count is
Congruent      24
Incongruent    24
dtype: int64
In [4]:
print("Mean Times for Both Conditions")
print(df.mean(axis=0))
Mean Times for Both Conditions
Congruent      14.051125
Incongruent    22.015917
dtype: float64
In [5]:
print("Median Times for Both Conditions")
print(df.median(axis=0))
Median Times for Both Conditions
Congruent      14.3565
Incongruent    21.0175
dtype: float64
In [6]:
print("Standard Deviation Times")
cog_sd = round(cog.std(),2)
in_cog_sd = round(in_cog.std(),2)

print("")
print("The standard Deviation for Congruent Times is {} ".format(cog_sd))
print("The standard Deviation for Incongruent Times is {}".format(in_cog_sd))
Standard Deviation Times

The standard Deviation for Congruent Times is 3.56 
The standard Deviation for Incongruent Times is 4.8
In [7]:
data_range = df.max() - df.min()  #Named data_range to avoid shadowing the built-in range()
mean_of_means = (cog.mean() + in_cog.mean()) / 2  #Average of the two condition means
print("The range for this dataset is\n{}".format(data_range))
print("The mean of the two condition means is {}".format(round(mean_of_means,2)))
The range for this dataset is
Congruent      13.698
Incongruent    19.568
dtype: float64
The mean of the two condition means is 18.03
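
The standard deviations above come from the pandas Series.std() method, which by default divides by n - 1 (the sample standard deviation). A minimal sketch of the difference between the sample and population formulas, using only the first five Congruent times shown by df.head() later in this notebook as illustrative input:

```python
from math import sqrt
from statistics import pstdev, stdev

# First five Congruent times only (illustrative subset of the 24 rows)
sample = [12.079, 16.791, 9.564, 8.630, 14.669]
n = len(sample)

sample_sd = stdev(sample)       # divides by n - 1, like the pandas default
population_sd = pstdev(sample)  # divides by n

print("Sample SD (n - 1): {:.3f}".format(sample_sd))
print("Population SD (n): {:.3f}".format(population_sd))

# The two are related by a factor of sqrt(n / (n - 1))
ratio = sample_sd / population_sd
```

The n - 1 version is the one the t-test needs, since the 24 subjects are a sample rather than the whole population.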

(4) Provide one or two visualizations that show the distribution of the sample data.

Write one or two sentences noting what you observe about the plot or plots.

Plotting Scatter Plots

In [8]:
#Computing Basic Values
fig = plt.figure()
y = df['Congruent']
x = df.index.values  #get_values() was removed in newer pandas versions
colors = '#1F77B4'
area = np.pi * 15 # Marker area in points^2 (passed to scatter as s)

#Adding Basic Labels for the ScatterPlot
fig.suptitle('Congruent Words: ', fontsize=14)
ax = fig.add_subplot(111)
fig.subplots_adjust(top=0.85)
ax.set_title('Sample Response Time Scatterplot')
ax.set_xlabel('Observation Count')
ax.set_ylabel('Completion Time (seconds)')

#Adding Axis and Limits
plt.xticks(np.arange(min(x), max(x)+1, 1.0))
plt.yticks(np.arange(0, 35, 2))
plt.scatter(x, y, s=area, c=colors, alpha=0.75)
plt.ylim([0,35])
plt.xlim([0,24])
plt.show()
In [9]:
#Computing Basic Values
fig = plt.figure()
y = df['Incongruent']
x = df.index.values  #get_values() was removed in newer pandas versions
colors = '#FF9E4A'
area = np.pi * 15 # Marker area in points^2 (passed to scatter as s)

#Adding Basic Labels for the ScatterPlot
fig.suptitle('Incongruent Words: ', fontsize=14)
ax = fig.add_subplot(111)
fig.subplots_adjust(top=0.85)
ax.set_title('Sample Response Time Scatterplot')
ax.set_xlabel('Observation Count')
ax.set_ylabel('Completion Time (seconds)')

#Adding Axis and Limits
plt.xticks(np.arange(min(x), max(x)+1, 1.0))
plt.yticks(np.arange(0, 35, 2))
plt.scatter(x, y, s=area, c=colors, alpha=0.75)
plt.ylim([0,35])
plt.xlim([0,24])
plt.show()
In [10]:
sns.distplot(cog);
In [11]:
sns.distplot(in_cog,color='#FF9E4A');

I have plotted four graphs.

  1. Plots #1 and #2 are scatterplots of the Congruent and Incongruent times respectively. They show that the Congruent times are considerably lower than the Incongruent times, which is evidence against the null hypothesis.
    The congruent words sample is distributed between about 8 and 22 seconds and has a lower average completion time. The incongruent words scatterplot shows a distribution between about 15 and 26 seconds, with what appears to be one outlier at about 35 seconds, and its average completion time is clearly higher.
  2. Plots #3 and #4 show the spread of each individual data set, drawn with the seaborn distplot method.

(5) Now, perform the statistical test and report your results.

What is your confidence level or Type I error associated with your test? What is your conclusion regarding the hypotheses you set up? Did the results match up with your expectations?

Performing T-Test for the Data

For a confidence level of 90% (two-tailed, α = 0.10) and 23 degrees of freedom (n - 1), our t-critical value ends up being 1.714 (Reference Sheet)
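
Written out, the test statistic and decision rule used in the steps that follow look roughly like this. D_i is the per-subject difference (Incongruent minus Congruent) and n = 24. The symbols \bar{D} and s_D are notation introduced here for illustration; the notebook calls the same quantities PE and s.

```latex
\bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i, \qquad
s_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}, \qquad
t = \frac{\bar{D}}{s_D / \sqrt{n}}

\text{Reject } H_0 \text{ if } |t| > t_{\alpha/2,\, n-1} = t_{0.05,\, 23} = 1.714
```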

In [12]:
#We need the point estimate
PE = in_cog.mean() - cog.mean()
PE = round(PE,2)

print("Point Estimate is : {}".format(PE))
Point Estimate is : 7.96
In [13]:
#Step 1
#Var Name df['D'] = Sample Differences
df['D'] = in_cog - cog
df.head()
Out[13]:
Congruent Incongruent D
0 12.079 19.278 7.199
1 16.791 18.741 1.950
2 9.564 21.214 11.650
3 8.630 15.687 7.057
4 14.669 22.803 8.134
In [14]:
#Step 2
#DFM = difference from the mean
#SQD = Squared differences from the mean
DFM = df['D'] - df['D'].mean()
df['SQD'] = DFM*DFM

df.head()
Out[14]:
Congruent Incongruent D SQD
0 12.079 19.278 7.199 0.586437
1 16.791 18.741 1.950 36.177719
2 9.564 21.214 11.650 13.580760
3 8.630 15.687 7.057 0.824086
4 14.669 22.803 8.134 0.028631
In [15]:
#Step 3
#SSD = sum of squared differences
SSD = df['SQD'].sum()

df.head()
Out[15]:
Congruent Incongruent D SQD
0 12.079 19.278 7.199 0.586437
1 16.791 18.741 1.950 36.177719
2 9.564 21.214 11.650 13.580760
3 8.630 15.687 7.057 0.824086
4 14.669 22.803 8.134 0.028631
In [16]:
#Step 4 
#v(variance) = SSD/(n-1)

n = len(df)  #Length of the Data Frame
v = SSD/(n-1)
v
Out[16]:
23.666540867753632
In [17]:
#Step 5 
#Square Root of v
s = round(sqrt(v),2)
s
Out[17]:
4.86
In [18]:
#Applying T-Test

t = PE/ (s/(sqrt(n)) )
print("T Test Value is {}".format(round(t,4)))
T Test Value is 8.0238
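
Note that the value above is computed from PE and s after each was rounded to two decimals. A small sketch, assuming only the summary numbers printed earlier in this notebook (the two means, the variance from Out[16] and n = 24), shows how much the rounding moves the result. The variable names are illustrative.

```python
from math import sqrt

n = 24
mean_congruent = 14.051125
mean_incongruent = 22.015917
variance_d = 23.666540867753632  # variance of the differences, from Out[16]

mean_diff = mean_incongruent - mean_congruent
s_d = sqrt(variance_d)

# Same formula as the notebook, with and without intermediate rounding
t_rounded = round(mean_diff, 2) / (round(s_d, 2) / sqrt(n))
t_unrounded = mean_diff / (s_d / sqrt(n))

print("t with rounded inputs:   {:.4f}".format(t_rounded))
print("t with unrounded inputs: {:.4f}".format(t_unrounded))
```

Either way the statistic is far above the critical value of 1.714, so the conclusion does not depend on the rounding.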

Our T-Critical Value : 1.714
Our T-Test Value : 8.0238

T-Test > T-Critical -->
8.0238 > 1.714

Conclusion

Our t-test value (8.0238) is greater than our t-critical value (1.714),
so we can reject the null hypothesis at the 90% confidence level.

This matches what we expected: it takes much less time to do the congruent task than it does to do the incongruent task.