In [1]:

```
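# General import statements for processing and plotting the data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from math import sqrt  # only sqrt is used below; avoids a wildcard import
%matplotlib inline

# Load the Stroop data set (columns: Congruent, Incongruent)
df = pd.read_csv('stroopdata.csv')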
```

**Independent variable**: Word Conditions (Congruent or Incongruent)

**Dependent variable**: Time to complete test (In Seconds)

**Explanation**

- An independent variable is the variable you control and can choose or manipulate. In this case it is the condition under which the test is taken (Congruent or Incongruent).
- A dependent variable is the outcome that is measured and is expected to change in response to the independent variable. In this case it is the time taken to complete the test.

In [2]:

```
# Separating the data into two variables for easier analysis
cog = df['Congruent']
in_cog = df['Incongruent']
```

**Specify your null and alternative hypotheses, and clearly define any notation used. Justify your choices**

The **Null hypothesis** is that there is no difference in mean response time between viewing the congruent words and viewing the incongruent words.

The **Alternative hypothesis** is that there is a difference (negative or positive) in mean response time between viewing the congruent words and viewing the incongruent words.

**Alternative Explanation -- Post Review**

Ho - **Null Hypothesis**: ( μi - μc = 0 ) There is **no** difference in the population average response time between viewing the congruent (c) words and viewing the incongruent (i) words.

Ha - **Alternative Hypothesis**: ( μi - μc ≠ 0 ) There is a difference, positive or negative, in the population average response times.

The **Dependent Samples t-Test** is the appropriate statistical test because the same subjects are measured under two different conditions, so each subject contributes a pair of times.
The two measurements are dependent: both come from the same person, and having taken the first test gives some practice, so a learning effect may carry over to the similar second test.

In addition, we don't have any population parameters provided, only a small sample of 24 (so a z-test would not be appropriate here).

**Include at least one measure of central tendency and at least one measure of variability. The name of the data file is 'stroopdata.csv'.**

In [3]:

```
print("Count is")
print(df.count())
```

In [4]:

```
print("Mean Times for Both Conditions")
print(df.mean(axis=0))
```

In [5]:

```
print("Median Times for Both Conditions")
print(df.median(axis=0))
```

In [6]:

```
print("Standard Deviation Times")
cog_sd = round(cog.std(),2)
in_cog_sd = round(in_cog.std(),2)
print("")
print("The standard Deviation for Congruent Times is {} ".format(cog_sd))
print("The standard Deviation for Incongruent Times is {}".format(in_cog_sd))
```
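As a rough illustration of the measures used above, the sketch below computes the mean, median, and standard deviation with Python's standard `statistics` module. The `times` list is made-up data, not taken from `stroopdata.csv`. It also shows the difference between the sample standard deviation (divides by n - 1, which is what pandas `.std()` does by default) and the population standard deviation (divides by n).

```python
import statistics

# Hypothetical completion times in seconds, for illustration only
times = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean_time = statistics.mean(times)        # central tendency
median_time = statistics.median(times)    # central tendency, robust to outliers
sample_sd = statistics.stdev(times)       # variability, divides by n - 1
population_sd = statistics.pstdev(times)  # variability, divides by n

print("mean:", mean_time)
print("median:", median_time)
print("sample sd:", round(sample_sd, 3))
print("population sd:", round(population_sd, 3))
```

Because we only have a sample of participants, the sample standard deviation is the one that matters for the t-test later on.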

In [7]:

```
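# Named data_range to avoid shadowing the built-in range()
data_range = df.max() - df.min()
# Average of the two condition means (this is a mean, not a range)
overall_mean = (cog.mean() + in_cog.mean()) / 2
print("The range for each condition is\n{}".format(data_range))
print("The mean completion time across both conditions is {}".format(round(overall_mean, 2)))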
```

**Write one or two sentences noting what you observe about the plot or plots.**

**Plotting Scatter Plots**

In [8]:

```
#Computing Basic Values
fig = plt.figure()
y = df['Congruent']
x = df.index.to_numpy()  # Index.get_values() was removed in pandas 1.0
colors = '#1F77B4'
area = np.pi * 15  # marker area in points^2 (the `s` argument of scatter)
#Adding Basic Labels for the ScatterPlot
fig.suptitle('Congruent Words: ', fontsize=14)
ax = fig.add_subplot(111)
fig.subplots_adjust(top=0.85)
ax.set_title('Sample Response Time Scatterplot')
ax.set_xlabel('Observation Count')
ax.set_ylabel('Completion Time (seconds)')
#Adding Axis and Limits
plt.xticks(np.arange(min(x), max(x)+1, 1.0))
plt.yticks(np.arange(0, 35, 2))
plt.scatter(x, y, s=area, c=colors, alpha=0.75)
plt.ylim([0,35])
plt.xlim([0,24])
plt.show()
```

In [9]:

```
#Computing Basic Values
fig = plt.figure()
y = df['Incongruent']
x = df.index.to_numpy()  # Index.get_values() was removed in pandas 1.0
colors = '#FF9E4A'
area = np.pi * 15  # marker area in points^2 (the `s` argument of scatter)
#Adding Basic Labels for the ScatterPlot
fig.suptitle('In-Congruent Words: ', fontsize=14)
ax = fig.add_subplot(111)
fig.subplots_adjust(top=0.85)
ax.set_title('Sample Response Time Scatterplot')
ax.set_xlabel('Observation Count')
ax.set_ylabel('Completion Time (seconds)')
#Adding Axis and Limits
plt.xticks(np.arange(min(x), max(x)+1, 1.0))
plt.yticks(np.arange(0, 35, 2))
plt.scatter(x, y, s=area, c=colors, alpha=0.75)
plt.ylim([0,35])
plt.xlim([0,24])
plt.show()
```

In [10]:

```
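# distplot is deprecated since seaborn 0.11; histplot with kde=True is the replacement
sns.histplot(cog, kde=True, stat="density");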
```

In [11]:

```
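# distplot is deprecated since seaborn 0.11; histplot with kde=True is the replacement
sns.histplot(in_cog, kde=True, stat="density", color='#FF9E4A');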
```

- Plots #1 and #2 show the Congruent and Incongruent completion times respectively. The Congruent times are considerably lower than the Incongruent times, which suggests a difference between the two conditions and is consistent with the alternative hypothesis rather than the null.

The congruent sample is spread between about 8 and 22 seconds and has a lower average completion time. The incongruent sample is spread between about 15 and 26 seconds, with what appears to be one outlier at 35 seconds, and its average completion time is clearly higher.

- Plots #3 and #4 show the distribution of each sample as a histogram with a density curve, drawn with seaborn.

**What is your confidence level or Type I error associated with your test?
What is your conclusion regarding the hypotheses you set up? Did the results match up with your expectations?**

For a confidence level of **90%** (a Type I error rate of α = 0.10, two-tailed) and 23 degrees of freedom (n - 1), our t-critical value is **1.714**, taken from a t-table reference sheet.
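Assuming the test is two-tailed, which matches the two-sided alternative hypothesis stated earlier, the critical value and the rejection rule can be written out as follows. The symbol names are ours, chosen for illustration:

$$
\alpha = 1 - 0.90 = 0.10, \qquad \mathrm{df} = n - 1 = 24 - 1 = 23, \qquad t_{\mathrm{crit}} = t_{1-\alpha/2,\ 23} = t_{0.95,\ 23} \approx 1.714
$$

$$
\text{Reject } H_0 \text{ if } |t| > t_{\mathrm{crit}}
$$

The error rate α is split evenly between the two tails, so 0.05 sits in each tail. That is why the same value of 1.714 also appears in t-tables under a one-tailed α of 0.05.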

In [12]:

```
# We need the point estimate (difference between the two sample means)
PE = in_cog.mean() - cog.mean()
PE = round(PE,2)
print("Point Estimate is : {}".format(PE))
```

In [13]:

```
#Step 1
#Var Name df['D'] = Sample Differences
df['D'] = in_cog - cog
df.head()
```

Out[13]:

In [14]:

```
#Step 2
#DFM = difference from the mean
#SQD = Squared differences from the mean
DFM = df['D'] - df['D'].mean()
df['SQD'] = DFM*DFM
df.head()
```

Out[14]:

In [15]:

```
#Step 3
#SSD = sum of squared differences
SSD = df['SQD'].sum()
SSD
```

Out[15]:

In [16]:

```
#Step 4
#v(variance) = SSD/(n-1)
n = len(df) #Length of the Data Frame
v = SSD/(n-1)
v
```

Out[16]:

In [17]:

```
#Step 5
#Square Root of v
s = round(sqrt(v),2)
s
```

Out[17]:

In [18]:

```
#Applying T-Test
t = PE/ (s/(sqrt(n)) )
print("T Test Value is {}".format(round(t,4)))
```
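Steps 1 to 5 and the final division can be collected into one small function. The version below is a minimal sketch using only the standard library. The function name `paired_t_statistic` and the two short lists are hypothetical and are not the values in `stroopdata.csv`. Unlike the cells above, it does not round intermediate values, so on the real data it may give a slightly different t-statistic than a calculation that rounds along the way.

```python
from math import sqrt

def paired_t_statistic(first, second):
    """Return (mean difference, sd of differences, t) for paired samples."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    n = len(first)
    diffs = [b - a for a, b in zip(first, second)]   # Step 1: sample differences
    mean_diff = sum(diffs) / n                       # point estimate
    ssd = sum((d - mean_diff) ** 2 for d in diffs)   # Steps 2-3: sum of squared deviations
    sd = sqrt(ssd / (n - 1))                         # Steps 4-5: sample standard deviation
    t = mean_diff / (sd / sqrt(n))                   # t-statistic (undefined if sd == 0)
    return mean_diff, sd, t

# Hypothetical paired times in seconds, for illustration only
congruent = [10.0, 12.0, 9.0, 11.0]
incongruent = [15.0, 16.0, 15.0, 14.0]

mean_diff, sd, t = paired_t_statistic(congruent, incongruent)
print("mean difference:", mean_diff)
print("sd of differences:", round(sd, 4))
print("t-statistic:", round(t, 4))
```

Subtracting the first sample from the second mirrors `in_cog - cog` above, so a positive t means the second condition took longer.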

Our t-critical value: **1.714**

Our t-statistic: **8.0159**

t-statistic > t-critical:

8.0159 > 1.714

Our t-statistic (**8.0159**) is greater than our t-critical value (**1.714**).

So we can **reject the null hypothesis**.

**This matches what we expected: it takes much less time to do the congruent task than the incongruent task.**