# Untitled

```python
"""
clear
set obs 200
gen female = 0
replace female = 1 if _n>100
** Education is costly, as in a Spence model.
** The cost of getting an education is distributed uniformly from 0 to 1, in both genders.
** Having an education doubles your wage.
** You only pursue an education if the benefit is worth the cost.
gen education_cost = mod(_n, 100)/100
** Women are discriminated against, and earn 20% less than men.
** Occupation 1 pays $1 for men, and $0.80 for women.
** Only people with an education can be in Occupation 1.
** Occupation 2 pays $0.50 for men, and $0.40 for women.
** Women are less likely to get an education because of the discrimination.
** Half of men get an education vs. 40% of women.
gen wage = 1 if ! female & education_cost>=0.5
replace wage = 0.5 if ! female & education_cost<0.5
replace wage = 0.8 if female & education_cost>=0.6
replace wage = 0.4 if female & education_cost<0.6
gen goteducation = 0
replace goteducation=1 if wage>0.7
reg wage female
reg wage female goteducation
"""

import pandas as pd
import numpy as np
import statsmodels.formula.api as smf

def run():
    # Create a DataFrame with 1000 observations
    df = pd.DataFrame(index=range(1000))

    # Generate 'female' column
    df['female'] = 0
    df.loc[500:, 'female'] = 1

    # Generate 'education_cost' column
    df['education_cost'] = np.random.uniform(0, 1, 1000)

    # Generate 'wage' column
    df.loc[(df['female'] == 0) & (df['education_cost'] >= 0.5), 'wage'] = 10
    df.loc[(df['female'] == 0) & (df['education_cost'] < 0.5), 'wage'] = 5

    df.loc[(df['female'] == 1) & (df['education_cost'] >= 0.7), 'wage'] = 10
    df.loc[(df['female'] == 1) & (df['education_cost'] < 0.7), 'wage'] = 5

    # Generate 'goteducation' column, anyone with wage > 5 must have gotten education
    df['goteducation'] = 0
    df.loc[df['wage'] > 7, 'goteducation'] = 1

    # Log-transform 'wage'
    df['log_wage'] = np.log(df['wage'])

    # Run regressions
    model1 = smf.ols(formula='log_wage ~ female', data=df).fit()
    model2 = smf.ols(formula='log_wage ~ female + goteducation', data=df).fit()

    print(model1.summary())
    print(model2.summary())

run()
```
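
The Python function above departs from the Stata code in its docstring: it uses 1000 random draws, pays both genders 10 and 5, and sets the women's cutoff at 0.7. As a point of comparison, here is a minimal sketch that follows the Stata specification literally, with 200 observations, the deterministic cost grid `mod(_n, 100)/100`, cutoffs of 0.5 and 0.6, and the 20% wage penalty. The names `simulate_stata_spec` and `ols` are hypothetical helpers introduced for illustration, and plain NumPy least squares stands in for `statsmodels`; the log transform is borrowed from the Python version, not from the Stata code.

```python
import numpy as np


def simulate_stata_spec(n_per_group=100):
    """Rebuild the data set described in the Stata docstring (men first, then women)."""
    n = 2 * n_per_group
    obs = np.arange(1, n + 1)  # Stata's _n is 1-based
    female = (obs > n_per_group).astype(float)

    # Deterministic cost grid, mirroring mod(_n, 100)/100
    education_cost = (obs % n_per_group) / n_per_group

    # Cutoffs taken from the Stata code: 0.5 for men, 0.6 for women
    threshold = np.where(female == 1, 0.6, 0.5)
    goteducation = (education_cost >= threshold).astype(float)

    # Occupation 2 pays 0.50, education doubles the wage, women earn 20% less
    wage = 0.5 * (1 + goteducation) * np.where(female == 1, 0.8, 1.0)
    return female, goteducation, wage


def ols(y, *regressors):
    """Least-squares coefficients, intercept first (assumed stand-in for smf.ols)."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


if __name__ == "__main__":
    female, goteducation, wage = simulate_stata_spec()
    log_wage = np.log(wage)
    print("share educated, men:  ", goteducation[female == 0].mean())
    print("share educated, women:", goteducation[female == 1].mean())
    print("log_wage ~ female:               ", ols(log_wage, female))
    print("log_wage ~ female + goteducation:", ols(log_wage, female, goteducation))
```

Under these assumptions the log wage is exactly additive in the two dummies, so the second regression should return a `female` coefficient of log(0.8), about -0.223, while the first returns a larger gap of about -0.292 because it also picks up the difference in education rates.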