---
title: "ProblemSet03"
author: "Jesús Martín Godoy"
format: pdf
editor: visual
---

# Review of Key Theoretical Concepts

## Question 1

Suppose a standardized test contains 10 questions with four possible answers each. Assume the student guesses on each question by randomly picking an answer.

a)  What is the probability that a student gets the first 3 questions correct and the next 7 questions incorrect, given that he is guessing?

...

b)  What is the probability that a student gets exactly 3 questions correct, given that he is guessing?

...

c)  What is the probability that a student gets at least 3 questions correct, given that he is guessing?

    ...

d)  What number of questions correct has the greatest probability of occurring given the student is guessing (i.e., what is the mode)?

...

e)  If a student actually scores the number correct identified in the previous part, should you infer he is guessing?
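
The probabilities in parts a) to d) can be checked numerically. The following is a minimal sketch, assuming the 10 guesses are independent and each is correct with probability 1/4, so that the number of correct answers follows a Binomial(10, 0.25) distribution; the object names are illustrative.

```{r}
# Assumption: 10 independent guesses, each correct with probability 1/4
n_questions <- 10
p_correct <- 1 / 4

# a) One specific sequence: first 3 correct, then 7 incorrect
p_sequence <- p_correct^3 * (1 - p_correct)^7

# b) Exactly 3 correct, in any order
p_exactly_3 <- dbinom(3, size = n_questions, prob = p_correct)

# c) At least 3 correct: complement of "2 or fewer"
p_at_least_3 <- 1 - pbinom(2, size = n_questions, prob = p_correct)

# d) Mode: the score with the highest probability
score_probs <- dbinom(0:n_questions, size = n_questions, prob = p_correct)
mode_score <- (0:n_questions)[which.max(score_probs)]

c(a = p_sequence, b = p_exactly_3, c = p_at_least_3, mode = mode_score)
```

Part b) is part a) multiplied by the number of ways to place the 3 correct answers among the 10 questions, `choose(10, 3)`.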

## Question 2

a)  For a random variable $X$ and a constant $a$, show that $Var(aX)=a^2Var(X)$.

...

b)  For random variables $X$ and $Y$, show that $Var(X+Y)=Var(X)+Var(Y)+2Cov(X,Y)$.

...

c)  For random variables $X$ and $Y$, show that $Cov(X,Y)=E(XY)-E(X)E(Y)$.

...

d)  For random variables $X$ and $Y$ and constants $a$ and $b$, show that $Cov(aX,bY)=abCov(X,Y)$.

...
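
A sketch of how the four identities follow from the definitions $Var(X)=E[(X-E(X))^2]$ and $Cov(X,Y)=E[(X-E(X))(Y-E(Y))]$, assuming all the expectations involved exist and using only the linearity of expectation:

$$
\begin{aligned}
Var(aX) &= E[(aX - aE(X))^2] = a^2E[(X-E(X))^2] = a^2Var(X) \\
Var(X+Y) &= E[((X-E(X)) + (Y-E(Y)))^2] \\
&= E[(X-E(X))^2] + E[(Y-E(Y))^2] + 2E[(X-E(X))(Y-E(Y))] \\
&= Var(X)+Var(Y)+2Cov(X,Y) \\
Cov(X,Y) &= E[XY - XE(Y) - E(X)Y + E(X)E(Y)] \\
&= E(XY) - E(X)E(Y) - E(X)E(Y) + E(X)E(Y) = E(XY)-E(X)E(Y) \\
Cov(aX,bY) &= E[(aX-aE(X))(bY-bE(Y))] = abE[(X-E(X))(Y-E(Y))] = abCov(X,Y)
\end{aligned}
$$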

# Immigration attitudes: the role of economic and cultural threat

## Question 1

Start by examining the distribution of immigration attitudes (as factor variables). What is the proportion of people who are willing to increase the quota for high-skilled foreign professionals (`h1bvis.supp`) or support immigration from India (`indimm.supp`)?

```{r}
# First, we read the data
data <- read.csv("~/Desktop/ProblemSet03/immig.csv")
```

```{r}
# Proportion of people who are willing to increase the quota for high-skilled foreign professionals (variable "h1bvis.supp")

mean(data$h1bvis.supp %in% c(0.75, 1))
```

The proportion of people who are willing to increase the quota for high-skilled foreign professionals is ≈0.165, or ≈16.50%.

```{r}
# Proportion of people who are willing to support immigration from India (variable "indimm.supp")

mean(data$indimm.supp %in% c(0.75, 1))
```

The proportion of people who are willing to support immigration from India is ≈0.132, or ≈13.15%.
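
Since the question asks for the distribution of the attitudes as factor variables, the full distribution can also be tabulated. The following is a minimal sketch on a small hypothetical vector (`toy_support` is invented for illustration and is not taken from `immig.csv`), assuming responses are coded on a 0 to 1 scale in steps of 0.25, as the code above suggests.

```{r}
# Hypothetical responses, invented for illustration (not the survey data)
toy_support <- c(0, 0.25, 0.5, 0.75, 1, 0.25, 0.5, NA, 0, 1)

# Full distribution as a factor; table() drops NAs by default
toy_dist <- prop.table(table(factor(toy_support)))
toy_dist

# Share at 0.75 or above among non-missing answers
share_nonmissing <- mean(toy_support >= 0.75, na.rm = TRUE)

# Same share with %in%: the NA stays in the denominator as "not supportive"
share_all_rows <- mean(toy_support %in% c(0.75, 1))

c(nonmissing = share_nonmissing, all_rows = share_all_rows)
```

The two shares differ only when the variable contains missing values.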

Now compare the distribution of two distinct measures of cultural threat: explicit stereotyping about Indians (`expl.prejud`) and implicit bias against Indian Americans (`impl.prejud`). In particular, create a scatterplot, add a linear regression line to it, and calculate the correlation coefficient. Based on these results, what can you say about their relationship?

```{r}
library(tidyverse)
```

```{r}
#| warning: false
#| message: false
ggplot(
  data = data,
  mapping = aes(x = expl.prejud, y = impl.prejud)
) +
  geom_point() +
  geom_smooth(method = "lm")
```

```{r}
# Compute the correlation, dropping observations with missing values (NA)
cor(data$expl.prejud, data$impl.prejud, use = "complete.obs")
```
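
The regression line in the scatterplot and the correlation coefficient are two views of the same linear association: in a simple regression, the slope equals the correlation times the ratio of the standard deviations. A minimal sketch of this on R's built-in `cars` data (used only as a stand-in, not the survey data):

```{r}
# Built-in example data, used as a stand-in for the survey variables
r_xy <- cor(cars$speed, cars$dist)
fit_cars <- lm(dist ~ speed, data = cars)

# Slope of the fitted line versus r * sd(y) / sd(x)
slope_lm <- unname(coef(fit_cars)["speed"])
slope_from_r <- r_xy * sd(cars$dist) / sd(cars$speed)

c(lm = slope_lm, from_r = slope_from_r)
```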

TODO: comment on the results.

## Question 2

Compute the correlations between all four policy attitude and cultural threat measures. Do you agree that cultural threat is an important predictor of immigration attitudes as claimed in the literature?

```{r}
# Policy attitudes:
# h1bvis.supp: support for increasing H-1B visas.
# indimm.supp: support for increasing immigration from India.

# Cultural threat measures:
# expl.prejud: explicit stereotyping about Indians.
# impl.prejud: implicit bias against Indian Americans.

cor(data[c("expl.prejud", "impl.prejud", "h1bvis.supp", "indimm.supp")], use = "complete.obs")
```

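Comment: the correlations between the cultural threat measures and the policy attitudes are negative, i.e. higher prejudice goes with lower support. A negative sign by itself does not make these measures useless as predictors; what matters is the magnitude of the correlations.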

If the labor market hypothesis is correct *(people simply do not want to face additional competition on the labor market)*, opposition to H-1B visas should also be more pronounced among those who are economically threatened by this policy such as individuals in the high-technology sector. At the same time, tech workers should not be more or less opposed to general Indian immigration because of any *economic* considerations. First, regress H-1B and Indian immigration attitudes on the indicator variable for tech workers (`tech.whitcol`). Do the results support the hypothesis? Is the relationship different from the one involving cultural threat and, if so, how?

```{r}
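# The question asks for regressions, so fit one model per attitude measure
summary(lm(h1bvis.supp ~ tech.whitcol, data = data))
summary(lm(indimm.supp ~ tech.whitcol, data = data))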
```

TODO: add a comment on the results.

## Question 3

When examining hypotheses, it is always important to have an appropriate comparison group. One may argue that comparing tech workers to everybody else, as we did in Question 2, may be problematic due to a variety of confounding variables (such as skill level and employment status). First, create a single factor variable `group` which takes the value `tech` if someone is employed in tech, `whitecollar` if someone is employed in other "white-collar" jobs (such as law or finance), `other` if someone is employed in any other sector, and `unemployed` if someone is unemployed. Then, compare the support for H-1B across these conditions using linear regression. Interpret the results: is this comparison more or less supportive of the labor market hypothesis than the one in Question 2?

```{r}
# Create the factor variable `group` with mutate() and case_when()
data <- data |>
  mutate(
    group = factor(
      case_when(
        employed == 1 & tech.whitcol == 1 ~ "tech",
        employed == 1 & nontech.whitcol == 1 ~ "whitecollar",
        employed == 1 ~ "other",
        employed == 0 ~ "unemployed",
        TRUE ~ NA_character_  # no condition is met
      ),
      levels = c("tech", "whitecollar", "other", "unemployed")
    )
  )

# Regress H-1B support on group; "tech" is the reference level,
# so each coefficient is that group's difference from tech workers
summary(lm(h1bvis.supp ~ group, data = data))

```
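
An intercept-only model fitted within one group returns that group's mean, while a single regression on a factor returns the same means expressed as differences from a reference level. A minimal sketch of this equivalence on R's built-in `iris` data (a stand-in, not the survey data):

```{r}
# Built-in example data; Species plays the role of the group factor
fit_iris <- lm(Sepal.Length ~ Species, data = iris)
group_means <- tapply(iris$Sepal.Length, iris$Species, mean)

# Intercept = mean of the reference level ("setosa")
# Other coefficients = that level's mean minus the reference mean
coef(fit_iris)
group_means - group_means["setosa"]
```

The advantage of the single regression is that its summary also reports standard errors for the differences between groups.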

Now, one may also argue that those who work in the tech sector are disproportionately young and male, which may confound our results. To account for this possibility, fit another linear regression but also include `age` and `female` as pre-treatment covariates (in addition to `group`). Does it change the results and, if so, how?

```{r}
# Same comparison across groups, now adjusting for age and gender
summary(lm(h1bvis.supp ~ age + female + group, data = data))
```

Finally, fit a linear regression model with all threat indicators (`group`, `expl.prejud`, `impl.prejud`) and calculate its $R^2$. How much of the variation is explained? Based on the model fit, what can you conclude about the role of threat factors?

```{r}
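# Fit the model with all threat indicators once and reuse it below
model_threat <- lm(h1bvis.supp ~ group + expl.prejud + impl.prejud, data = data)
model_threat
```

```{r}
# Share of the variance in H-1B support explained by the model
summary(model_threat)$r.squared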
```
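
$R^2$ is the share of the outcome's variation that the fitted values account for, $1 - RSS/TSS$. A minimal sketch on R's built-in `mtcars` data (a stand-in, not the survey data) showing that the manual calculation matches the value reported by `summary()`:

```{r}
# Built-in example data, used only to illustrate the formula
fit_mtcars <- lm(mpg ~ wt + hp, data = mtcars)

rss <- sum(residuals(fit_mtcars)^2)             # residual sum of squares
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)   # total sum of squares
r2_manual <- 1 - rss / tss

c(manual = r2_manual, summary = summary(fit_mtcars)$r.squared)
```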

## Question 4

Besides economic and cultural threat, many scholars also argue that gender is an important predictor of immigration attitudes. While there is some evidence that women are slightly less opposed to immigration than men, it may also be true that gender conditions the very effect of other factors such as cultural threat. To see if it is indeed the case, fit a linear regression of H-1B support on the interaction between gender and implicit prejudice. Then, create a plot with the predicted level of H-1B support (y-axis) across the range of implicit bias (x-axis) by gender (Hint: you can use the `predict()` function. Check the help file via `?predict.lm` in RStudio). Considering the results, would you agree that gender alters the relationship between cultural threat and immigration attitudes?

```{r}
model_interaction <- lm(h1bvis.supp ~ female * impl.prejud, data = data)
```

```{r}
# Predictions for men (female = 0) and women (female = 1) over a grid of impl.prejud values
predictions_male <- predict(model_interaction, newdata = data.frame(female = 0, impl.prejud = seq(0, 1, 0.25)))
predictions_female <- predict(model_interaction, newdata = data.frame(female = 1, impl.prejud = seq(0, 1, 0.25)))

# Collect the predictions in a data frame
predictions_df <- data.frame(
  Gender = rep(c("Male", "Female"), each = length(predictions_male)),
  ImplicitBias = rep(seq(0, 1, 0.25), times = 2),
  PredictedSupport = c(predictions_male, predictions_female)
)

# Plot one line per gender; stacked bars would add the two predictions together
ggplot(predictions_df, aes(x = ImplicitBias, y = PredictedSupport, color = Gender)) +
  geom_line() +
  labs(title = "Predicted H-1B Support by Implicit Bias and Gender", x = "Implicit Bias", y = "Predicted Support")

```
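
In terms of coefficients, the interaction model fitted above corresponds to the following equation, where the $\beta$ symbols are notation introduced here for the intercept and the coefficients of `female`, `impl.prejud`, and their product:

$$
\begin{aligned}
E(\text{h1bvis.supp}) &= \beta_0 + \beta_1\,\text{female} + \beta_2\,\text{impl.prejud} + \beta_3\,(\text{female}\times\text{impl.prejud}) \\
\text{slope for men } (\text{female}=0) &= \beta_2 \\
\text{slope for women } (\text{female}=1) &= \beta_2 + \beta_3
\end{aligned}
$$

Gender therefore alters the relationship between implicit prejudice and H-1B support only to the extent that $\beta_3$ differs from zero.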

Age is another important covariate. Fit two regression models in which H-1B support is either a linear or quadratic function of age. Compare the results by plotting the predicted levels of support (y-axis) across the whole age range (x-axis). Would you say that people become more opposed to immigration with age?

```{r}
# Fit a linear regression model
model_linear_age <- lm(h1bvis.supp ~ age, data = data)

# Fit a quadratic regression model
model_quadratic_age <- lm(h1bvis.supp ~ age + I(age^2), data = data)

# Keep only the valid age values
valid_age_data <- na.omit(data$age)  # Drops missing values (NA)

# Build a grid of valid ages to predict over
age_range <- seq(min(valid_age_data), max(valid_age_data), length.out = 100)

# Predictions from the linear model
predictions_linear <- predict(model_linear_age, newdata = data.frame(age = age_range))

# Predictions from the quadratic model
predictions_quadratic <- predict(model_quadratic_age, newdata = data.frame(age = age_range))

# Collect the predictions in a data frame
predictions_df <- data.frame(
  Age = age_range,
  PredictedSupportLinear = predictions_linear,
  PredictedSupportQuadratic = predictions_quadratic
)

# Load ggplot2 for plotting (already attached if tidyverse was loaded above)
library(ggplot2)

# Plot the predictions: solid line = linear fit, dashed line = quadratic fit
ggplot(predictions_df, aes(x = Age)) +
  geom_line(aes(y = PredictedSupportLinear)) +
  geom_line(aes(y = PredictedSupportQuadratic), linetype = "dashed") +
  labs(title = "Predicted H-1B Support by Age", y = "Predicted Support")

```
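
A quadratic fit $b_0 + b_1\,age + b_2\,age^2$ changes direction at $-b_1/(2b_2)$, which helps judge whether predicted support moves in one direction over the whole age range or only beyond some age. A minimal sketch on simulated, noise-free data with a known turning point; `turning_point()` is a hypothetical helper written for this illustration, and the simulated coefficients are arbitrary:

```{r}
# Hypothetical helper: x-value where a fitted quadratic changes direction
turning_point <- function(fit) {
  b <- unname(coef(fit))  # b[1] intercept, b[2] linear term, b[3] quadratic term
  -b[2] / (2 * b[3])
}

# Simulated noise-free data with a known turning point at x = 40
sim <- data.frame(x = 18:80)
sim$y <- 0.5 + 0.008 * sim$x - 0.0001 * sim$x^2

fit_sim <- lm(y ~ x + I(x^2), data = sim)
turning_point(fit_sim)
```

The same helper could be applied to `model_quadratic_age`; the result is only meaningful if it falls inside the observed age range.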