---
title: "ProblemSet03"
author: "Jesús Martín Godoy"
format: pdf
editor: visual
---

# Review of Key Theoretical Concepts

## Question 1

Suppose a standardized test contains 10 questions with four possible answers each. Assume the student guesses on each question by randomly picking an answer.

a)  What is the probability that a student gets the first 3 questions correct and the next 7 questions incorrect, given that he is guessing?

$$
\left(\frac{1}{4}\right)^3 \times \left(\frac{3}{4}\right)^7 = \frac{1}{64} \times \frac{2187}{16384} = \frac{2187}{1048576} \approx 0.00209 = 0.209\%
$$

b)  What is the probability that a student gets exactly 3 questions correct, given that he is guessing?

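    Any one specific sequence with 3 correct and 7 incorrect answers has probability $\left(\frac{1}{4}\right)^3 \left(\frac{3}{4}\right)^7$, and there are $\binom{10}{3} = 120$ ways to choose which 3 of the 10 questions are correct:

    $$
    \binom{10}{3} \left(\frac{1}{4}\right)^3 \left(\frac{3}{4}\right)^7 = 120 \cdot \frac{3^7}{4^{10}} = \frac{262440}{1048576} \approx 0.2503
    $$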
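The two probabilities asked for above can also be checked numerically. This is a minimal sketch in base R, assuming the number of correct answers follows a binomial distribution with 10 independent guesses and a success probability of 1/4; the variable names are illustrative.

```{r}
# Assumption: each of the 10 answers is an independent guess with P(correct) = 1/4
n <- 10
p <- 1 / 4

# One specific sequence: first 3 correct, then 7 incorrect
p_sequence <- p^3 * (1 - p)^7

# Exactly 3 correct in any order: choose(10, 3) sequences of equal probability
p_exactly_3 <- choose(n, 3) * p_sequence

# dbinom() returns the same binomial probability directly
c(sequence = p_sequence, exactly_3 = p_exactly_3, dbinom = dbinom(3, size = n, prob = p))
```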

c)  What is the probability that a student gets at least 3 questions correct, given that he is guessing? ...

d)  What number of questions correct has the greatest probability of occurring given the student is guessing (i.e., what is the mode)?

...
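One way to approach the "at least 3 correct" and mode questions numerically is sketched below. It assumes the score is Binomial(10, 1/4), as in a pure-guessing setting; the variable names are illustrative and this is not meant as the only valid approach.

```{r}
# Assumption: the score X is Binomial(n = 10, p = 1/4)
n <- 10
p <- 1 / 4

# P(X >= 3) = 1 - P(X <= 2)
p_at_least_3 <- 1 - pbinom(2, size = n, prob = p)

# Probability of every possible score 0..10; the mode is the score with the largest probability
probs <- dbinom(0:n, size = n, prob = p)
mode_score <- (0:n)[which.max(probs)]

p_at_least_3
mode_score
```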

e)  If a student actually scores the number correct identified in the previous part, should you infer he is guessing?

## Question 2

a)  For a random variable $X$ and a constant $a$, show that $Var(aX)=a^2Var(X)$.

...
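One possible derivation, assuming the definition $Var(X)=E\left[(X-E(X))^2\right]$ and linearity of expectation (so that $E(aX)=aE(X)$):

$$
\begin{aligned}
Var(aX) &= E\left[(aX - E(aX))^2\right] \\
&= E\left[a^2 (X - E(X))^2\right] \\
&= a^2 E\left[(X - E(X))^2\right] \\
&= a^2 Var(X)
\end{aligned}
$$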

b)  For random variables $X$ and $Y$, show that $Var(X+Y)=Var(X)+Var(Y)+2Cov(X,Y)$.

...
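A sketch of the argument, assuming the definitions $Var(X)=E\left[(X-E(X))^2\right]$ and $Cov(X,Y)=E\left[(X-E(X))(Y-E(Y))\right]$:

$$
\begin{aligned}
Var(X+Y) &= E\left[\left((X - E(X)) + (Y - E(Y))\right)^2\right] \\
&= E\left[(X - E(X))^2\right] + E\left[(Y - E(Y))^2\right] + 2E\left[(X - E(X))(Y - E(Y))\right] \\
&= Var(X) + Var(Y) + 2Cov(X,Y)
\end{aligned}
$$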

c)  For random variables $X$ and $Y$, show that $Cov(X,Y)=E(XY)-E(X)E(Y)$.

...
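One way to show this, assuming the definition $Cov(X,Y)=E\left[(X-E(X))(Y-E(Y))\right]$ and treating $E(X)$ and $E(Y)$ as constants:

$$
\begin{aligned}
Cov(X,Y) &= E\left[XY - X E(Y) - Y E(X) + E(X)E(Y)\right] \\
&= E(XY) - E(X)E(Y) - E(X)E(Y) + E(X)E(Y) \\
&= E(XY) - E(X)E(Y)
\end{aligned}
$$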

d)  For random variables $X$ and $Y$ and constants $a$ and $b$, show that $Cov(aX,bY)=abCov(X,Y)$.

...
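A possible derivation, using the identity $Cov(X,Y)=E(XY)-E(X)E(Y)$ stated in the previous part together with linearity of expectation:

$$
\begin{aligned}
Cov(aX,bY) &= E(aX \cdot bY) - E(aX)E(bY) \\
&= ab\,E(XY) - ab\,E(X)E(Y) \\
&= ab\left[E(XY) - E(X)E(Y)\right] \\
&= ab\,Cov(X,Y)
\end{aligned}
$$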

# Immigration attitudes: the role of economic and cultural threat

## Question 1

Start by examining the distribution of immigration attitudes (as factor variables). What is the proportion of people who are willing to increase the quota for high-skilled foreign professionals (`h1bvis.supp`) or support immigration from India (`indimm.supp`)?

```{r}
# First, we read the data
data <- read.csv("~/Desktop/ProblemSet03/immig.csv")
```

```{r}
# Proportion of people who are willing to increase the quota for high-skilled foreign professionals (variable "h1bvis.supp")

mean(data$h1bvis.supp %in% c(0.75, 1))
```

The proportion of people who are willing to increase the quota for high-skilled foreign professionals is ≈0.165, or ≈16.50%.

```{r}
# Proportion of people who are willing to support immigration from India (variable "indimm.supp")

mean(data$indimm.supp %in% c(0.75, 1))
```

The proportion of people who are willing to support immigration from India is ≈0.132, or ≈13.15%.
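Since the question asks to treat the attitudes as factor variables, the full distribution can also be tabulated. The sketch below uses a small made-up vector (`toy_supp`, hypothetical, not the survey data) and assumes a five-point 0 to 1 coding, as the `c(0.75, 1)` filter above suggests. It also illustrates that `%in%` counts missing answers in the denominator, whereas `table()` drops them by default.

```{r}
# Hypothetical responses on an assumed 0-1 five-point scale (NOT the survey data)
toy_supp <- c(0, 0.25, 0.25, 0.5, 0.75, 1, NA, 0.5, 0.75, 0)

# Treat the numeric codes as a factor and tabulate proportions among non-missing answers
toy_factor <- factor(toy_supp, levels = c(0, 0.25, 0.5, 0.75, 1))
toy_props <- prop.table(table(toy_factor))
toy_props

# Share in the two most supportive categories, excluding missing answers
share_non_missing <- sum(toy_props[c("0.75", "1")])

# With %in%, NA is treated as "not supportive", so the denominator includes missing answers
share_with_na <- mean(toy_supp %in% c(0.75, 1))

c(non_missing = share_non_missing, with_na = share_with_na)
```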

Now compare the distribution of two distinct measures of cultural threat: explicit stereotyping about Indians (`expl.prejud`) and implicit bias against Indian Americans (`impl.prejud`). In particular, create a scatterplot, add a linear regression line to it, and calculate the correlation coefficient. Based on these results, what can you say about their relationship?

```{r}
library(tidyverse)
```

```{r}
#| warning: false
#| message: false
ggplot(
  data = data,
  mapping = aes(x = expl.prejud, y = impl.prejud)
) +
  geom_point() +
  geom_smooth(method = "lm")
```

```{r}
# Compute the correlation, dropping observations with missing (NA) values
cor(data$expl.prejud, data$impl.prejud, use = "complete.obs")
```

TODO: comment on the scatterplot and the correlation.

## Question 2

Compute the correlations between all four policy attitude and cultural threat measures. Do you agree that cultural threat is an important predictor of immigration attitudes as claimed in the literature?

```{r}
# Policy attitudes:
# h1bvis.supp: support for increasing H-1B visas.
# indimm.supp: support for increasing immigration from India.

# Cultural threat measures:
# expl.prejud: explicit stereotyping about Indians.
# impl.prejud: implicit bias against Indian Americans.

cor(data[c("expl.prejud", "impl.prejud", "h1bvis.supp", "indimm.supp")], use = "complete.obs")
```

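Comment: the correlations between the cultural threat measures and the policy attitudes are negative, so higher prejudice goes with lower support for both policies. A negative sign does not by itself make them poor predictors; their importance depends on the magnitude of the correlations.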

If the labor market hypothesis is correct *(people simply do not want to face additional competition on the labor market)*, opposition to H-1B visas should also be more pronounced among
those who are economically threatened by this policy such as individuals in the high-technology sector.
At the same time, tech workers should not be more or less opposed to general Indian immigration because
of any *economic* considerations. First, regress H-1B and Indian immigration attitudes on the indicator
variable for tech workers (`tech.whitcol`). Do the results support the hypothesis? Is the relationship
different from the one involving cultural threat and, if so, how?

```{r}
summary(lm(h1bvis.supp ~ tech.whitcol, data = data))
summary(lm(indimm.supp ~ tech.whitcol, data = data))
```

TODO: add commentary.
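When interpreting a regression on a 0/1 indicator such as `tech.whitcol`, it may help to recall that the intercept is the mean outcome in the group coded 0 and the slope is the difference in group means. The sketch below illustrates this on made-up numbers; the data frame `toy` and its columns are hypothetical and are not taken from the survey.

```{r}
# Hypothetical toy data (NOT the survey): a support score and a 0/1 tech indicator
toy <- data.frame(
  support = c(0.25, 0.50, 0.00, 0.75, 0.50, 1.00, 0.25, 0.50),
  tech    = c(1,    1,    1,    0,    0,    0,    0,    0)
)

toy_fit <- lm(support ~ tech, data = toy)
coef(toy_fit)

# Intercept = mean of the tech == 0 group; slope = difference in group means
group_means <- tapply(toy$support, toy$tech, mean)
mean_diff <- unname(group_means["1"] - group_means["0"])
mean_diff
```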

## Question 3

When examining hypotheses, it is always important to have an appropriate comparison group. One may
argue that comparing tech workers to everybody else as we did in Question 2 may be problematic due to
a variety of confounding variables (such as skill level and employment status). First, create a single factor
variable `group` which takes a value of **tech** if someone is employed in tech, **whitecollar** if someone is
employed in other "white-collar" jobs (such as law or finance), **other** if someone is employed in any other
sector, and **unemployed** if someone is unemployed. Then, compare the support for H-1B across these
conditions by using the linear regression. Interpret the results: is this comparison more or less supportive
of the labor market hypothesis than the one in Question 2?

```{r}
# Create the factor variable 'group' with mutate() and case_when()
data <- data |>
  mutate(
    group = case_when(
      employed == 1 & tech.whitcol == 1 ~ "tech",
      employed == 1 & nontech.whitcol == 1 ~ "whitecollar",
      employed == 1 ~ "other",
      employed == 0 ~ "unemployed",
      TRUE ~ NA_character_ # no condition matched (e.g. missing employment status)
    ),
    # Make "tech" the reference level so the other coefficients are differences from tech workers
    group = factor(group, levels = c("tech", "whitecollar", "other", "unemployed"))
  )

# A single linear regression compares support for H-1B across the four groups
summary(lm(h1bvis.supp ~ group, data = data))
```

Now, one may also argue that those who work in the tech sector are disproportionately young and male which may confound our results. To account for this possibility, fit another linear regression but also include age and female as pre-treatment covariates (in addition to group). Does it change the results and, if so, how?

```{r}
lm(h1bvis.supp ~ age + female + group, data = data)
```

Finally, fit a linear regression model with all threat indicators (`group`, `expl.prejud`, `impl.prejud`) and calculate its $R^2$. How much of the variation is explained? Based on the model fit, what can you conclude about the role of threat factors?

```{r}
lm(h1bvis.supp ~ group + expl.prejud + impl.prejud, data = data)
```

```{r}
summary(lm(h1bvis.supp ~ group + expl.prejud + impl.prejud, data = data))$r.squared
```
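To make clear what `$r.squared` reports, the sketch below recomputes $R^2$ by hand as one minus the ratio of the residual sum of squares to the total sum of squares. It is only an illustration: it uses R's built-in `mtcars` data rather than the immigration survey, and the names `r2_fit`, `ssr`, `tss` and `r2_manual` are illustrative.

```{r}
# Illustration on R's built-in mtcars data (NOT the immigration survey)
r2_fit <- lm(mpg ~ wt + hp, data = mtcars)

ssr <- sum(residuals(r2_fit)^2)                # sum of squared residuals
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)  # total sum of squares
r2_manual <- 1 - ssr / tss

# Both values should agree
c(manual = r2_manual, summary = summary(r2_fit)$r.squared)
```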