---
title: "Problem Set 05"
subtitle: "Applied Quantitative Methods for the Social Sciences I"
date: "11-27-2023"
date-format: "long"
author: "Jesús Martín Godoy"
format: pdf
editor: visual
---

```{r}
# Reading the data
nazis <- read.csv("nazis.csv")
```

## Question 1

```{r}
# Linear regression
model_q1 <- lm(nazivote/nvoter ~ shareblue, data = nazis)

# Estimated slope coefficient and standard error
coef_estimate_q1 <- coef(model_q1)["shareblue"]
se_estimate_q1 <- summary(model_q1)$coefficients["shareblue", "Std. Error"]

# 95% confidence interval
conf_interval_q1 <- confint(model_q1)["shareblue", ]

cat("Estimated slope coefficient:", coef_estimate_q1, "\n")
cat("Standard error of the slope coefficient:", se_estimate_q1, "\n")
cat("95% Confidence Interval for the slope coefficient:", conf_interval_q1, "\n")
```

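The estimated slope coefficient of 0.06517999 indicates a positive association between the proportion of blue-collar voters and the Nazi vote share: precincts with a higher share of blue-collar voters tend, on average, to have a slightly higher Nazi vote share. Because both variables are proportions, a 10 percentage point increase in the proportion of blue-collar voters is associated with an increase of about 0.65 percentage points in the Nazi vote share.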

The standard error of 0.05219791 measures the sampling variability of this estimate. It is large relative to the estimate itself: their ratio, the t statistic, is only about 1.25. The estimated association is therefore imprecise, and its positive sign could plausibly reflect sampling variability rather than a genuine relationship.

The 95% confidence interval (-0.03730872, 0.1676687) summarizes this uncertainty. Because it includes zero, the slope is not statistically significant at the conventional 5% level: although the point estimate is positive, the data are consistent with no association, or even a small negative one, between the proportion of blue-collar voters and the Nazi vote share.
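The standard error, t statistic, and confidence interval above are linked by simple arithmetic. The sketch below reproduces them from the two reported numbers alone. It uses a normal approximation to the critical value, which is an assumption made for illustration: `confint()` itself uses the t distribution with the model's residual degrees of freedom, so its interval is very slightly wider.

```r
# Slope estimate and standard error reported for Question 1
slope_hat <- 0.06517999
slope_se  <- 0.05219791

# t statistic for the null hypothesis that the slope is zero
t_stat <- slope_hat / slope_se

# 95% confidence interval using the normal critical value (about 1.96)
z_crit <- qnorm(0.975)
ci_approx <- slope_hat + c(-1, 1) * z_crit * slope_se

# Two-sided p-value under the same normal approximation
p_approx <- 2 * pnorm(-abs(t_stat))

round(c(t = t_stat, lower = ci_approx[1], upper = ci_approx[2], p = p_approx), 3)
```

The approximate interval, roughly (-0.037, 0.167), is essentially the one reported by `confint()`, and a p-value of about 0.21 is consistent with the slope not being significant at the 5% level.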

## Question 2

```{r, warning = FALSE, message = FALSE}

# Load tidyverse (which includes ggplot2) for plotting
library(tidyverse)
```

```{r}
# Grid of 100 shareblue values spanning the observed range
prediction_grid <- data.frame(
  shareblue = seq(min(nazis$shareblue), max(nazis$shareblue), length.out = 100)
)

# Predicted Nazi vote share with 95% confidence intervals
predictions_q2 <- predict(model_q1, newdata = prediction_grid,
                          interval = "confidence", level = 0.95)
```

```{r}
# Combine the grid with the fitted values and the interval bounds (fit, lwr, upr)
plot_data <- cbind(prediction_grid, as.data.frame(predictions_q2))

# Plot: solid line for the fitted values, dashed lines for the 95% confidence bounds
ggplot(plot_data, aes(x = shareblue)) +
  geom_line(aes(y = fit), linewidth = 1, linetype = "solid") +
  geom_line(aes(y = lwr), linewidth = 1, linetype = "dashed") +
  geom_line(aes(y = upr), linewidth = 1, linetype = "dashed") +
  labs(
    x = "Proportion of Blue-collar Voters (Xi)",
    y = "Average Nazi Vote Share (Yi)",
    title = "Predicted Nazi Vote Share vs. Proportion of Blue-collar Voters"
  )
```
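The dashed confidence bounds come from `predict(..., interval = "confidence")`, which returns the fitted value plus or minus a t critical value times the standard error of the estimated mean at each grid point. The sketch below checks this by hand on simulated data; the variables `x` and `y` and their generating values are hypothetical stand-ins, not the `nazis` data.

```r
set.seed(1)

# Hypothetical data standing in for the precinct-level variables
n <- 200
x <- runif(n, 0.1, 0.6)
y <- 0.2 + 0.1 * x + rnorm(n, sd = 0.05)
fit <- lm(y ~ x)

# Grid of x values at which to predict
grid <- data.frame(x = seq(min(x), max(x), length.out = 5))

# Built-in confidence intervals for the mean prediction
pred <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)

# Manual version: SE of the mean prediction is sqrt(x0' V x0), with V = vcov(fit)
X0 <- cbind(1, grid$x)
se_fit <- sqrt(rowSums((X0 %*% vcov(fit)) * X0))
t_crit <- qt(0.975, df = df.residual(fit))
fit_manual <- drop(X0 %*% coef(fit))
manual_lwr <- fit_manual - t_crit * se_fit
manual_upr <- fit_manual + t_crit * se_fit
```

Because `se_fit` grows as `x0` moves away from the mean of `x`, the band is narrowest near the center of the data and widens toward the edges of the plotted range.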

## Question 3

```{r}
# Y_i = alpha * X_i + beta * (1 - X_i): two regressors, no separate intercept
model_q3 <- lm(nazivote/nvoter ~ shareblue + I(1 - shareblue) - 1, data = nazis)
summary(model_q3)
```

In this alternative model, $Y_i = \alpha X_i + \beta (1 - X_i) + \epsilon_i$, the coefficient $\hat{\alpha}$ (reported for `shareblue`) is the predicted Nazi vote share in a precinct composed entirely of blue-collar voters ($X_i = 1$), and $\hat{\beta}$ (reported for `I(1 - shareblue)`) is the predicted Nazi vote share in a precinct with no blue-collar voters ($X_i = 0$). If the proportion of each group voting for the Nazis is the same in every precinct, $\hat{\alpha}$ estimates the proportion of blue-collar voters who voted for the Nazis and $\hat{\beta}$ the corresponding proportion among non-blue-collar voters. Although the model has no explicit intercept, it does not force the expected Nazi vote share to zero when $X_i = 0$, because the term $1 - X_i$ plays that role. Rearranging gives $Y_i = \beta + (\alpha - \beta) X_i + \epsilon_i$, so this model is a reparameterization of the one in Question 1: $\hat{\beta}$ equals the Question 1 intercept, $\hat{\alpha} - \hat{\beta}$ equals the Question 1 slope, and the fitted values are identical. The difference lies in interpretation. Instead of a baseline and a slope, the coefficients directly give the estimated Nazi support within each of the two groups, and the Question 1 slope of 0.06517999 implies that $\hat{\alpha}$ exceeds $\hat{\beta}$ by only about 6.5 percentage points, a difference that is not statistically significant.
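The model $Y_i = \alpha X_i + \beta (1 - X_i) + \epsilon_i$ can be rewritten as $Y_i = \beta + (\alpha - \beta) X_i + \epsilon_i$, so it spans the same fitted values as a regression with an intercept. The sketch below checks this numerically by fitting both forms to simulated data; `x`, `alpha`, `beta`, and the noise level are hypothetical values chosen for illustration.

```r
set.seed(42)

# Hypothetical data: x plays the role of the blue-collar share
n <- 300
x <- runif(n)
alpha <- 0.4  # hypothetical Nazi support among blue-collar voters
beta  <- 0.3  # hypothetical Nazi support among non-blue-collar voters
y <- alpha * x + beta * (1 - x) + rnorm(n, sd = 0.05)

fit_int  <- lm(y ~ x)                 # intercept-and-slope form (Question 1)
fit_repa <- lm(y ~ x + I(1 - x) - 1)  # two-group form (Question 3)

b_int  <- coef(fit_int)
b_repa <- coef(fit_repa)

# alpha_hat = intercept + slope, beta_hat = intercept
cbind(
  from_reparameterized = c(b_repa["x"], b_repa["I(1 - x)"]),
  from_intercept_model = c(b_int["(Intercept)"] + b_int["x"], b_int["(Intercept)"])
)
```

One side effect worth knowing: `summary()` reports a larger R-squared for `fit_repa` than for `fit_int` even though their fitted values are identical, because R computes R-squared against zero rather than against the mean whenever the formula contains `- 1`.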

## Question 4

```{r}
model_q4 <- lm(nazivote/nvoter ~ shareself + shareblue + sharewhite + sharedomestic + shareunemployed - 1, data = nazis)

summary(model_q4)

# 95% confidence intervals for the five coefficients
confint(model_q4)
```

Because the five occupation categories are intended to cover every potential voter in a precinct, this no-intercept model generalizes the one in Question 3: each coefficient is the predicted Nazi vote share in a hypothetical precinct composed entirely of one occupation group, which can be read as the estimated proportion of that group's potential voters who voted for the Nazis. The estimated coefficient for **`shareself`** is 1.11426 (95% CI: 0.78505 to 1.44348); a point estimate above one is impossible for a proportion, and although the interval extends below one, this result suggests that the model's assumptions may not hold for self-employed voters. The coefficient for **`shareblue`** is 0.54038 (95% CI: 0.46485 to 0.61590), suggesting that roughly half of blue-collar potential voters supported the Nazis, while the coefficient for **`sharewhite`** is 0.28509 (95% CI: 0.13764 to 0.43255). The coefficients for **`sharedomestic`** and **`shareunemployed`** are not statistically significant at the 5% level, as their confidence intervals include zero. The key assumption behind this interpretation is that, within each occupation group, the proportion of potential voters who voted for the Nazis is the same in every precinct. The confidence intervals additionally rely on the usual regression assumptions, such as independent errors (and, for exact small-sample inference, normally distributed errors). Finally, the adjusted R-squared of 0.9431 should be read with caution: for a model without an intercept, R measures R-squared against zero rather than against the mean of the outcome, so this value is not comparable to the R-squared from Question 1 and overstates how much of the variation in Nazi vote share the model explains.
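Two claims above can be checked on simulated data: when group shares sum to one and each group's support is constant across precincts, the no-intercept regression recovers those group-level proportions, and the R-squared that `summary()` reports for such a model is computed against zero. The sketch below uses three hypothetical groups (`group_a`, `group_b`, `group_c`) with made-up support levels; it is an illustration, not a reanalysis of the `nazis` data.

```r
set.seed(7)

# Hypothetical shares of three exhaustive groups (each row sums to one)
n <- 1000
raw <- matrix(runif(n * 3), ncol = 3)
shares <- raw / rowSums(raw)
colnames(shares) <- c("group_a", "group_b", "group_c")

# Hypothetical constant proportion of each group voting for the party
w_true <- c(group_a = 0.6, group_b = 0.4, group_c = 0.2)
y <- drop(shares %*% w_true) + rnorm(n, sd = 0.02)
sim <- data.frame(y = y, shares)

# No-intercept regression on all group shares
fit_groups <- lm(y ~ group_a + group_b + group_c - 1, data = sim)
w_hat <- coef(fit_groups)

# R-squared as reported, and computed against zero vs. against the mean
rss <- sum(residuals(fit_groups)^2)
r2_reported   <- summary(fit_groups)$r.squared
r2_uncentered <- 1 - rss / sum(sim$y^2)
r2_centered   <- 1 - rss / sum((sim$y - mean(sim$y))^2)
```

The reported value matches the uncentered version and exceeds the centered one, which is why a high R-squared from a no-intercept model says little on its own.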

## Question 5

```{r}
# Nazi vote share (Y_i) and proportion of blue-collar voters (X_i)
Y <- nazis$nazivote / nazis$nvoter
X <- nazis$shareblue

# Smallest possible value for W_i1: assume all non-blue-collar voters voted
# for the Nazis (W_i2 = 1), then truncate at 0
smallest_Wi1 <- pmax(0, (Y - (1 - X)) / X)
head(smallest_Wi1)
```

The precinct-level bounds follow from the accounting identity $Y_i = W_{i1} X_i + W_{i2} (1 - X_i)$, where $W_{i2}$ is the proportion of non-blue-collar voters who voted for the Nazis. Blue-collar support is smallest when all non-blue-collar voters are assumed to have voted for the party ($W_{i2} = 1$), which gives $W_{i1} = (Y_i - (1 - X_i)) / X_i$. Since a proportion cannot be negative, this bound is truncated at zero. It equals zero whenever the Nazi vote share is no larger than the share of non-blue-collar voters, because all Nazi votes in that precinct could then have come from non-blue-collar voters.

```{r}
# Largest possible value for W_i1: assume no non-blue-collar voter voted
# for the Nazis (W_i2 = 0), then truncate at 1
largest_Wi1 <- pmin(1, Y / X)
head(largest_Wi1)
```

Conversely, blue-collar support is largest when none of the non-blue-collar voters voted for the Nazis ($W_{i2} = 0$), so that every Nazi vote came from a blue-collar voter and $W_{i1} = Y_i / X_i$. This bound is truncated at one, and it equals one whenever the Nazi vote share is at least as large as the proportion of blue-collar voters.

```{r}
# Weight each precinct by its number of blue-collar voters
blue_collar_voters <- nazis$nvoter * nazis$shareblue

# Bounds for the nationwide proportion of blue-collar voters who voted for the Nazis
nationwide_bounds <- c(
  lower = weighted.mean(smallest_Wi1, blue_collar_voters),
  upper = weighted.mean(largest_Wi1, blue_collar_voters)
)
nationwide_bounds
```

The nationwide bounds are weighted averages of the precinct-level bounds, with each precinct weighted by its number of blue-collar voters (the number of potential voters, `nvoter`, times $X_i$), so that the result is a proportion of all blue-collar voters in the country. The lower bound corresponds to the scenario in which every non-blue-collar voter supported the Nazis, and the upper bound to the scenario in which no non-blue-collar voter did. These bounds require no assumptions beyond the accounting identity, but the gap between them reflects the core difficulty of ecological inference: aggregate precinct data alone are consistent with a range of individual-level voting behavior, and narrowing that range requires additional assumptions, such as the constant-proportion assumption used in Questions 3 and 4.
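The method of bounds can be traced by hand on a tiny example. The sketch below wraps the two precinct-level formulas in a small helper, `bounds_w1()` (a hypothetical name introduced here for illustration), and applies it to three made-up precincts before aggregating with blue-collar voter counts as weights.

```r
# Precinct-level bounds on W_i1 from Y_i = W_i1 * X_i + W_i2 * (1 - X_i)
bounds_w1 <- function(y, x) {
  lower <- pmax(0, (y - (1 - x)) / x)  # assumes W_i2 = 1
  upper <- pmin(1, y / x)              # assumes W_i2 = 0
  data.frame(lower = lower, upper = upper)
}

# Three hypothetical precincts
y <- c(0.30, 0.50, 0.80)         # Nazi vote share
x <- c(0.40, 0.60, 0.50)         # proportion of blue-collar voters
n_voters <- c(1000, 2000, 1000)  # potential voters

b <- bounds_w1(y, x)

# Aggregate, weighting by the number of blue-collar voters in each precinct
w <- n_voters * x
national <- c(lower = weighted.mean(b$lower, w), upper = weighted.mean(b$upper, w))
```

In the first precinct the lower bound is 0 because the Nazi vote share (0.30) is below the non-blue-collar share (0.60); in the third the upper bound is capped at 1 because the Nazi vote share (0.80) exceeds the blue-collar share (0.50). The aggregated bounds work out to 500/2100 and 1800/2100, or about 0.24 and 0.86.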