PhD Stress Level Dataset and Exercises
Dataset Description
This dataset contains information on PhD students’ stress levels (scale 0 to 100) along with several predictors:
- Coffee: Number of coffee cups consumed per day (integer)
- Spritz: Number of spritz (aperitif) consumed per week (integer)
- Statistical Knowledge: Self-assessed score from 0 to 10
- Software Used: Statistical software used (
R
,SPSS
,Excel
)
You can download the dataset here
Exercises
These exercises will help you explore the dataset, understand relationships, and fit a linear regression model.
1. Load and Explore the Data
- Load the dataset.
- Fix problems in the dataset, impute missing numerical values with the mean and missing categorical variables with a random value from the available values.
- Calculate means, medians, standard deviations, and ranges for all numeric variables.
- Examine the distribution of the categorical variable
software_used
with counts and proportions.
2. Visualize the Data
- Plot histograms or density plots for the numeric variables (stress, coffee, spritz, statistical knowledge).
- Create boxplots of stress by
software_used
.
- Create scatterplots:
- Stress vs Coffee
- Stress vs Spritz (color points by
software_used
)
- Stress vs Statistical Knowledge
- Stress vs Coffee
3. Check Relationships and Correlations
- Compute correlation coefficients between numeric variables.
- Comment on the strength and direction of relationships between stress and each predictor.
4. Fit a Linear Regression Model
- Fit a linear regression model predicting stress from coffee, spritz, statistical knowledge,
software_used
, and the interaction between spritz and software - Examine the model summary and interpret the coefficients, especially the interaction terms