2019-11-14

Overview

In higher education, being able to predict retention is important for several reasons, including:

  • Institutions can provide support services to students “at risk” of leaving
  • Some college rankings (e.g., US News) use retention

Many colleges use admission tests (e.g., ACT, SAT) when deciding whom to admit. Our guiding research question is:

Do admission test scores predict college retention?

Retention is defined as:

A measure of the rate at which students persist in their educational program at an institution, expressed as a percentage. For four-year institutions, this is the percentage of first-time bachelors (or equivalent) degree-seeking undergraduates from the previous fall who are again enrolled in the current fall. For all other institutions this is the percentage of first-time degree/certificate-seeking students from the previous fall who either re-enrolled or successfully completed their program by the current fall. (IPEDS, 2019)
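The definition reduces to a simple ratio. Below is a minimal sketch that uses hypothetical counts and an illustrative `retentionRate()` helper. This helper is not part of IPEDS or the ipeds package.

```r
# Illustrative helper (hypothetical; not from IPEDS or the ipeds package).
# returned:  students from the previous fall's first-time cohort enrolled again this fall
# cohort:    size of the previous fall's first-time cohort
# completed: students who completed their program (counted only for non-four-year institutions)
retentionRate <- function(returned, cohort, completed = 0) {
  100 * (returned + completed) / cohort
}

retentionRate(512, 800)                  # four-year institution: 64
retentionRate(90, 200, completed = 30)   # other institutions: 60
```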

Data Source

The Integrated Postsecondary Education Data System (IPEDS) provides information about all higher education institutions that participate in federal student financial aid programs.

Data Preparation

The ipeds R package provides an interface to download IPEDS data directly into R.

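library(ipeds)
# Download three 2011 IPEDS survey components
directory <- getIPEDSSurvey("HD", 2011)   # HD: institutional directory information
admissions <- getIPEDSSurvey("IC", 2011)  # IC: institutional characteristics, incl. admissions
retention <- getIPEDSSurvey("EFD", 2011)  # EFD: fall enrollment part D, incl. retention rates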

We will subset the columns we are interested in and rename them.

directory <- directory[,c('unitid', 'instnm', 'sector', 'control')]
admissions <- admissions[,c('unitid', 'admcon1', 'admcon2', 'admcon7', 'applcnm', 
                           'applcnw', 'applcn', 'admssnm', 'admssnw', 'admssn', 
                           'enrlftm', 'enrlftw', 'enrlptm', 'enrlptw', 'enrlt', 
                           'satnum', 'satpct', 'actnum', 'actpct', 'satvr25', 
                           'satvr75', 'satmt25', 'satmt75', 'satwr25', 'satwr75', 
                           'actcm25', 'actcm75', 'acten25', 'acten75', 'actmt25', 
                           'actmt75', 'actwr25', 'actwr75')]
retention <- retention[,c('unitid', 'ret_pcf', 'ret_pcp')]
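If a requested name is absent from a data frame, a subset like the one above stops with a terse "undefined columns selected" error. This can happen if a variable name differs in a given survey year, which is an assumption worth checking for each year you use. The sketch below uses a hypothetical `selectColumns()` helper and a toy data frame to report which names are missing. Its `drop = FALSE` argument keeps the result a data frame even when only one column is requested.

```r
# Hypothetical helper: subset to the requested columns, naming any that are missing
selectColumns <- function(df, cols) {
  missingCols <- setdiff(cols, names(df))
  if (length(missingCols) > 0) {
    stop("Columns not found: ", paste(missingCols, collapse = ", "))
  }
  df[, cols, drop = FALSE]
}

# Toy stand-in for an IPEDS survey table (made-up values)
toy <- data.frame(unitid = c(1, 2), ret_pcf = c(80, 65), ret_pcp = c(40, NA))
selectColumns(toy, c("unitid", "ret_pcf"))
```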

Data Preparation: Rename Columns

names(admissions) <- c("unitid", "UseHSGPA", "UseHSRank", "UseAdmissionTestScores", 
                      "ApplicantsMen", "ApplicantsWomen", "ApplicantsTotal", 
                      "AdmissionsMen", "AdmissionsWomen", "AdmissionsTotal", 
                      "EnrolledFullTimeMen", "EnrolledFullTimeWomen", 
                      "EnrolledPartTimeMen", "EnrolledPartTimeWomen", 
                      "EnrolledTotal", "NumSATScores", "PercentSATScores", 
                      "NumACTScores", "PercentACTScores", "SATReading25", 
                      "SATReading75", "SATMath25", "SATMath75", "SATWriting25", 
                      "SATWriting75", "ACTComposite25", "ACTComposite75", 
                      "ACTEnglish25", "ACTEnglish75", "ACTMath25", "ACTMath75", 
                      "ACTWriting25", "ACTWriting75")
names(retention) = c("unitid", "FullTimeRetentionRate", "PartTimeRetentionRate")
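Assigning `names()` by position silently mislabels columns if the column order of the subset ever changes. An order-independent alternative looks up each old name explicitly. In the sketch below, `renameColumns()` is a hypothetical helper and the data frame is a toy example.

```r
# Hypothetical helper: rename columns via an old-name = new-name lookup vector
renameColumns <- function(df, lookup) {
  idx <- match(names(lookup), names(df))
  if (anyNA(idx)) {
    stop("Columns not found: ", paste(names(lookup)[is.na(idx)], collapse = ", "))
  }
  names(df)[idx] <- unname(lookup)
  df
}

retToy <- data.frame(unitid = 1:2, ret_pcf = c(80, 65), ret_pcp = c(40, NA))
retToy <- renameColumns(retToy, c(ret_pcf = "FullTimeRetentionRate",
                                  ret_pcp = "PartTimeRetentionRate"))
names(retToy)
# "unitid" "FullTimeRetentionRate" "PartTimeRetentionRate"
```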

Data Preparation: Recoding

Recode the three admission-requirement variables (UseHSGPA, UseHSRank, and UseAdmissionTestScores) from their numeric IPEDS codes to labeled factors.

# Codes 1, 2, 3, 4, -1, -2 map, in order, to the labels below
admissionsLabels <- c("Required", "Recommended", "Neither required nor recommended", 
                      "Do not know", "Not reported", "Not applicable")
admissionsCodes <- c(1, 2, 3, 4, -1, -2)
for (v in c("UseHSGPA", "UseHSRank", "UseAdmissionTestScores")) {
  admissions[[v]] <- factor(admissions[[v]], levels = admissionsCodes, 
                            labels = admissionsLabels)
}

Data Preparation: Merging

# Inner joins: keep only institutions present in all three tables
ret <- merge(directory, admissions, by = "unitid")
ret <- merge(ret, retention, by = "unitid")
# Keep schools that report a test policy (required, recommended, or neither);
# this drops "Do not know", "Not reported", and "Not applicable"
ret2 <- ret[ret$UseAdmissionTestScores %in% 
              c("Required", "Recommended", "Neither required nor recommended"), ]
# Remove schools with full-time retention below 20%. Are these errors in the data?
# Logical indexing keeps missing rates and, unlike -which(), cannot drop every row
ret2 <- ret2[is.na(ret2$FullTimeRetentionRate) | ret2$FullTimeRetentionRate >= 20, ]
head(ret2, n = 3)
##   unitid                              instnm sector control UseHSGPA
## 1 100654            Alabama A & M University      1       1 Required
## 2 100663 University of Alabama at Birmingham      1       1 Required
## 4 100706 University of Alabama at Huntsville      1       1 Required
##                          UseHSRank UseAdmissionTestScores ApplicantsMen
## 1                      Recommended               Required          2847
## 2 Neither required nor recommended               Required          2307
## 4                      Recommended               Required          1100
##   ApplicantsWomen ApplicantsTotal AdmissionsMen AdmissionsWomen
## 1            4309            7156          1442            2143
## 2            3268            5575          1692            2335
## 4             852            1952           719             524
##   AdmissionsTotal EnrolledFullTimeMen EnrolledFullTimeWomen
## 1            3585                 421                   435
## 2            4027                 687                   891
## 4            1243                 384                   268
##   EnrolledPartTimeMen EnrolledPartTimeWomen EnrolledTotal NumSATScores
## 1                   5                    11           872           96
## 2                  15                    12          1605           79
## 4                  16                     9           677          126
##   PercentSATScores NumACTScores PercentACTScores SATReading25 SATReading75
## 1               12          783               90          370          450
## 2                5         1500               93          500          630
## 4               19          623               92          500          630
##   SATMath25 SATMath75 SATWriting25 SATWriting75 ACTComposite25
## 1       360       450          370          440             16
## 2       500       640            .            .             21
## 4       520       670            .            .             22
##   ACTComposite75 ACTEnglish25 ACTEnglish75 ACTMath25 ACTMath75
## 1             19           14           19        15        18
## 2             27           21           28        20        26
## 4             29           22           30        21        28
##   ACTWriting25 ACTWriting75 FullTimeRetentionRate PartTimeRetentionRate
## 1            .            .                    64                    10
## 2            .            .                    79                    62
## 4            .            .                    79                    50
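You can check the recoding, merging, and filtering steps above on a tiny example where all values are invented. The example demonstrates three behaviors:

  • `factor()` maps each code to the label in the same position of `levels`
  • `merge()` keeps only `unitid`s present in both tables by default
  • The `x[-which(cond), ]` idiom removes every row when no row matches

```r
admissionsLabels <- c("Required", "Recommended", "Neither required nor recommended",
                      "Do not know", "Not reported", "Not applicable")

# Made-up toy tables standing in for directory, admissions, and retention
dirToy <- data.frame(unitid = c(1, 2, 3, 4, 5))
admToy <- data.frame(unitid = c(1, 2, 3, 4, 6),
                     UseAdmissionTestScores = c(1, 3, 2, -1, 1))
retToy <- data.frame(unitid = c(1, 2, 3, 4, 6),
                     FullTimeRetentionRate = c(75, 12, NA, 80, 50))

# factor() maps each code to the label in the same position of `levels`
admToy$UseAdmissionTestScores <- factor(admToy$UseAdmissionTestScores,
                                        levels = c(1, 2, 3, 4, -1, -2),
                                        labels = admissionsLabels)

# merge() is an inner join by default: unitids 5 and 6 are each missing
# from one table, so both are dropped
merged <- merge(merge(dirToy, admToy, by = "unitid"), retToy, by = "unitid")
merged$unitid     # 1 2 3 4

# Keep reported test policies (drops unitid 4, "Not reported")
kept <- merged[merged$UseAdmissionTestScores %in% admissionsLabels[1:3], ]
# Logical filter: drops unitid 2 (12%) but keeps unitid 3 (missing rate)
kept <- kept[is.na(kept$FullTimeRetentionRate) | kept$FullTimeRetentionRate >= 20, ]
kept$unitid       # 1 3

# With -which(), a condition that matches no rows removes every row
droppedAll <- merged[-which(merged$FullTimeRetentionRate < 0), ]
nrow(droppedAll)  # 0
```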

Data Preparation: SAT and ACT Scores

IPEDS only provides the 25th and 75th percentiles of SAT and ACT scores. We use the midpoint of these two values as a proxy for the typical (central) score at each institution.

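# Convert through as.character(): if a column is a factor, as.numeric() alone
# returns level codes instead of scores ("." placeholders become NA)
toNumeric <- function(x) as.numeric(as.character(x))
ret2$SATMath75 <- toNumeric(ret2$SATMath75)
ret2$SATMath25 <- toNumeric(ret2$SATMath25)
# Midpoint of the 25th and 75th percentiles
ret2$SATMath <- (ret2$SATMath75 + ret2$SATMath25) / 2
ret2$SATWriting75 <- toNumeric(ret2$SATWriting75)
ret2$SATWriting25 <- toNumeric(ret2$SATWriting25)
ret2$SATWriting <- (ret2$SATWriting75 + ret2$SATWriting25) / 2
ret2$SATTotal <- ret2$SATMath + ret2$SATWriting
ret2$NumSATScores <- as.integer(toNumeric(ret2$NumSATScores))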
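A score column may arrive as a factor, for example when missing values are stored as ".". In that case, `as.numeric()` returns the factor's internal level codes rather than the scores. The sketch below computes the midpoint by converting through `as.character()` first. The scores are made up, and `satMidpoint()` is a hypothetical helper.

```r
# Convert through character so factors yield their labels, not level codes;
# "." placeholders become NA (the coercion warning is suppressed)
toNumericQuiet <- function(x) suppressWarnings(as.numeric(as.character(x)))

# Hypothetical helper: midpoint of the 25th and 75th percentiles
satMidpoint <- function(p25, p75) (toNumericQuiet(p25) + toNumericQuiet(p75)) / 2

# Made-up percentile columns stored as factors
sat25 <- factor(c("500", "370", "."))
sat75 <- factor(c("630", "450", "."))

as.numeric(sat25)           # level codes (1 to 3), not scores
satMidpoint(sat25, sat75)   # 565 410 NA
```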

Data Preparation: Selectivity

Measure the selectivity of each institution by its acceptance rate (i.e., # admissions / # applicants). Note that a higher value means a less selective institution.

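ret2$Selectivity <- as.numeric(as.character(ret2$AdmissionsTotal)) / 
                    as.numeric(as.character(ret2$ApplicantsTotal))
# Rebuild the factor: drops unused levels and sorts the rest alphabetically,
# making "Neither ... nor recommended" the reference level in the regression
ret2$UseAdmissionTestScores <- as.factor(as.character(ret2$UseAdmissionTestScores))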
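Take the first row of the data shown earlier: Alabama A & M University admitted 3,585 of 7,156 applicants, an acceptance rate of about 0.50. The minimal sketch below uses a hypothetical `acceptanceRate()` helper. It returns `NA` rather than `Inf` or `NaN` when an institution reports zero applicants.

```r
# Hypothetical helper: acceptance rate = admissions / applicants,
# returning NA when there are no applicants
acceptanceRate <- function(admitted, applicants) {
  ifelse(applicants > 0, admitted / applicants, NA_real_)
}

round(acceptanceRate(3585, 7156), 3)   # 0.501
acceptanceRate(c(1243, 10), c(1952, 0))
```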

SAT Scores vs. Full-Time Retention

Regression Results

# Weighted least squares: each institution is weighted by its number of SAT scores
lm.out <- lm(FullTimeRetentionRate ~ SATWriting + SATMath + 
               Selectivity + UseAdmissionTestScores, 
             data = ret2, 
             weights = NumSATScores)
##                                    Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)                       10.978338   3.810083  2.8814  0.004073 ** 
## SATWriting                         0.040317   0.010305  3.9125 9.974e-05 ***
## SATMath                            0.084695   0.009935  8.5249 < 2.2e-16 ***
## Selectivity                        4.116193   1.403863  2.9320  0.003471 ** 
## UseAdmissionTestScoresRecommended -4.253294   2.948248 -1.4427  0.149540    
## UseAdmissionTestScoresRequired    -1.003809   2.715252 -0.3697  0.711717    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
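The `weights` argument makes `lm()` fit weighted least squares, so institutions reporting more SAT scores pull harder on the fit. The check below uses simulated data, not the IPEDS data. It shows that the coefficients match the closed-form estimate \((X^\top W X)^{-1} X^\top W y\), with \(W\) the diagonal matrix of weights.

```r
set.seed(1)
n <- 50
x <- rnorm(n)
w <- sample(10:200, n, replace = TRUE)  # simulated stand-in for NumSATScores
y <- 60 + 5 * x + rnorm(n, sd = 20 / sqrt(w))

fit <- lm(y ~ x, weights = w)

# Closed-form weighted least squares: beta = (X'WX)^{-1} X'Wy
X <- cbind(1, x)
beta <- solve(crossprod(X, w * X), crossprod(X, w * y))

all.equal(unname(coef(fit)), as.vector(beta))  # TRUE
```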

Adjusted \(R^2\) = 0.75
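Adjusted \(R^2\) penalizes \(R^2\) for the number of predictors \(p\) (excluding the intercept) relative to the sample size \(n\):

\(\bar{R}^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}\)

The check below uses simulated data to confirm that this formula matches what `summary()` reports for an unweighted fit. For the weighted model above, R computes \(R^2\) from weighted sums of squares.

```r
set.seed(2)
n <- 40
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 2 * d$x1 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = d)
s <- summary(fit)

p <- length(coef(fit)) - 1  # number of predictors, excluding the intercept
adj <- 1 - (1 - s$r.squared) * (n - 1) / (n - p - 1)

all.equal(adj, s$adj.r.squared)  # TRUE
```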

Conclusion

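At the institution level, SAT math and writing scores are significant predictors of full-time retention rates at U.S. higher education institutions, even after controlling for selectivity. Whether an institution requires or recommends admission test scores is not a significant predictor (p = 0.15 and p = 0.71).

Together, SAT scores, selectivity, and test policy account for approximately 75% of the variance in full-time retention rates (adjusted \(R^2\) = 0.75).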