Load libraries

library(car)
library(BCA)
library(RcmdrMisc)

Load CCS data

data(CCS, package="BCA")

We will use the Canadian Charitable Society (CCS) dataset, which ships with the BCA package.

Explore dataset

Let’s take a look at the variables that make up the dataset, as well as their data types. Most of the variables describe either donation history or demographics.

names(CCS)
##  [1] "MonthGive"  "Region"     "YearsGive"  "AveDonAmt"  "LastDonAmt"
##  [6] "DonPerYear" "NewDonor"   "Age20t29"   "Age20t39"   "Age60pls"  
## [11] "Age70pls"   "Age80pls"   "AdultAge"   "SomeUnivP"  "FinUnivP"  
## [16] "hh1t2mem"   "hh1mem"     "AveIncEA"   "DwelValEA"  "EngPrmLang"
str(CCS)
## 'data.frame':    1600 obs. of  20 variables:
##  $ MonthGive : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Region    : Factor w/ 6 levels "R1","R2","R3",..: 1 1 2 2 2 4 3 3 2 1 ...
##  $ YearsGive : num  5 9 1 12 2 3 9 8 2 1 ...
##  $ AveDonAmt : num  31.2 25 25 25 25 ...
##  $ LastDonAmt: num  25 25 25 25 25 50 50 20 35 100 ...
##  $ DonPerYear: num  0.8 0.111 2 0.583 0.5 ...
##  $ NewDonor  : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Age20t29  : num  0.0779 0.1493 0.3211 0.2 0.22 ...
##  $ Age20t39  : num  0.201 0.289 0.587 0.4 0.54 ...
##  $ Age60pls  : num  0.143 0.154 0.128 0.229 0.23 ...
##  $ Age70pls  : num  0.0584 0.0846 0.0917 0.1429 0.16 ...
##  $ Age80pls  : num  0 0.0199 0.0367 0.0667 0.07 ...
##  $ AdultAge  : num  47.3 45.5 39.5 46.2 44.5 ...
##  $ SomeUnivP : num  0.417 0.537 0.435 0.278 0.64 ...
##  $ FinUnivP  : num  0.1565 0.3235 0.2609 0.0972 0.4651 ...
##  $ hh1t2mem  : num  0.48 0.443 0.836 0.897 0.958 ...
##  $ hh1mem    : num  0.08 0.129 0.475 0.448 0.676 ...
##  $ AveIncEA  : num  71703 70120 28662 35419 34228 ...
##  $ DwelValEA : num  222017 263469 0 183521 262905 ...
##  $ EngPrmLang: num  1 0.866 0.793 0.861 0.847 ...

The str() output shows 1,600 observations of 20 variables: three factors (MonthGive, Region, and NewDonor) and 17 numeric variables.
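The split between factor and numeric columns can also be checked programmatically rather than read off the str() output. The following is a minimal base R sketch on a small made-up data frame (toy and its values are hypothetical, not taken from CCS); replacing toy with CCS should give the corresponding tally for the real data.

```r
# Toy data frame standing in for CCS (hypothetical values, for illustration only)
toy <- data.frame(
  MonthGive = factor(c("Yes", "No", "Yes")),
  Region    = factor(c("R1", "R2", "R1")),
  YearsGive = c(5, 9, 1)
)

# Take the first class of each column, then tally the columns by class
col_classes <- sapply(toy, function(x) class(x)[1])
class_counts <- table(col_classes)
print(class_counts)
```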

Create estimation and validation samples

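We’ll create a variable within the CCS dataset called Sample, which labels each row as either “Estimation” or “Validation”. With est = 0.50 and val = 0.50, half of the rows go to each sample, and rand.seed makes the assignment reproducible.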

CCS$Sample <- create.samples(CCS, est = 0.50, val = 0.50, rand.seed = 1)
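To make the idea of a seeded random split concrete, here is a rough base R sketch of a 50/50 assignment. It is an illustration under assumptions, not the implementation of create.samples: the toy data frame and the variable names n_est and labels are hypothetical, and the row assignments will not match those produced by BCA even with the same seed.

```r
# Hypothetical stand-in for a data frame with 10 rows
toy <- data.frame(id = 1:10)

set.seed(1)  # fix the seed so the split is reproducible
n <- nrow(toy)
n_est <- round(0.50 * n)  # number of rows assigned to the estimation sample

# Build the labels in the desired proportions, then shuffle them across rows
labels <- rep(c("Estimation", "Validation"), times = c(n_est, n - n_est))
toy$Sample <- factor(sample(labels))

table(toy$Sample)
```

Shuffling a fixed vector of labels guarantees the exact proportions, whereas drawing each label independently would only match them on average.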

Recode variable for visual exploration

We will create a new numeric variable, MonthGive.Num, that equals 0 when MonthGive is “No” and 1 when MonthGive is “Yes”. A numeric version of the target variable is easier to explore visually.

CCS <- within(CCS, {
  MonthGive.Num <- Recode(MonthGive, '"Yes"= 1; "No"= 0', as.factor.result=FALSE)
})
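The same 0/1 recoding can be sketched in base R without Recode. The example below uses a small hypothetical factor (month_give is made up for illustration) with the same two levels as MonthGive.

```r
# Hypothetical factor mimicking MonthGive's two levels
month_give <- factor(c("Yes", "No", "No", "Yes"), levels = c("No", "Yes"))

# The comparison yields TRUE/FALSE, which as.numeric() converts to 1/0
month_give_num <- as.numeric(month_give == "Yes")
month_give_num
```

Calling as.numeric() directly on the factor would return the internal level codes (1 and 2) rather than 0 and 1, which is why the comparison with "Yes" comes first.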

Visualize data

We will start by exploring the data visually. A scatterplot is one of the easiest ways to look for nonlinear relationships between two variables. In this scatterplot, we plot MonthGive.Num against AveDonAmt.

scatterplot(MonthGive.Num~AveDonAmt, reg.line=lm, smooth=TRUE, spread=TRUE, 
  id.method='mahal', id.n = 2, boxplots='xy', span=0.5, data=CCS)
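Because MonthGive.Num only takes the values 0 and 1, its mean within a range of the predictor is the share of "Yes" cases in that range, which is roughly what the smoothed line in the plot traces. A numeric companion to the plot can be sketched in base R as follows; the toy data, the column names give_num and amount, and the bin breakpoints are all hypothetical choices made for the example.

```r
# Hypothetical toy data: a 0/1 outcome and a numeric predictor
toy <- data.frame(
  give_num = c(0, 0, 1, 0, 1, 1, 1, 1),
  amount   = c(5, 8, 12, 15, 22, 25, 31, 38)
)

# Cut the predictor into bins (breakpoints chosen arbitrarily for the example)
toy$amount_bin <- cut(toy$amount, breaks = c(0, 10, 20, 40))

# The mean of a 0/1 variable within a bin is the share of ones in that bin
bin_share <- tapply(toy$give_num, toy$amount_bin, mean)
bin_share
```

If the shares do not rise or fall at a steady rate from bin to bin, that suggests a nonlinear relationship worth examining further.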