Introduction to R for Stata Users
Let's set up the software
R is a programming environment
Robert Gentleman and Ross Ihaka began developing R at the University of Auckland, New Zealand, in the early 1990s and described it in a paper published in 1996.
They designed the language to combine the strengths of two existing languages, S and Scheme.
Tools are distributed as packages, which any user can download to customize the R environment.
https://cran.r-project.org/
Free Software.
RStudio offers a friendlier interface, similar to Stata's. It can struggle with very large data sets.
Data Types
1. Vector
Definition: A vector is a sequence of elements that share the same data type. A vector supports logical, integer, double, character, complex, or raw data types.
Example code
# Generating a scalar
x <- 2
# Generating a vector
x1 <- c(1, 2, 3)
x2 <- c(1, 2, 5.3, 6, -2, 4) # numeric vector
x3 <- c("one", "two", "three") # character vector
x4 <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE) # logical vector
x2
x2[c(2, 4)] # 2nd and 4th elements of vector
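The definition above says that all elements of a vector share one data type. A minimal sketch of what that means in practice, using only base R (the object name mixed is made up for illustration):

```r
# Each vector has a single type; typeof() reports it
typeof(c(1, 2, 3))          # "double"
typeof(c(1L, 2L, 3L))       # "integer"
typeof(c("one", "two"))     # "character"
typeof(c(TRUE, FALSE))      # "logical"

# Mixing types forces coercion to the most flexible type present
mixed <- c(1, "two", TRUE)
typeof(mixed)               # "character"
mixed                       # "1" "two" "TRUE"

# A "scalar" is simply a vector of length one
length(2)                   # 1
```

This is why a column that should be numeric sometimes shows up as text after an import: a single non-numeric entry is enough to turn the whole vector into character.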
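Operations with vectors
x[c(position1, position2)]: subsetting vectors
rep(a, repetitions): patterned vector that repeats a
seq(from = , to = , by = ): patterned vector with a fixed step
a:b: patterned vector of consecutive integers from a to b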
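The operations listed above can be sketched as follows. The values are arbitrary, and the object names r1, r2, s1 and s2 are made up for illustration.

```r
x2 <- c(1, 2, 5.3, 6, -2, 4)

# Subsetting
x2[c(2, 4)]        # 2nd and 4th elements
x2[-1]             # everything except the first element
x2[x2 > 3]         # elements greater than 3

# Patterned vectors
r1 <- rep(7, times = 3)                  # 7 7 7
r2 <- rep(c(1, 2), times = 2)            # 1 2 1 2
s1 <- seq(from = 0, to = 1, by = 0.25)   # 0.00 0.25 0.50 0.75 1.00
s2 <- 3:7                                # 3 4 5 6 7
```

Stata users can think of a:b as the counterpart of a numlist such as 3/7.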
Exercise 1 - Practice with vectors
Suggested solution - 1 (Try yourself first)
2. Matrix
3. Arrays
4. Data Frame
Exercise 2 - Data frame
Suggested solution - 2 (Try yourself first)
5. List
6. Factor
Some functions to start
1. Get help
There are many blogs and help sources on the Internet. Try searching for the specific code you need.
R also has built-in help, which you can call with the following code
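?options # opens the help page for options
help(options) # same as ?options
example(options) # runs the examples from the help page
example(lm)
# If the exact name of the command is not known
help.search("sum") # searches the installed help pages for "sum"
apropos("sum") # lists objects whose names contain "sum"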
2. Loops
3. Export and import
4. Merge function
5. More functions
6. Random variables
Linear Regression (Economists, such as myself, love regressions)
Let's study the demand for economics journals.
We begin with a small data set taken from Stock and Watson (2007) that provides information on the number of library subscriptions to economics journals in the US in 2000. The data set, collected initially by Bergstrom (2001), is available in package AER under the name Journals.
1. Load the database
We will need to install the package AER.
CRAN hosts thousands of packages that people have written for many kinds of statistical analysis. Installing packages on Windows is more straightforward than on macOS. In RStudio, I usually install packages manually.
Example code:
install.packages("AER") # install the package (only needed once)
library(AER) # load the package
data("Journals", package = "AER") # load the data
Let's check the data before continuing
dim(Journals)
names(Journals)
2. Simple graphs
3. Estimations
Exercise 3 - Wage Equation
Suggested solution - 3 (Try yourself first)
Exercise 4 - Wages and years of experience
Suggested solution - 4 (Try yourself first)
Exercise 5 - Prices and subscriptions
Suggested solution - 5 (Try yourself first)
4. Dichotomous variables (Dummy variables)
5. Non-Linear regressions
6. Comparison of models
Descriptive Statistics
In Stata, we can use the command summarize to calculate the descriptive statistics of the database. We can do the same in R with the following commands.
1. Mean, Median, and Standard Deviation
Example code:
rm(list = ls(all = TRUE)) # remove all objects from the workspace
data("CPS1985", package = "AER") # requires the AER package to be installed
str(CPS1985)
head(CPS1985)
levels(CPS1985$occupation)[c(2, 6)] <- c("techn", "mgmt") # shorten two occupation labels
attach(CPS1985) # lets us refer to the column wage directly
summary(wage)
mean(wage)
median(wage)
var(wage)
sd(wage)
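Stata's summarize prints one row per variable with the number of observations, mean, standard deviation, minimum and maximum. A rough sketch of a similar table in base R follows. The data frame toy and its values are hypothetical and only keep the example self-contained; the same steps should work on CPS1985 once AER is loaded.

```r
# Hypothetical toy data, not taken from CPS1985
toy <- data.frame(
  wage      = c(5, 10, 15, 20),
  education = c(8, 12, 12, 16),
  gender    = factor(c("female", "female", "male", "male"))
)

# Keep the numeric columns only
num <- toy[sapply(toy, is.numeric)]

# One row per variable: Obs, Mean, SD, Min, Max
stata_like <- data.frame(
  Obs  = sapply(num, function(v) sum(!is.na(v))),
  Mean = sapply(num, mean, na.rm = TRUE),
  SD   = sapply(num, sd,   na.rm = TRUE),
  Min  = sapply(num, min,  na.rm = TRUE),
  Max  = sapply(num, max,  na.rm = TRUE)
)
stata_like
```

Building the table by hand avoids attach() and works on every numeric column at once, while summary() remains the quickest check for a single variable.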
2. Histograms
3. More sophisticated graphs
Interactions, Separate Regressions, and Weights
y ~ a + x: model without interaction. Identical slopes with respect to x but different intercepts for each level of a.
y ~ a * x: model with interaction. It includes a, x, and the interaction between the two (in the example code, ethnicity, education, and their interaction).
y ~ a + x + a:x: the same model written out term by term. The term a:x gives the difference in slopes compared with the reference category, in other words, just the interaction.
Example code:
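# Load the data (assumes the AER package is installed)
library(AER)
data("CPS1988", package = "AER")
# Interaction between education and ethnicity
cps_int <- lm(log(wage) ~ experience + I(experience^2) +
  education * ethnicity, data = CPS1988)
# Test of coefficients
coeftest(cps_int)
# The same model with the interaction written out term by term
cps_int <- lm(log(wage) ~ experience + I(experience^2) +
  education + ethnicity + education:ethnicity,
  data = CPS1988)
coeftest(cps_int) # both specifications give the same estimates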
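The three formula forms described above can also be compared on a small simulated data set without installing any package. This is only a sketch: the data frame toy, the factor a, and the coefficients used to simulate y are all made up.

```r
# Hypothetical simulated data: a is a two-level factor, x is numeric
set.seed(1)
toy <- data.frame(
  a = factor(rep(c("g1", "g2"), each = 10)),
  x = rnorm(20)
)
toy$y <- 1 + 0.5 * toy$x + 2 * (toy$a == "g2") + rnorm(20)

m_add  <- lm(y ~ a + x, data = toy)         # common slope, two intercepts
m_star <- lm(y ~ a * x, data = toy)         # a, x and their interaction
m_long <- lm(y ~ a + x + a:x, data = toy)   # the same model spelled out

names(coef(m_add))    # "(Intercept)" "ag2" "x"
names(coef(m_star))   # "(Intercept)" "ag2" "x" "ag2:x"
all.equal(coef(m_star), coef(m_long))   # TRUE
```

The coefficient ag2:x is the difference between the slope for group g2 and the slope for the reference group g1.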
Separate regression for each level
As a further variation, it may be necessary to fit separate regressions for African-Americans and Caucasians.
In the formula ethnicity / (experience + I(experience^2) + education), the terms within parentheses are nested within ethnicity.
The term -1 removes the intercept of the nested model, so each ethnicity gets its own intercept. Arranging the coefficients in a matrix shows the results for both ethnicities side by side.
anova(model1, model2) compares two nested fits. Here, the model where ethnicity interacts with every other regressor fits significantly better, at any reasonable level, than the model without any interaction term.
Example code:
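# Separate regressions for each ethnicity, fitted as one nested model
cps_sep <- lm(log(wage) ~ ethnicity /
  (experience + I(experience^2) + education) - 1,
  data = CPS1988)
summary(cps_sep)
# Arrange the coefficients in a matrix, one row per ethnicity.
# cps_lm is not defined in this section; it is assumed to be the model without
# interactions from the Estimations section, repeated here so the code runs.
cps_lm <- lm(log(wage) ~ experience + I(experience^2) +
  education + ethnicity, data = CPS1988)
cps_sep_cf <- matrix(coef(cps_sep), nrow = 2)
rownames(cps_sep_cf) <- levels(CPS1988$ethnicity)
colnames(cps_sep_cf) <- names(coef(cps_lm))[1:4]
cps_sep_cf
# Compare the nested model with the model without interactions
anova(cps_sep, cps_lm)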
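To see why the nested formula amounts to separate regressions, the sketch below fits g / x - 1 on simulated data and compares the coefficients with two regressions run on each subset. The data frame toy, the factor g, and the simulated coefficients are hypothetical.

```r
# Hypothetical simulated data with a different line for each group
set.seed(2)
toy <- data.frame(
  g = factor(rep(c("g1", "g2"), each = 15)),
  x = rnorm(30)
)
toy$y <- ifelse(toy$g == "g1", 1 + 2 * toy$x, 3 - 1 * toy$x) + rnorm(30)

# Nested specification: one intercept and one slope per level of g
m_nested <- lm(y ~ g / x - 1, data = toy)
names(coef(m_nested))   # "gg1" "gg2" "gg1:x" "gg2:x"

# The same coefficients from two separate regressions
m_g1 <- lm(y ~ x, data = toy, subset = g == "g1")
m_g2 <- lm(y ~ x, data = toy, subset = g == "g2")

# One row per group, as in the matrix built for cps_sep
nested_cf <- matrix(coef(m_nested), nrow = 2,
                    dimnames = list(levels(toy$g), c("(Intercept)", "x")))
nested_cf
all.equal(unname(nested_cf["g1", ]), unname(coef(m_g1)))   # TRUE
all.equal(unname(nested_cf["g2", ]), unname(coef(m_g2)))   # TRUE
```

The point estimates match, but the standard errors generally do not, because the nested model estimates a single residual variance for both groups.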
Weighted least squares
References
A Modern Approach to Regression with R.
An Introduction to R for Quantitative Economics.
R for Stata Users.
Applied Econometrics with R.