52 Matching Annotations
  1. Jul 2022
    1. # Create the dummy variable as defined above
       CASchools$D <- CASchools$STR < 20

       # Plot the data
       plot(CASchools$D, CASchools$score,   # provide the data to be plotted
            pch = 20,                       # use filled circles as plot symbols
            cex = 0.5,                      # set size of plot symbols to 0.5
            col = "Steelblue",              # set the symbols' color to "Steelblue"
            xlab = expression(D[i]),        # set title and axis names
            ylab = "Test Score",
            main = "Dummy Regression")

      If you want to reproduce the exact plot, you need to run the following code:

                 CASchools$D <- CASchools$STR < 20                                   # Create the dummy variable as defined above
      
          mean.score.for.D.1 <- mean( CASchools$score[ CASchools$D == TRUE  ] )      # Compute the average score when D=1 (low  STR)
          mean.score.for.D.0 <- mean( CASchools$score[ CASchools$D == FALSE ] )      # Compute the average score when D=0 (high STR)
      
      
      
          plot( CASchools$score ~ CASchools$D,                                       # Plot the data
      
                       pch = 19,                       
                       cex = 0.5,                       
                       col = "Steelblue",                    
                      xlab = expression(D[i]),                
                      ylab = "Test Score",
                      main = "Dummy Regression")
      
          points( y = mean.score.for.D.0, x = 0,   col="red", pch = 19)               # Add the average for each group
          points( y = mean.score.for.D.1, x = 1,   col="red", pch = 19)
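
      As a cross-check (a minimal sketch; `CASchools` comes from the `AER` package, with `score` and `STR` defined as in the book), regressing the score on the dummy reproduces the two red group means: the intercept is the average score for D = 0, and the intercept plus the slope is the average score for D = 1.

          library(AER)                                                 # provides the CASchools data set
          data("CASchools")

          CASchools$score <- (CASchools$read + CASchools$math) / 2    # average test score, as in the book
          CASchools$STR   <- CASchools$students / CASchools$teachers  # student-teacher ratio, as in the book
          CASchools$D     <- CASchools$STR < 20                       # the dummy variable

          dummy_mod <- lm(score ~ D, data = CASchools)                # regress the score on the dummy alone

          coef(dummy_mod)[1]                                          # intercept         = mean score for D = 0
          sum(coef(dummy_mod))                                        # intercept + slope = mean score for D = 1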
      
    1. # simulate the data

      Alternative code:

               set.seed(1)                                                               # for reproducibility
               X <- runif(50, min = -5, max = 5)                                         # simulate the data
                     u <- rnorm(50, sd = 1)  
                     Y <- X^2 + 2*X + u                                                        # the true relation, which is the true population regression equation
      
            mod_simple <- lm( Y ~ X )                                                          # estimate a simple but incorrect regression model
      
         mod_quadratic <- lm( Y ~ X + I(X^2) )                                                 # estimate the correct quadratic regression model
      
            prediction <- predict( mod_quadratic  , data.frame( X = sort(X) ) )                # predict (Y.hat) using the correct quadratic model
      
      
      
      par(mfrow=c(1,3))
      
              plot( Y ~ X, col = "steelblue", pch = 20, xlab = "X", ylab = "Y")                # plot the results
            abline( mod_simple, col = "red")                                                   #   red line = incorrect linear regression (this violates the first OLS assumption)
             lines( sort(X), prediction )                                                      # black line = correct quadratic regression
      
              plot( resid(mod_quadratic) ~ X , col = "steelblue", pch = 20, main=c("Quadratic regression","Residuals plotted against X"), ylim=c(-10,15) )
              plot( resid(mod_simple)    ~ X , col = "steelblue", pch = 20, main=c("Simple regression"   ,"Residuals plotted against X"), ylim=c(-10,15) )
      
    1. Here, the failure of the i.i.d. assumption implies that, on average, we underestimate $\mu_Y$ using $\bar{Y}$: the corresponding distribution of $\bar{Y}$ is shifted to the left. In other words, $\bar{Y}$ is a biased estimator for $\mu_Y$ if the i.i.d. assumption does not hold.

      Actually, the two density plots should have the same height if the sample sizes are both equal to 25. But in the code above you set size=10, even though the legend says n=25.
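
      A minimal sketch of the point (the weighting scheme below is hypothetical, not the book's exact code): draw repeated samples of n = 25 from a mean-zero population, once i.i.d. and once with selection probabilities that favor small values, and compare the averages of the resulting sample means.

          set.seed(1)
          pop <- rnorm(10000)                                  # population with mu_Y = 0

          w <- rank(-pop)                                      # hypothetical weights: larger for smaller values

          means.iid    <- replicate(5000, mean(sample(pop, size = 25)))            # i.i.d. sampling
          means.noniid <- replicate(5000, mean(sample(pop, size = 25, prob = w)))  # non-i.i.d. sampling

          mean(means.iid)                                      # close to 0: Y.bar is unbiased
          mean(means.noniid)                                   # well below 0: Y.bar is biased downwards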

  2. Jun 2022
    1. # import data on the Wilshire 5000 index
       W5000 <- read.csv2("data/Wilshire5000.csv",
                          stringsAsFactors = F,
                          header = T,
                          sep = ",",
                          na.strings = ".")

       # transform the columns
       W5000$DATE <- as.Date(W5000$DATE)
       W5000$WILL5000INDFC <- as.numeric(W5000$WILL5000INDFC)

       # remove NAs
       W5000 <- na.omit(W5000)

       # compute daily percentage changes
       W5000_PC <- data.frame("Date" = W5000$DATE,
                              "Value" = as.numeric(Delt(W5000$WILL5000INDFC) * 100))
       W5000_PC <- na.omit(W5000_PC)

      I prefer to run simpler code here:

             library(pdfetch)
             library(quantmod)
      
             W5000  = pdfetch_FRED("WILL5000INDFC")                      # fetch the data directly from the FRED database as an "xts" and "zoo" object
       names(W5000) = "W5000"                                            # the date format is "YYYY-MM-DD"
      
             W5000    = na.omit(  W5000["1989-12-29::2013-12-31"]  )     # subset the sample from 29 Dec 1989 to 31 Dec 2013, and remove the NAs
             W5000_PC = na.omit(  Delt(W5000)*100                  )     # compute the daily percentage changes, and remove the NAs
      
    1. The second one is the $t$-statistic for the hypothesis test that the drift term equals $0$.

      Not correct. The second ADF test statistic is an F-test of the joint null hypothesis that gamma = 0 (unit root) and drift = 0.
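
      You can see both statistics with ur.df() from the urca package (a sketch on a simulated random walk; with type = "drift", "tau2" is the t-statistic for gamma = 0 and "phi1" is the F-statistic for the joint null):

          library(urca)

          set.seed(1)
          y <- cumsum(rnorm(250))                       # simulated random walk, so gamma = 0 holds

          adf <- ur.df(y, type = "drift", lags = 1)
          summary(adf)                                  # reports tau2 (t test) and phi1 (joint F test)
          adf@teststat                                  # the two test statistics
          adf@cval                                      # their critical values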

    1. Delt(PCECTPI)

      Error. The correct formula is: diff(PCECTPI)

      The plot should range from -6 to 12, as it does in the Stock and Watson textbook. But your (incorrect) plot ranges from -1 to 4.

    2. function Delt() from the package quantmod

      Error. The function "Delt()" computes diff(log(X)), where X is in levels. But you defined X above to be in logs. So you cannot apply "Delt()". If X is already defined in logs, then you must use "diff()".

      The correct formula for the inflation rate is:

      400*diff( log(USMacroSWQ$PCECTPI) )
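
      Putting it together (a sketch, assuming USMacroSWQ$PCECTPI holds the quarterly price index in levels for 1957:Q1 to 2013:Q4, as elsewhere in the chapter):

          PCECTPI   <- ts( USMacroSWQ$PCECTPI, start = c(1957, 1), end = c(2013, 4), frequency = 4 )   # price index in levels
          Inflation <- 400 * diff( log(PCECTPI) )                                                      # annualized quarterly inflation in percent

          plot(Inflation, col = "steelblue", ylab = "Percent")    # the plot now ranges from about -6 to 12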

    1. # set the column names
       colnames(USMacroSWQ) <- c("Date", "GDPC96", "JAPAN_IP", "PCECTPI", "GS10",
                                 "GS1", "TB3MS", "UNRATE", "EXUSUK", "CPIAUCSL")

       # format the date column
       USMacroSWQ$Date <- as.yearqtr(USMacroSWQ$Date, format = "%Y:0%q")

      These lines of code are not necessary at all.

      Simplified code:

                 library(readxl)                                                                                                # provides read_xlsx()

                 USMacroSWQ <- read_xlsx("us_macro_quarterly.xlsx")
      
                              GDP <- ts( USMacroSWQ$GDPC96,                    start = c(1957, 1), end = c(2013, 4), frequency = 4 )      # define GDP as ts object
                        GDPGrowth <- ts( 400*log(GDP[-1]/GDP[-length(GDP)]),   start = c(1957, 2), end = c(2013, 4), frequency = 4 )      # define GDP growth as a ts object
                      TB3MS <- ts( USMacroSWQ$TB3MS,                     start = c(1957, 1), end = c(2013, 4), frequency = 4 )      # 3-month Treasury bill interest rate as a 'ts' object
                     TB10YS <- ts( USMacroSWQ$GS10,                      start = c(1957, 1), end = c(2013, 4), frequency = 4 )      # 10-year Treasury bond interest rate as a 'ts' object
      
                          TSpread <- TB10YS - TB3MS                                                                                       # generate the term spread series
      
    1. long-run cumulative multiplier of 0.37% which

      0.37 percentage points, not 0.37%

    2. Instead of using linearHypothesis()

      We can also use the "linearHypothesis()" function as follows, which gives us the same result as the Wald test:

          unrestricted_model <- FOJC_mod_CM2
            restricted_model <- FOJC_mod_CM1

          linearHypothesis( unrestricted_model,
                            names( coef(unrestricted_model)[21:31] ),
                            vcov = NeweyWest(unrestricted_model, lag = m, prewhite = F) )
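
      Equivalently (a sketch, assuming m is the truncation parameter defined earlier in the chapter), waldtest() from the lmtest package compares the two models directly with the same Newey-West covariance matrix:

          library(lmtest)
          library(sandwich)

          waldtest( restricted_model,
                    unrestricted_model,
                    vcov = NeweyWest(unrestricted_model, lag = m, prewhite = F) )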

    3. truncation parameter $m$, we choose

      You could add this line to the code:

          m = ceiling( 0.75*( length(FDD)^(1/3) ) )

    4. increase of 0.5% in orange juice prices

      0.5 percentage point increase, not 0.5% increase

    1. For type = “trend”, the second statistic corresponds to the test that there is no unit root and no time trend while the third one corresponds to a test of the hypothesis that there is a unit root, no time trend and no drift term.
      # Interpretation of the ADF critical values:
      #
      #
      # type="none"  : no drift, no time trend in the regression: "tau1" refers to              Ho:                             gamma = 0 (unit root)  [t test]
      #
      #
      # type="drift" :    drift, no time trend in the regression: "tau2" refers to              Ho:                             gamma = 0 (unit root)  [t test]
      #                                                           "phi1" refers to the combined Ho:               drift = 0 and gamma = 0 (unit root)  [F test]
      #
      #
      # type="trend" :    drift and time trend in the regression: "tau3" refers to              Ho:                             gamma = 0 (unit root)  [t test]
      #                                                           "phi2" refers to the combined Ho:               drift = 0 and gamma = 0 (unit root)  [F test]
      #                                                           "phi3" refers to the combined Ho: trend = 0 and drift = 0 and gamma = 0 (unit root)  [F test]
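
      For example (a sketch on an arbitrary simulated series), with type = "trend" the ur.df object from the urca package reports tau3, phi2 and phi3 together with their critical values:

          library(urca)

          set.seed(1)
          y <- cumsum(rnorm(250))                            # an arbitrary simulated series

          adf.trend <- ur.df(y, type = "trend", lags = 1)
          adf.trend@teststat                                 # tau3, phi2 and phi3
          adf.trend@cval                                     # their 1%, 5% and 10% critical values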
      
    1. forecast error of 1.14%

      -1.14 percentage points, not 1.14%

    2. forecast error of −1.10%.

      -1.10 percentage points, not -1.10%

    1. stargazer(SR_AR1, SR_AR2, SR_AR4,
                 title = "Autoregressive Models of Monthly Excess Stock Returns",
                 header = FALSE,
                 model.numbers = F,
                 omit.table.layout = "n",
                 digits = 3,
                 column.labels = c("AR(1)", "AR(2)", "AR(4)"),
                 dep.var.caption = "Dependent Variable: Excess Returns on the CSRP Value-Weighted Index",
                 dep.var.labels.include = FALSE,
                 covariate.labels = c("$excess return_{t-1}$", "$excess return_{t-2}$",
                                      "$excess return_{t-3}$", "$excess return_{t-4}$",
                                      "Intercept"),
                 se = rob_se,
                 omit.stat = "rsq")

      If you run the code like that, you will get a consolidated table that does not make any sense. I suggest running the following code instead:

              stargazer(
                          SR_AR1, 
                          SR_AR2, 
                          SR_AR4,
      
                           title = "AR Models of Monthly Excess Stock Returns",
                dep.var.caption  = "Dependent Variable: Excess Returns on the CSRP Value-Weighted Index",
          dep.var.labels.include = FALSE,
                   model.numbers = FALSE,
                   column.labels = c("AR(1)", "AR(2)", "AR(4)"),
                            type = "text",
                              se = rob_se )
      
    2. # read in data on stock returns
       SReturns <- read_xlsx("Data/Stock_Returns_1931_2002.xlsx",
                             sheet = 1,
                             col_types = "numeric")

      There's a problem reading the Excel file directly from the online folder. I prefer to run this alternative code, which reads the ASC file directly from the internet:

          SReturns <- read.csv("https://www.princeton.edu/~mwatson/Stock-Watson_3u/Students/EE_Datasets/Stock_Returns_1931_2002.asc",
                               sep = "\t", header = FALSE )

          colnames(SReturns) <- c( "Year", "Month", "ExReturn", "ln_DivYield" )
          head(SReturns)

          StockReturns <- ts( SReturns[, 3:4], start = c(1931, 1), frequency = 12 )
          head(StockReturns)

    1. from 4% to 5%)

      4 percentage points, not 4%

    1. is about 15.8%.

      15.8 percentage points, not 15.8 percent.

    2. a difference of 14.9%.

      difference of 14.9 percentage points, not 14.9 percent.

    1. estimate the price elasticity to be $-0.899 + 0.374 \times \ln(5) + 0.141 \times [\ln(1) \times \ln(5)] \approx -0.3$ so a one percent increase in price is predicted to reduce the demand by only 0.3 percent.

      This is not correct. The correct partial derivative is:

      price elasticity of demand (partial derivative) $= \partial \ln(\text{subscriptions}) / \partial \ln(\text{price}) = \beta_2 + \beta_5 \times \ln(\text{age})$

      In R this should be:

          beta.1 = Journals_mod4$coefficients[1]   # intercept
          beta.2 = Journals_mod4$coefficients[2]   # log(PricePerCitation)
          beta.3 = Journals_mod4$coefficients[3]   # log(Age)
          beta.4 = Journals_mod4$coefficients[4]   # log(Characters)
          beta.5 = Journals_mod4$coefficients[5]   # log(PricePerCitation)*log(Age)

          price.elasticity.of.demand = function(age){ beta.2 + beta.5 * log(age) }

          price.elasticity.of.demand(age = 5)      # price elasticity of demand for new journals (age = 5)
          price.elasticity.of.demand(age = 80)     # price elasticity of demand for old journals (age = 80)