Implementing heteroskedasticity-consistent standard errors in SPSS (and SAS)

Homoskedasticity (also spelled homoscedasticity), or constant variance of the regression error terms, is a key assumption of ordinary least squares (OLS) regression. When this assumption is violated, i.e. the errors are heteroskedastic (or heteroscedastic), the OLS coefficient estimator remains unbiased and consistent. However, it is no longer efficient, and the conventional standard error estimates are biased. This can lead to Type I error inflation or reduced statistical power for coefficient hypothesis tests.

Thus, correcting for heteroskedasticity is often necessary when conducting OLS regression. Available methods include transforming the data, weighted least squares (WLS) regression, and generalized least squares (GLS) estimation. Another alternative is to use heteroskedasticity-consistent standard error (HCSE) estimators for the OLS parameter estimates (White, 1980).

Comparison of residuals between first order Heteroskedastic and Homoskedastic disturbances (Photo credit: Wikipedia)

Four types of HCSE estimators are considered here: HC0, HC1, HC2, and HC3. Standard errors from HC0 (the most common implementation) are best used with large samples, because this estimator is biased downward in small samples. HC1, HC2, and HC3 are better suited to smaller samples.
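To make the difference between the four estimators concrete, here is a minimal Python/NumPy sketch of the standard textbook formulas. It is not the Hayes and Cai macro, and the function name hc_standard_errors and the simulated data are made up for illustration. All four estimators share the same sandwich form and differ only in how the squared residuals are weighted: HC0 uses them as-is, HC1 scales them by n/(n-k), and HC2 and HC3 divide them by (1-h) and (1-h)^2, where h is the leverage of each observation.

```python
import numpy as np


def hc_standard_errors(X, y, kind="HC3"):
    """Return OLS coefficients and heteroskedasticity-consistent standard errors.

    X must already contain a column of ones if an intercept is wanted.
    (Hypothetical helper written for this post, not part of any package.)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape

    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta

    # Leverage values: the diagonal of the hat matrix X (X'X)^-1 X'
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)

    if kind == "HC0":
        w = resid**2
    elif kind == "HC1":
        w = resid**2 * n / (n - k)
    elif kind == "HC2":
        w = resid**2 / (1 - h)
    elif kind == "HC3":
        w = resid**2 / (1 - h) ** 2
    else:
        raise ValueError("kind must be one of HC0, HC1, HC2, HC3")

    # Sandwich estimator: (X'X)^-1 X' diag(w) X (X'X)^-1
    meat = X.T @ (w[:, None] * X)
    cov = xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(cov))


# Simulated example (assumed data): the error spread grows with x
rng = np.random.default_rng(0)
n_obs = 40
x = rng.uniform(1, 5, n_obs)
X_demo = np.column_stack([np.ones(n_obs), x])
y_demo = 1 + 2 * x + rng.normal(0, x)

for kind in ("HC0", "HC1", "HC2", "HC3"):
    _, se = hc_standard_errors(X_demo, y_demo, kind)
    print(kind, se)
```

With a small sample such as this one, the HC2 and HC3 standard errors come out no smaller than the HC0 ones, which is the small-sample correction at work.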

Many researchers conduct their statistical analysis in Stata, which has built-in procedures for estimating standard errors using all of the HC methods. However, others use SPSS for its pair-wise deletion capability (versus list-wise deletion in Stata) and suffer from its lack of heteroskedasticity correction capabilities. This wonderful paper by Hayes and Cai (2007) provides a macro (in the Appendix) that implements HCSE estimators in SPSS. They also provide a similar macro for SAS.

Note that the macro has no error-handling procedures, hence pre-screening of the data is required. Also, missing data is handled by list-wise deletion (which might defeat the purpose of using SPSS for some users).

Another link to the paper is here.


Hayes, A.F. and Cai, L. (2007), “Using heteroskedasticity-consistent standard error estimators in OLS regression: An introduction and software implementation”, Behavior Research Methods, 39(4), 709-722, DOI: 10.3758/BF03192961.

White, H. (1980), “A Heteroscedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroscedasticity,” Econometrica, 48, 817-838.

Factor Analysis in Stata

Conducting Exploratory Factor Analysis (EFA) in Stata is relatively straightforward. Run the factor command, followed by the rotate command. There are also other post-estimation commands. For examples of running EFA in Stata, see the UCLA and Willett references listed below. Running a Confirmatory Factor Analysis (CFA) in Stata is a little more complicated. A cfa module, which is maintained and updated by Stanislav Kolenikov, can be downloaded by running the following command in Stata:


net from


The command syntax is:

The accompanying paper which contains the description of the commands and illustrative examples is here -> cfa-sj-kolenikov.



Stata Annotated Output: Factor Analysis. UCLA: Academic Technology Services, Statistical Consulting Group.

from (accessed October 14, 2011).


John B. Willett, Conducting Exploratory Factor Analyses, in Selected Multivariate Data-Analytic Methods.

from (accessed October 14, 2011).


Stanislav Kolenikov, Confirmatory factor analysis using cfa.

from (accessed October 14, 2011).

Response Rates in Organizational Surveys: How Much is Enough?

How much of a response rate is considered good in organizational surveys? The answer to this question seems to differ for different disciplines and journals. A few key points that I found in my reading of the literature are:

  • In 2005, studies that collected organizational level data (for example, data on sales, profit, strategic orientation, innovation) had an average response rate of 35%, with a standard deviation of 18.2 (Baruch & Holtom, 2008).
  • Response rates have decreased over the years.
  • Response rates for studies that use individual level data are statistically significantly higher than for studies that use organizational level data.
  • Response rates are statistically significantly lower for studies that are conducted outside the United States due to cultural differences.
  • Some research finds higher response rates for web-based surveys. Other research finds that web surveys have lower response rates due to confidentiality and security concerns.
  • Response rates from countries with high average power distance (Hofstede, 1980) are lower than those from countries with low average power distance (Harzing, 2000). Power distance reflects the average perception of differences in power within a society. Low power distance implies that less powerful members of institutions expect more consultative relationships with more powerful members, while high power distance implies a greater acceptance of autocratic relationships with those in higher, formal positions. Thus studies conducted in India are expected to have lower response rates, given India's high power distance score (77) compared to that of the USA (40).


Baruch, Y. and Holtom, B.C. “Survey response rate levels and trends in organizational research”, Human Relations, August 2008, 61: 1139-1160, doi:10.1177/0018726708094863

Harzing, A.W. “Cross-national industrial mail surveys: Why do response rates differ between countries?” Industrial Marketing Management, 2000, 29, 243–54.

Hofstede, G. “Culture’s consequences: International differences in work-related values”, Vol. 5. Beverly Hills, CA: SAGE, 1980.

Smith, C. B. “Casting the net: Surveying an Internet population”, J. Comput. Mediat. Commun., 1997, 3: 77–84.

The Response Rate Conundrum in Survey Research

Response rates are often regarded as a critical indicator of survey and response quality, so research papers are expected to report them. However, calculating a response rate is not as easy as it seems.

Response rate is defined as the percentage of the eligible sample that responds to the survey. As this definition indicates, the size of the eligible sample is an important criterion in this calculation.

Some texts and research papers suggest that non-contactable respondents be considered a part of the eligible sample. Thus,

Response Rate = Responses / Eligible Sample

where Eligible Sample = Responses + Refusals + Non-contacts

However, this definition is not appropriate in several contexts. For example, making contact may be the only means by which one can establish the existence of a potential respondent. Or making contact may be the only way to determine eligibility. In such situations many papers define the eligible sample as responses plus refusals. This can overstate the response rate, because some of the non-contacts would have been eligible.

Thus the response rate conundrum can be expressed as a range of response rates that lie within the following bounds:

Response Rate (Lower Bound) = Responses / (Responses + Refusals + Non-contacts)

Response Rate (Upper Bound) = Responses / (Responses + Refusals)

The true value of the response rate would lie near:

Response Rate (Likely) = Responses / (Responses + Refusals + EE × Non-contacts)

Here EE is Estimated Eligibility of non-contacts, i.e. the estimated proportion of non-contacts that would have been eligible. One way of calculating EE is by dividing the sum of responses and refusals (which is the determined eligible sample) by the number of contacted potential respondents.

An illustrative example is given below:


Non-verified Sample Pool: 100

Contacted Respondents: 50

Verified Sample: 25

Responses: 15

Refusals: 10


Non-contacts = 100 - 50 = 50, and EE = 25 / 50 = 0.5

Response Rate (Lower Bound) = 15 / (15 + 10 + 50) = 20%

Response Rate (Upper Bound) = 15 / (15 + 10) = 60%

Response Rate (Likely) = 15 / (15 + 10 + (25/50)*50) = 30%
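
The three formulas can also be wrapped in a small helper for reuse. This is only a sketch in Python: the function name response_rate_bounds and its argument names are made up for illustration, and it assumes that non-contacts are the part of the sample pool that was never reached (pool minus contacted).

```python
def response_rate_bounds(responses, refusals, non_contacts, contacted):
    """Return (lower, upper, likely) response rates as proportions.

    Hypothetical helper: the argument names are illustrative, not standard.
    """
    # Eligible sample that could actually be verified
    verified = responses + refusals

    # Estimated Eligibility (EE): share of contacted people found to be eligible
    ee = verified / contacted

    lower = responses / (verified + non_contacts)        # all non-contacts counted as eligible
    upper = responses / verified                         # non-contacts ignored
    likely = responses / (verified + ee * non_contacts)  # non-contacts weighted by EE
    return lower, upper, likely


# Counts from the illustrative example; non-contacts assumed to be 100 - 50
pool, contacted = 100, 50
lower, upper, likely = response_rate_bounds(
    responses=15, refusals=10, non_contacts=pool - contacted, contacted=contacted
)
print(f"lower={lower:.0%}, upper={upper:.0%}, likely={likely:.0%}")
```

Reporting all three values, rather than only the upper bound, makes it clear how much the result depends on the treatment of non-contacts.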