Chapter 1:
Standardizing Comparison Groups
for the COVID Era
ENABLING DEMAND SIDE PROGRAMS TO COMPETE AT SCALE
Summary
A comparison group has one primary objective: to identify a set of buildings likely to respond to exogenous factors in the same way as the buildings enrolled in a demand-side energy program. However, this selection process is challenging for three reasons. First, different types of exogenous factors can lead to different types of responses. For example, a service territory-wide switch to a time-of-use rate structure would be expected to impact the energy usage patterns of the entire population. However, one might expect income-sensitive customers with high peak-load consumption to be more sensitive to a time-of-use rate than a typical customer. Similarly, COVID-related impacts might be felt most acutely amongst customers, both residential and commercial, with greater work-from-home flexibility.
A second challenge associated with comparison group selection is that while comparison groups can be constructed based on historical data, exogenous events might introduce a new divergence between treated and comparison group buildings. For example, COVID-related energy changes due to business shutdowns were more extreme in certain small business sectors than in certain “essential” businesses, despite a broad similarity in consumption patterns prior to COVID that would otherwise indicate a good comparison group match.
A third challenge is the limited availability of data that might help account for differing exogenous effects. While, with the right data, we might be able to perform some filtering, such as classifying buildings according to their business type, other filters such as trying to determine which residential homes are adding occupants and which are losing occupants, for example, would be much more difficult to construct.
Along with the need to construct comparison groups in flight to facilitate meter-based programs, these three factors (dissimilar responses to exogenous factors, unpredictable exogenous events, and limited data for assigning buildings to cohorts) require a more standardized and consistent comparison group selection methodology than might be found in traditional impact evaluations.
The methodological approach outlined here prioritizes replicability and universality, recognizing that under certain circumstances there will be a preference for waiting until after program participants have enrolled or for finding additional data about participants and non-participants to account for differential responses to exogenous conditions.
Many of the recommendations provided below stem from the results of analysis conducted with the support of MCE. Without the data MCE provided for this project, this effort would not have been possible, and we thank MCE for helping the entire demand-side industry take on one of the most unusual challenges in recent times.
The following recommendations flow from experience measuring the impacts of dozens of demand-side programs with both monthly and AMI data and from research results presented throughout the next five chapters and supporting appendices. The chapters that follow provide much more explanation, rationale, and data, and we encourage readers to explore this content.
Grid Methods Synopsis
- Identify program-eligible participants
- All demand-side energy programs will have eligibility rules. Some are based on customer-specific designations, such as low-income or hard-to-reach customers. Others are based on sector, such as commercial, agricultural, residential, or industrial. Some programs may be intended for non-solar customers, while others may require that a customer have solar PV. Yet other programs might restrict participation based on energy consumption characteristics, such as high peak usage or annual usage within a certain range. These eligibility rules are valuable because they tend to organize customers into classes that respond similarly to exogenous events.
- Limit the comparison group to eligible customers that meet program requirements. Identifying eligibility is also the first step to defining a relevant comparison pool from which a comparison group will ultimately be formed.
- Fit a CalTRACK 2.0 model on all eligible program participants prior to program launch. This model will uncover incomplete or missing data, erratic energy consumption patterns, and potential for higher savings. If program optimization techniques are applied, such as selecting targeted customers based on energy consumption profiles, customers who fit these criteria can be proportionally sampled as described in Chapter 3 in order to more specifically anticipate the likely program enrollees.
- Remove outlier customers from the comparison group sampling pool.
- Array all eligible customers by annualized consumption, computed by fitting the baseline CalTRACK model to the weather conditions of the baseline year.
- Remove customers with daily baseline CVRMSE values in excess of 1.0.
- Remove any remaining customers failing to meet program eligibility criteria.
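The outlier screen above can be sketched in Python. This is an illustrative sketch only, not the CalTRACK reference implementation: the `screen_comparison_pool` helper, the column names, and the toy data are hypothetical, and the 1.0 cutoff reflects the daily CVRMSE threshold noted above.

```python
import numpy as np
import pandas as pd

def daily_cvrmse(observed: pd.Series, predicted: pd.Series) -> float:
    """Coefficient of variation of the RMSE for a daily baseline fit."""
    resid = observed - predicted
    rmse = np.sqrt(np.mean(resid ** 2))
    return rmse / observed.mean()

def screen_comparison_pool(usage: pd.DataFrame, max_cvrmse: float = 1.0) -> list:
    """Keep meters whose daily baseline CVRMSE is at or below the cutoff.

    `usage` is assumed to have columns meter_id, observed_kwh, predicted_kwh,
    where predicted_kwh comes from a previously fit baseline model.
    """
    keep = []
    for meter_id, grp in usage.groupby("meter_id"):
        if daily_cvrmse(grp["observed_kwh"], grp["predicted_kwh"]) <= max_cvrmse:
            keep.append(meter_id)
    return keep

# Toy data: meter "a" tracks its baseline closely; meter "b" is erratic.
rng = np.random.default_rng(0)
pred = pd.Series(np.full(365, 30.0))
usage = pd.concat([
    pd.DataFrame({"meter_id": "a", "observed_kwh": pred + rng.normal(0, 3, 365),
                  "predicted_kwh": pred}),
    pd.DataFrame({"meter_id": "b", "observed_kwh": pred + rng.normal(0, 60, 365),
                  "predicted_kwh": pred}),
])
print(screen_comparison_pool(usage))  # only "a" survives the screen
```

The erratic meter's CVRMSE is well above 1.0, so it is dropped from the sampling pool before any comparison group is drawn.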
- Determine Comparison Group Requirements
- Annual savings that rely on daily or monthly savings calculations require different comparison group selection criteria than marginal hourly savings.
- Small treatment groups, irrespective of the granularity of savings calculations, require different comparison group selection criteria than large treatment groups.
- Random Selection of Comparison Group from Population
- For some programs a sufficient comparison group can be formed via random selection from within an eligible comparison pool. The random selection should be made after filtering for program and other eligibility requirements.
- A comparison group selected randomly from an eligible population must minimize the potential for sampling error by selecting a large pool of non-participants. The assumption with this type of comparison group is that the treated customers are experiencing exogenous factors in the same way as the larger population. In this case, the randomly selected comparison group is expected to be representative of the larger population and is thus a suitable basis for calculating exogenous effects. Comparison group sizing is described in greater detail in Chapter 2.
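A minimal sketch of the random draw, assuming the pool has already been filtered for eligibility and outliers; the meter identifiers and pool size are invented, and the fixed seed simply keeps the draw reproducible for audit.

```python
import random

def draw_comparison_group(eligible_pool: list, n: int, seed: int = 42) -> list:
    """Simple random sample, without replacement, from the filtered eligible pool."""
    if n > len(eligible_pool):
        raise ValueError("comparison group larger than eligible pool")
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    return rng.sample(eligible_pool, n)

# Hypothetical pool of 10,000 eligible non-participants; draw 2,000.
pool = [f"meter_{i}" for i in range(10_000)]
group = draw_comparison_group(pool, n=2_000)
print(len(group), len(set(group)))  # 2000 2000 (no duplicates)
```

Sampling without replacement from a large pool keeps sampling error low; Chapter 2 covers how large the draw should be.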
- Random Selection of Comparison Group from Sub-Population
- If a treated group is not expected to reflect the usage patterns or bear the impact of exogenous factors the same as the population as a whole, the comparison group must be designed to reflect the treated group rather than the population. In this case, the sampling error observed is between the comparison group and the treated group rather than the comparison group and the population. Chapters 5 and 6 and Appendices B and C give more detail on quantifying and minimizing COVID-related residuals in the Residential and Commercial sectors.
- A treated group may be drawn from a targeted subset of the program eligible population. For example, a program may target customers with usage patterns substantially different from the program eligible population, such as those whose energy use peaks during peak evening hours. In this case, it will be important to draw a sample of non-participants from within the distribution of targeted participants.
- Targeting parameters will skew the distribution of a treated group away from the general population. The comparison group should attempt to replicate this skewness (as well as the probable kurtosis resulting from optimization strategies).
- If the treated group is likely to be large and normally distributed amongst the targeted population, a large and normally distributed comparison group drawn from within the same population will be the best way to achieve the desired similarity.
- Selection of Comparison Group from within Stratified Sample of Sub-Population
- If a treated group is substantially different from the comparison group selected, the sub-population may be resampled to select a comparison group more similar to the treatment group.
- Resampling should only occur once enrollment in the program has reached a sufficient level to support stratified sampling from the broader population of eligible non-participants. Chapter 3 provides a detailed procedure for conducting and optimizing stratified sampling.
- Multiple approaches to binning based on consumption parameters are acceptable. Note, however, that stratified sampling will not solve certain problems such as categorical bias, for example, where certain business sectors are affected differently by exogenous variables than other business sectors. Chapter 3 provides a decision framework to identify where random, proportional, or stratified sampling should be conducted.
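One way to carry out the stratification is sketched below: bin the eligible non-participant pool by annualized consumption quantiles defined on the treated group, then draw from each bin in proportion to the treated counts. This is a simplified illustration with hypothetical function names and an arbitrary bin count; Chapter 3's procedure governs the actual binning and sizing choices.

```python
import random
from bisect import bisect_right

def quantile_edges(values: list, n_bins: int) -> list:
    """Bin edges at evenly spaced quantiles of the treated group's usage."""
    s = sorted(values)
    return [s[int(len(s) * k / n_bins) - 1] for k in range(1, n_bins)]

def stratified_sample(treated_kwh: dict, pool_kwh: dict,
                      ratio: int = 4, n_bins: int = 5, seed: int = 1) -> list:
    """Sample non-participants so each usage bin mirrors the treated distribution.

    treated_kwh / pool_kwh map meter_id -> annualized consumption (kWh);
    `ratio` is the number of comparison meters drawn per treated meter per bin.
    """
    edges = quantile_edges(list(treated_kwh.values()), n_bins)
    bin_of = lambda kwh: bisect_right(edges, kwh)
    # Count treated meters per bin and group pool meters into the same bins.
    treated_counts = [0] * n_bins
    for kwh in treated_kwh.values():
        treated_counts[bin_of(kwh)] += 1
    pool_bins = [[] for _ in range(n_bins)]
    for meter, kwh in pool_kwh.items():
        pool_bins[bin_of(kwh)].append(meter)
    rng = random.Random(seed)
    sample = []
    for b in range(n_bins):
        want = min(treated_counts[b] * ratio, len(pool_bins[b]))
        sample.extend(rng.sample(pool_bins[b], want))
    return sample

# Hypothetical treated group skewed toward higher usage than the pool at large.
treated = {f"t{i}": 1000 + 10 * i for i in range(100)}
pool = {f"p{i}": 900 + 3 * i for i in range(1000)}
cg = stratified_sample(treated, pool)
print(len(cg))
```

Because the bins are defined on the treated group's distribution, the draw inherits its skewness rather than the population's; where a bin has too few pool meters, the draw is capped at what is available, which is one signal that the pool cannot fully replicate the treated distribution.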
- Creation of Comparison Group Vintages and Difference of Differences
- Once a comparison group has been created, the baseline period of the comparison group must be aligned temporally with the baseline period of the participating customers.
- Where programs enroll customers over a period of time longer than 30 days, the comparison group must be rebaselined for each month of enrollment and a new vintage created that is assigned to a monthly cohort of enrolled participants.
- For each monthly cohort of participants, calculate a difference of differences of percentage savings between the treated customers and the associated vintage of the comparison group.
- The difference of differences in percentage terms can be multiplied by the raw total in order to aggregate savings across multiple treated cohorts.
- The difference of differences calculation should be applied to the model counterfactual for the determination of savings. More detail is provided in Chapter 4 on conducting the difference of differences savings calculations.
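For a single monthly cohort, the percentage difference of differences and its conversion back into energy terms can be illustrated as follows. The function names and numbers are hypothetical; the comparison-group term must come from the vintage whose baseline is aligned with that cohort's enrollment month, and Chapter 4 governs the full calculation.

```python
def pct_savings(counterfactual_kwh: float, observed_kwh: float) -> float:
    """Fractional savings relative to the modeled counterfactual."""
    return (counterfactual_kwh - observed_kwh) / counterfactual_kwh

def diff_of_differences(treated_cf: float, treated_obs: float,
                        comp_cf: float, comp_obs: float) -> float:
    """Percentage savings net of exogenous drift for one monthly cohort.

    The comparison terms are taken from the vintage whose baseline is
    aligned with this cohort's enrollment month.
    """
    return pct_savings(treated_cf, treated_obs) - pct_savings(comp_cf, comp_obs)

# One treated cohort: model predicts 120 MWh, meters show 100 MWh (16.7% gross).
# The matched comparison vintage drifted down 5% for exogenous reasons
# (e.g. COVID), so only the remainder is attributed to the program.
net_pct = diff_of_differences(120.0, 100.0, 200.0, 190.0)
net_kwh = net_pct * 120.0  # scale by the cohort's raw counterfactual total
print(round(net_pct, 4), round(net_kwh, 1))  # → 0.1167 14.0
```

Applying the adjustment in percentage terms, then scaling by each cohort's counterfactual total, lets cohorts of different sizes be summed into a program-level figure.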