Buildings Hub contains a geographical hierarchy so that users can drill up and down, exploring local factors that are not evident at higher levels of data aggregation while still seeing the forest for the trees. The software behind Buildings Hub, Microsoft Power BI, supports hierarchical datasets natively. It does not store the geography itself, however, so we must add our own geographical data to Buildings Hub. Building that dataset is a complex process, undertaken to improve both accuracy and usability. As the figure below shows, the ZIP Code serves as the base geography for Buildings Hub. For each ZIP Code, we define a corresponding primary city or town, county, electric utility, combined statistical area, state, and one of several regional divisions. Essentially, this is one large table with a single row for each ZIP Code and a column for each of the other geographic categories.
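A minimal Python sketch of that table follows. It assumes one record per ZIP Code with exactly one parent at each level; the field names, the `parent_of` and `zips_in` helpers, and every row value are hypothetical placeholders, not the actual Buildings Hub schema or data.

```python
from dataclasses import dataclass

# Levels of the hierarchy above the base geography (the ZIP Code).
LEVELS = ("city", "county", "utility", "csa", "state", "region")


@dataclass(frozen=True)
class ZipGeography:
    """One row of the geography table: a ZIP Code and exactly one parent per level."""
    zip_code: str
    city: str
    county: str
    utility: str
    csa: str
    state: str
    region: str


# Hypothetical rows; ZIP Codes and names are placeholders, not Buildings Hub data.
ROWS = [
    ZipGeography("00001", "Springfield", "Alpha County", "Utility A", "Metro CSA", "State X", "Region 1"),
    ZipGeography("00002", "Springfield", "Alpha County", "Municipal Utility", "Metro CSA", "State X", "Region 1"),
    ZipGeography("00003", "Riverton", "Beta County", "Utility A", "Metro CSA", "State X", "Region 1"),
]

# Index rows by ZIP Code; the ZIP Code must be unique for this to be a valid base geography.
GEOGRAPHY = {row.zip_code: row for row in ROWS}
if len(GEOGRAPHY) != len(ROWS):
    raise ValueError("duplicate ZIP Code in geography table")


def parent_of(zip_code: str, level: str) -> str:
    """Drill up: return the single parent of a ZIP Code at the given level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    return getattr(GEOGRAPHY[zip_code], level)


def zips_in(level: str, name: str) -> list[str]:
    """Drill down: return every ZIP Code whose parent at `level` is `name`."""
    return sorted(z for z, row in GEOGRAPHY.items() if getattr(row, level) == name)


print(parent_of("00002", "utility"))  # Municipal Utility
print(zips_in("city", "Springfield"))  # ['00001', '00002']
```

Because each row holds one value per column, drilling up is a single lookup, and drilling down is a filter on that column.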
Why ZIP Codes?
ZIP Codes in the United States are a great way to explore local data below the level of a city or town. They're also an easy way to identify a location, since every ZIP Code is unique. There are exceptions to this convenience, of course: some ZIP Codes cross municipal boundaries. Even so, at this time there is no better way to connect a broad range of datasets at the local level than the ZIP Code. For quantitative datasets (like building surveys), the hierarchy lets users quickly add up or break down data. Buildings Hub blends these datasets with its geography data, so users can easily add up totals for a city, county, electric utility territory, or entire state.
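To illustrate how the hierarchy supports adding up ZIP-level data, the sketch below rolls hypothetical building counts up to any level by looking up each ZIP Code's parent and summing. The `GEOGRAPHY` mapping, the `roll_up` function, and all numbers are invented for this example; Power BI performs this kind of aggregation itself, so the code only shows the underlying logic.

```python
from collections import defaultdict

# Hypothetical geography table: ZIP Code -> {level: parent name}. Placeholder values only.
GEOGRAPHY = {
    "00001": {"city": "Springfield", "county": "Alpha County", "utility": "Utility A", "state": "State X"},
    "00002": {"city": "Springfield", "county": "Alpha County", "utility": "Municipal Utility", "state": "State X"},
    "00003": {"city": "Riverton", "county": "Beta County", "utility": "Utility A", "state": "State X"},
}


def roll_up(values_by_zip: dict[str, int], level: str) -> dict[str, int]:
    """Sum ZIP-level values into their parent at `level`; raises KeyError for unknown ZIP Codes."""
    totals: dict[str, int] = defaultdict(int)
    for zip_code, value in values_by_zip.items():
        totals[GEOGRAPHY[zip_code][level]] += value
    return dict(totals)


# Hypothetical ZIP-level survey result, e.g. a count of surveyed buildings.
building_counts = {"00001": 120, "00002": 80, "00003": 50}
print(roll_up(building_counts, "city"))     # {'Springfield': 200, 'Riverton': 50}
print(roll_up(building_counts, "utility"))  # {'Utility A': 170, 'Municipal Utility': 80}
print(roll_up(building_counts, "state"))    # {'State X': 250}
```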
Potential Errors in Geography
Buildings Hub models its geography in Microsoft Power BI with one-to-many relationships, which means a ZIP Code can map to only a single city, a single county, a single electric utility, and so on for the combined statistical area, state, and regional division. As a result, we had to make tradeoffs when assigning the geography for each ZIP Code, and those tradeoffs introduce errors when data are aggregated. The structure does tolerate some overlap between categories, however. A single city can be listed in more than one county, a county can be listed in more than one electric utility territory, and so forth, as long as each instance is tied to a unique ZIP Code. For example, both a municipal utility and American Electric Power operate in Columbus, Ohio. Buildings Hub distinguishes these two territories by ZIP Code, though that may not align well with the two utilities' actual operating territories. Because the base geography is the ZIP Code, users should exercise caution when aggregating Buildings Hub data for statistical analyses.
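To make the aggregation error concrete, the sketch below compares hypothetical "actual" building counts, split by the utility that really serves them, against the totals produced when each ZIP Code is assigned wholesale to one primary utility. All ZIP Codes, utility names, and counts are invented, and choosing the utility with the most buildings as primary is an assumption made only for this example.

```python
from collections import Counter

# Hypothetical ground truth: buildings in each ZIP Code, split by the utility that actually serves them.
BUILDINGS = {
    "00001": {"Municipal Utility": 60, "Utility B": 40},
    "00002": {"Utility B": 50},
}


def actual_totals(data: dict[str, dict[str, int]]) -> Counter:
    """Sum buildings by the utility that actually serves them."""
    totals = Counter()
    for split in data.values():
        totals.update(split)  # Counter.update adds counts rather than replacing them
    return totals


def zip_assigned_totals(data: dict[str, dict[str, int]]) -> Counter:
    """Sum buildings after assigning each whole ZIP Code to a single primary utility."""
    totals = Counter()
    for split in data.values():
        primary = max(split, key=split.get)  # assumption: primary = utility with the most buildings
        totals[primary] += sum(split.values())
    return totals


actual = actual_totals(BUILDINGS)
assigned = zip_assigned_totals(BUILDINGS)
errors = {u: assigned[u] - actual[u] for u in sorted(set(actual) | set(assigned))}
print(errors)  # {'Municipal Utility': 40, 'Utility B': -40}
```

The ZIP-level assignment overstates the municipal utility by 40 buildings and understates Utility B by the same amount, even though the combined total is unchanged.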
Methodology to Map ZIP Codes to Electric Utility Territories
ZIP Codes are the base geography for Buildings Hub, but electric utility territories in the United States do not align neatly with ZIP Codes or most other geographic boundaries. In fact, several electric utilities, such as an investor-owned utility and a municipal utility, can operate in the same ZIP Code, each serving different parts of a city. Buildings Hub uses a two-step process to make a best approximation of the primary electric utility in each ZIP Code. We rely on data from the U.S. Energy Information Administration (EIA-860 and EIA-861), which lists all electric utilities and the counties they operate in, as well as on the utility-to-ZIP mapping from OpenEI. For each ZIP Code, we first choose the utility defined by OpenEI. If OpenEI does not list a utility for that ZIP Code, we fall back on the utility-county lookup from the EIA data. If more than one utility operates in a ZIP Code (according to OpenEI) or in a county (according to the EIA surveys), we choose the largest utility, as measured by megawatt-hours served according to EIA data. The data are updated annually, so a ZIP Code's utility assignment may change as the OpenEI and EIA data change.
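The selection rule above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Buildings Hub pipeline: the function `assign_utility`, its input dictionaries, and their contents are hypothetical, and using the geography table's ZIP-to-county mapping for the EIA fallback is an assumption. A production version would likely key counties by FIPS code rather than by name.

```python
from typing import Optional


def assign_utility(
    zip_code: str,
    openei_zip_utilities: dict[str, list[str]],
    zip_to_county: dict[str, str],
    eia_county_utilities: dict[str, list[str]],
    mwh_by_utility: dict[str, float],
) -> Optional[str]:
    """Return a best-guess primary utility for one ZIP Code, or None if no source lists one."""
    # Step 1: prefer the utilities that OpenEI maps to this ZIP Code.
    candidates = openei_zip_utilities.get(zip_code, [])
    # Step 2: otherwise fall back on the utilities EIA lists for the ZIP Code's county.
    if not candidates:
        county = zip_to_county.get(zip_code)
        candidates = eia_county_utilities.get(county, []) if county else []
    if not candidates:
        return None
    # Tie-break among several candidates: largest utility by MWh served (unknown counts as 0).
    return max(candidates, key=lambda utility: mwh_by_utility.get(utility, 0))


# Hypothetical inputs for illustration only.
OPENEI = {"00001": ["Utility A", "Municipal Utility"]}
ZIP_TO_COUNTY = {"00001": "Alpha County", "00002": "Alpha County", "00003": "Gamma County"}
EIA_COUNTY = {"Alpha County": ["Co-op C", "Municipal Utility"]}
MWH = {"Utility A": 5_000_000, "Municipal Utility": 800_000, "Co-op C": 200_000}

for z in ("00001", "00002", "00003"):
    print(z, assign_utility(z, OPENEI, ZIP_TO_COUNTY, EIA_COUNTY, MWH))
# 00001 Utility A
# 00002 Municipal Utility
# 00003 None
```

ZIP Code 00001 is resolved from OpenEI, 00002 falls back to its county's EIA list, and 00003 stays unassigned because neither source covers it.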
As a last step in this process, we apply a custom override table to selected ZIP Codes, using data provided directly by electric utilities in some states.
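A sketch of that final step follows, assuming the overrides are stored as a simple ZIP-to-utility mapping that takes precedence over the computed assignments. The `apply_overrides` helper and the sample values are hypothetical.

```python
from typing import Optional


def apply_overrides(
    computed: dict[str, Optional[str]],
    overrides: dict[str, str],
) -> tuple[dict[str, Optional[str]], list[str]]:
    """Layer utility-provided overrides on top of computed assignments.

    Returns the merged mapping and the sorted ZIP Codes whose assignment changed.
    """
    merged = dict(computed)   # copy so the computed table is left untouched
    merged.update(overrides)  # override values win wherever both define a ZIP Code
    changed = sorted(z for z, utility in overrides.items() if computed.get(z) != utility)
    return merged, changed


# Hypothetical computed assignments and override table.
computed = {"00001": "Utility A", "00002": "Municipal Utility", "00003": None}
overrides = {"00002": "Co-op C", "00003": "Utility A", "00001": "Utility A"}
merged, changed = apply_overrides(computed, overrides)
print(merged)   # {'00001': 'Utility A', '00002': 'Co-op C', '00003': 'Utility A'}
print(changed)  # ['00002', '00003']
```

Returning the list of changed ZIP Codes makes it easy to review how much the utility-provided data actually alters the computed mapping each year.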